by: Minchan Kim, Chia Lee, Lily Weng (mentor)
Codebase ⚙️ | Report 📝 | Poster 📜
Introduction
Understanding hidden neurons in vision models is key to improving interpretability. Describe-and-Dissect (DnD) [1] is a training-free method that generates highly rated neuron explanations without labeled data or retraining. We believe its performance can be built on and improved by incorporating additional machine learning techniques. We introduce Describe-and-Refine (DnR), an effort to enhance DnD with several such techniques: an option for users to input their own candidate concepts, custom scoring functions that measure which concept best fits a neuron, and an iterative, reinforcement-learning-style process that generates Stable Diffusion images for candidate concepts and rates their accuracy to find the concept that best fits the highest-activating generated images.
Figure 1: Describe-and-Dissect (a) versus Describe-and-Refine (b).
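To make the loop described above concrete, here is a minimal sketch of the refinement process. The helper names `generate_images` (a Stable Diffusion wrapper), `score_concept` (one of the activation-based scoring functions), and `generate_concepts` (a language-model concept proposer) are hypothetical placeholders; the actual pipeline in the codebase may differ.

```python
def refine_neuron_concepts(neuron, candidate_concepts, score_concept,
                           generate_images, generate_concepts,
                           max_iters=5, tol=1e-3):
    """Iteratively score candidate concepts for a single neuron and keep the
    best one. All helpers are hypothetical placeholders for the actual DnR
    components (Stable Diffusion generation, activation scoring, and
    language-model concept proposal)."""
    best_concept, best_score = None, float("-inf")
    for _ in range(max_iters):
        improved = False
        for concept in candidate_concepts:
            images = generate_images(concept)       # Stable Diffusion probe images
            score = score_concept(neuron, images)   # activation-based fit score
            if score > best_score + tol:
                best_concept, best_score, improved = concept, score, True
        if not improved:                            # no concept beat the current best
            break
        # Propose refinements of the current best concept for the next round
        candidate_concepts = generate_concepts(best_concept)
    return best_concept, best_score
```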
Objectives
- Output more robust neuron concept explanations.
- Improve model performance and runtime efficiency.
- Reveal insight into black-box neural network functionality in vision networks.
- Explore the potential of reinforcement learning and user input to improve a label-free neuron descriptor.
Methods
Results
Table of Scoring Methods and Neuron Metrics

| Neuron ID | Scoring Method | Reference Neuron | Rank | Score |
|-----------|----------------|------------------|------|-------|
| 1023 | Top-K Squared Mean | 927 | 1024 | 0.565151 |
| 1605 | Mean | 927 | 582 | 0.037649 |
| 2233 | Median | 927 | 186 | 0.011822 |
| 3981 | Squared Mean | 927 | 910 | 0.011951 |
| 4125 | Top-K Log-Weighted Activation | 927 | 30 | 0.580204 |
| 5150 | Top-K Semantic Consistency | 927 | 31 | 0.551183 |
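The exact scoring formulas live in the codebase; the following is a rough sketch, under our own assumptions, of how the activation-based variants in the table could be computed from a neuron's per-image activations (Top-K Semantic Consistency additionally compares concept embeddings and is omitted here).

```python
import numpy as np

def mean_score(acts):
    """Average activation over all probe images."""
    return float(np.mean(acts))

def median_score(acts):
    """Median activation over all probe images."""
    return float(np.median(acts))

def squared_mean_score(acts):
    """Mean of squared activations; emphasizes large responses."""
    return float(np.mean(np.square(acts)))

def topk_squared_mean_score(acts, k=10):
    """Mean of squared activations over the k highest-activating images."""
    top = np.sort(np.asarray(acts))[-k:]
    return float(np.mean(np.square(top)))

def topk_log_weighted_score(acts, k=10):
    """Rank-weighted mean of the top-k activations, with log-decaying weights.
    The exact weighting used in DnR may differ from this 1/log2(rank+1) form."""
    top = np.sort(np.asarray(acts))[::-1][:k]
    weights = 1.0 / np.log2(np.arange(2, len(top) + 2))
    return float(np.sum(weights * top) / np.sum(weights))

# Toy demonstration: a neuron with sparse but strong activations gets a low
# mean score but a high top-k score, illustrating the pattern described for
# neuron 927 below.
acts = [0.01] * 95 + [0.9, 0.95, 1.0, 1.05, 1.1]
print(mean_score(acts), topk_log_weighted_score(acts, k=5))
```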
- Top-K scoring methods outperform traditional activation-based ranking techniques in identifying functionally and semantically meaningful neurons.
- Top-K Log-Weighted Activation and Top-K Semantic Consistency ranked neuron 927 at positions 30 and 31, respectively.
- Traditional methods ranked it significantly lower (Mean: 582, Median: 186, Squared Mean: 910).
- This suggests neuron 927 exhibits strong but sparse activations, rather than consistently high responses across all inputs.
- Higher scores from Top-K methods (~0.58 and ~0.55) indicate both strong activations and semantic alignment with meaningful concepts.
- Peak activations are more informative than averaging across all responses.
- Limitations of Traditional Methods:
  - Mean and Median activations distribute importance evenly, leading to the down-ranking of neurons with sporadic but high-impact activations.
  - Top-K Squared Mean ranked neuron 927 at 1024, despite a high score of 0.565, highlighting that neurons with even greater cumulative squared activations outperformed it.
  - Global averaging techniques fail to differentiate between consistently active neurons and specialized, high-impact ones.
- Iterative scoring inconsistencies (see the sketch after this list):
  - Some neurons converge in 2-3 iterations, while others continue for many more, leading to inconsistent convergence.
  - Marginal score differences in some cases suggest redundancy in concept generation.
  - Stable Diffusion randomness makes quantification difficult: images are generated both before and after step 3, with no direct way to compare them against the concept sets.
- User-input concepts underperform GPT-generated concepts:
  - They often score lower than GPT-generated concepts.
  - They fail to improve Stable Diffusion output, reducing overall effectiveness in refining neuron interpretability.
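A rough illustration of the convergence check behind these inconsistencies, with a hypothetical `should_stop` helper and threshold values chosen only for the example: refinement stops once the best score improves by less than a fixed threshold, so a loose threshold ends after 2-3 iterations while near-redundant concepts can keep a tighter loop running.

```python
def should_stop(score_history, threshold=0.01, patience=2):
    """Return True when the best concept score has improved by less than
    `threshold` for `patience` consecutive iterations. Both knobs are
    hypothetical; the stopping rule actually used in the codebase may differ."""
    if len(score_history) <= patience:
        return False
    recent = score_history[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(g < threshold for g in gains)

# Example: after the second iteration the gains become marginal, so refinement stops.
print(should_stop([0.41, 0.55, 0.556, 0.558]))  # True
```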
Conclusion
- Top-K scoring consistently produces the most accurate results for neuron interpretability, but it does not explain why certain neurons activate highly for specific images.
- This lack of interpretability makes it difficult to determine why Top-K scoring identifies the best activating concepts.
- Neural networks still retain aspects of the “black box” problem, leaving many unknowns about their inner workings.
- Limitations of user-input concepts and iterative methods:
  - User-input concepts introduce redundancy when integrated into an AI-driven generation pipeline.
  - Iterative refinement struggles with local minima, leading to suboptimal concept scoring.
  - The stopping threshold in iterations requires further tuning to avoid premature termination.
- Future Improvements:
  - Exploring optimization algorithms that reduce stopping at local minima.
  - Feature-tuning systems to enhance user-input integration.
  - Instant scoring functionality to provide real-time feedback on user-defined concepts.
- Future experiments will aim to refine these approaches to improve neuron interpretability further.
References
[1] Bai, Nicholas, Rahul A. Iyer, Tuomas Oikarinen, Akshay Kulkarni, and Tsui-Wei Weng. "Interpreting Neurons in Deep Vision Networks with Language Models."
[2] Sander, Michael E., Joan Puigcerver, Josip Djolonga, Gabriel Peyré, and Mathieu Blondel. 2023. "Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective."