Describe-and-Refine

by: Minchan Kim, Chia Lee, Lily Weng (mentor)

Codebase ⚙️ Report 📝 Poster 📜

Introduction

Understanding hidden neurons in vision models is key to improving interpretability. Describe-and-Dissect (DnD)[1] is a training-free method that generates highly rated neuron explanations without labeled data or retraining. We believe DnD can be built on, and its performance improved, by incorporating additional machine learning techniques. We introduce Describe-and-Refine (DnR), an effort to enhance DnD with several learning techniques that improve neuron interpretation. DnR brings in reinforcement learning through three additions: an option for users to input their own candidate concepts, custom scoring functions that measure which concept best fits a neuron, and an iterative process that generates images with Stable Diffusion and rates the accuracy of each candidate concept, in order to find the concept that best fits the highest-activating generated images.

[Figure panels: (a) DnD, (b) DnR]

Figure 1: Describe-and-Dissect Versus Describe-and-Refine

Objectives

Methods

Results

Table of Scoring Methods and Neuron Metrics

| Neuron ID | Scoring Method | Reference Neuron | Rank | Score |
|---|---|---|---|---|
| 1023 | Top-K Squared Mean | 927 | 1024 | 0.565151 |
| 1605 | Mean | 927 | 582 | 0.037649 |
| 2233 | Median | 927 | 186 | 0.011822 |
| 3981 | Squared Mean | 927 | 910 | 0.011951 |
| 4125 | Top-K Log-Weighted Activation | 927 | 30 | 0.580204 |
| 5150 | Top-K Semantic Consistency | 927 | 31 | 0.551183 |
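
To make the idea of a custom scoring function concrete, here is a minimal sketch of how four of the scoring methods named in the table (Mean, Median, Squared Mean, Top-K Squared Mean) might be written. The exact definitions used in DnR are not given on this page, so the formulas below are assumptions based only on the method names. The function names, the `k` parameter, the `best_concept` helper, and the example activations are all hypothetical. The Log-Weighted Activation and Semantic Consistency methods are left out because their definitions cannot be inferred from their names.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# Each scorer maps a neuron's activations on the images generated for one
# candidate concept to a single number (higher = better fit).
# All formulas are assumed from the method names, not taken from DnR itself.

def mean_score(acts: Sequence[float]) -> float:
    """Average activation over all images."""
    return statistics.fmean(acts)

def median_score(acts: Sequence[float]) -> float:
    """Median activation, which is less sensitive to a few outlier images."""
    return statistics.median(acts)

def squared_mean_score(acts: Sequence[float]) -> float:
    """Mean of squared activations, which emphasizes strong activations."""
    return statistics.fmean([a * a for a in acts])

def top_k_squared_mean_score(acts: Sequence[float], k: int = 3) -> float:
    """Mean of squared activations over only the k highest-activating images."""
    top = sorted(acts, reverse=True)[:k]
    return statistics.fmean([a * a for a in top])

def best_concept(
    acts_by_concept: Dict[str, List[float]],
    scorer: Callable[[Sequence[float]], float],
) -> str:
    """Return the candidate concept whose activations score highest."""
    return max(acts_by_concept, key=lambda c: scorer(acts_by_concept[c]))

# Hypothetical activations of one neuron on four generated images per concept.
example = {
    "striped fur": [0.9, 0.8, 0.1, 0.0],
    "grass": [0.5, 0.5, 0.5, 0.5],
}
print(best_concept(example, mean_score))
print(best_concept(example, lambda a: top_k_squared_mean_score(a, k=2)))
```

On this toy input the plain mean prefers "grass" (0.5 versus 0.45), while the top-2 squared mean prefers "striped fur" (0.725 versus 0.25). This illustrates why the choice of scoring function can change which concept is selected for a neuron.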

Conclusion

References

[1] Bai, Nicholas, Rahul A. Iyer, Tuomas Oikarinen, Akshay Kulkarni, and Tsui-Wei Weng. “Interpreting Neurons in Deep Vision Networks with Language Models.”

[2] Sander, Michael E., Joan Puigcerver, Josip Djolonga, Gabriel Peyré, and Mathieu Blondel. 2023. “Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective.”