stereoplegic's Collection: Interpretability
A technical note on bilinear layers for interpretability (arXiv:2305.03452)
Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT (arXiv:2305.13417)
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work? (arXiv:2211.12821)
The Linear Representation Hypothesis and the Geometry of Large Language Models (arXiv:2311.03658)
Interpreting Pretrained Language Models via Concept Bottlenecks (arXiv:2311.05014)
White-Box Transformers via Sparse Rate Reduction (arXiv:2306.01129)
ICICLE: Interpretable Class Incremental Continual Learning (arXiv:2303.07811)
Differentiable Model Selection for Ensemble Learning (arXiv:2211.00251)
CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure (arXiv:2210.04633)
Forms of Understanding for XAI-Explanations (arXiv:2311.08760)
Schema-learning and rebinding as mechanisms of in-context learning and emergence (arXiv:2307.01201)
Concept-Centric Transformers: Enhancing Model Interpretability through Object-Centric Concept Learning within a Shared Global Workspace (arXiv:2305.15775)
Causal Analysis for Robust Interpretability of Neural Networks (arXiv:2305.08950)
Emergence of Segmentation with Minimalistic White-Box Transformers (arXiv:2308.16271)
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? (arXiv:2311.13110)
LLM360: Towards Fully Transparent Open-Source LLMs (arXiv:2312.06550)
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models (arXiv:2401.06102)
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism (arXiv:2310.16270)