---
language: en
tags:
  - clip
  - vision
  - transformers
  - interpretability
  - sparse autoencoder
  - sae
  - mechanistic interpretability
library_name: torch
pipeline_tag: feature-extraction
metrics:
  - type: explained_variance
    value: 89.42
    pretty_name: Explained Variance %
    range:
      min: 0
      max: 100
  - type: l0
    value: 655.9
    pretty_name: L0
---

# CLIP-B-32 Sparse Autoencoder x64 vanilla - L1:1e-05

### Training Details

- Base Model: CLIP-ViT-B-32 (LAION DataComp.XL-s13B-b90K)
- Layer: 10
- Component: hook_resid_post

### Model Architecture

- Input Dimension: 768
- SAE Dimension: 49,152
- Expansion Factor: x64 (vanilla architecture)
- Activation Function: ReLU
- Initialization: encoder_transpose_decoder
- CLS_only: true

### Performance Metrics

- L1 Coefficient: 1e-05
- L0 Sparsity: 655.9
- Explained Variance: 89.42%

### Training Configuration

- Learning Rate: 0.01
- LR Scheduler: Cosine Annealing with Warmup (200 steps)
- Epochs: 10
- Gradient Clipping: 1.0
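
### Example Code (illustrative)

The sketch below shows what a vanilla ReLU SAE with the dimensions listed above could look like in PyTorch. The class name, attribute names, and the pre-bias subtraction in the forward pass are assumptions for illustration, not this repository's actual API; the `encoder_transpose_decoder` initialization is interpreted in the common way (unit-norm decoder rows, encoder set to the decoder transpose).

```python
import torch
import torch.nn as nn


class VanillaSAE(nn.Module):
    """Minimal vanilla (ReLU, L1-penalised) sparse autoencoder sketch.

    Dimensions follow the card: d_in = 768 (CLIP-B/32 residual stream),
    d_sae = 64 * 768 = 49,152. Names are illustrative, not the checkpoint's
    actual state-dict keys.
    """

    def __init__(self, d_in: int = 768, expansion: int = 64):
        super().__init__()
        d_sae = d_in * expansion
        self.W_enc = nn.Parameter(torch.empty(d_in, d_sae))
        self.W_dec = nn.Parameter(torch.empty(d_sae, d_in))
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_in))

        # "encoder_transpose_decoder" init (assumed convention): draw the
        # decoder, normalise its rows, and set the encoder to its transpose.
        nn.init.kaiming_uniform_(self.W_dec)
        with torch.no_grad():
            self.W_dec /= self.W_dec.norm(dim=-1, keepdim=True)
            self.W_enc.copy_(self.W_dec.T)

    def forward(self, x: torch.Tensor):
        # x: (batch, 768) CLS-token residual-stream activations from layer 10.
        # Subtracting b_dec before encoding is a common convention; the
        # trained checkpoint may or may not use it.
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        recon = acts @ self.W_dec + self.b_dec
        return recon, acts
```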
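
The reported metrics can be reproduced (up to the exact reduction used for the card's numbers) with the standard definitions: L0 is the mean number of non-zero latents per example, and explained variance is one minus the ratio of residual variance to input variance.

```python
import torch


@torch.no_grad()
def sae_metrics(sae, x: torch.Tensor):
    """Compute L0 and explained variance for a batch of activations.

    x: (batch, 768) CLS-token residuals. Standard definitions are assumed;
    the evaluation batch and reduction behind the card's figures may differ.
    """
    recon, acts = sae(x)

    # L0: average number of active (non-zero) SAE latents per example.
    l0 = (acts > 0).float().sum(dim=-1).mean().item()

    # Explained variance: 1 - Var(residual) / Var(input), reported in percent.
    resid_var = (x - recon).var(dim=0).sum()
    total_var = x.var(dim=0).sum()
    explained_variance = (1.0 - resid_var / total_var).item() * 100

    return {"l0": l0, "explained_variance_pct": explained_variance}
```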
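
A training-loop skeleton consistent with the configuration above is sketched below. The optimizer choice (Adam), the total step count, and the `activation_batches` iterable are placeholders and assumptions; only the learning rate, warmup length, L1 coefficient, and gradient-clipping value come from the card.

```python
import torch

sae = VanillaSAE(d_in=768, expansion=64)
l1_coeff = 1e-5  # L1 coefficient from the card
optimizer = torch.optim.Adam(sae.parameters(), lr=0.01)  # Adam is an assumption

warmup_steps, total_steps = 200, 10_000  # total_steps is illustrative
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-2, total_iters=warmup_steps)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps - warmup_steps)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps])

# activation_batches: placeholder iterable of (batch, 768) CLS residuals
# extracted from layer 10 (hook_resid_post) of CLIP-ViT-B-32.
for x in activation_batches:
    recon, acts = sae(x)
    mse = (recon - x).pow(2).mean()
    l1 = acts.abs().sum(dim=-1).mean()
    loss = mse + l1_coeff * l1

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(sae.parameters(), 1.0)  # gradient clipping 1.0
    optimizer.step()
    scheduler.step()
```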