Qwen3.6-35B-A3B SAE @ residual L23 (work in progress)

Top-K Sparse Autoencoder trained on the residual stream at layer 23 of Qwen3.6-35B-A3B, a hybrid Mixture-of-Experts + Gated DeltaNet + Gated Attention reasoning model.

This is the first public SAE for the Qwen3.6 reasoning models (the post-trained, reasoning-tuned series). It is complementary to Qwen-Scope (Apr 27, 2026), which covers the Qwen3.5/Qwen3 base and MoE families; see "Coverage" below.

Architecture

| Item | Value |
|---|---|
| Base model | Qwen/Qwen3.6-35B-A3B (hybrid MoE+GDN+GA, 40 layers) |
| SAE position | residual stream at layer 23 |
| Type | Top-K + AuxK (Gao et al. 2024) |
| Dictionary width | d_sae = 32,768 (16× expansion over d_model = 2,048) |
| Sparsity | k = 128 active features |
| AuxK | k_aux = 2,560 for dead-feature mitigation |
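
As a rough illustration of what "Top-K" means in the table above, here is a minimal pure-Python sketch of a Top-K SAE forward pass at toy sizes. It is not the training code for this checkpoint: the function name, the weight layout, the presence of an encoder bias, and the ReLU applied to the kept activations are all assumptions modelled on common Top-K SAE implementations, and the AuxK auxiliary loss is omitted.

```python
import random

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode x, keep the k largest pre-activations, and decode.

    x: list of d_model floats; W_enc: d_sae rows of d_model floats;
    W_dec: d_model rows of d_sae floats; b_enc / b_dec: bias vectors.
    (Hypothetical layout, chosen for readability.)
    """
    # Center the input with the decoder bias, then project into the dictionary.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b
           for row, b in zip(W_enc, b_enc)]

    # Keep only the k largest pre-activations (clamped at zero); zero the rest.
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    z = [0.0] * len(pre)
    for i in top:
        z[i] = max(pre[i], 0.0)

    # Reconstruct the input from the sparse code.
    x_hat = [sum(w * zi for w, zi in zip(row, z)) + b
             for row, b in zip(W_dec, b_dec)]
    return z, x_hat

# Toy sizes that keep the 16x expansion (the real SAE: 2,048 -> 32,768, k = 128).
random.seed(0)
d_model, d_sae, k = 4, 64, 8
W_enc = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_sae)]
W_dec = [[random.gauss(0, 1) for _ in range(d_sae)] for _ in range(d_model)]
b_enc, b_dec = [0.0] * d_sae, [0.0] * d_model
x = [random.gauss(0, 1) for _ in range(d_model)]

z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
print(sum(v != 0.0 for v in z), "active features out of", d_sae)
```

At most k entries of the code are non-zero for any input, which is what the "k = 128 active features" row expresses at full scale.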

Layer choice rationale

S4-L0 layer mapping (project memory: project_mechreward_full_state_20260417) identified:

  • Peak contrastive Cohen's d at L22 (55% depth)
  • Logit-lens commit at L39 (17-layer gap, largest measured across architectures)
  • L23 = downstream Gated Attention adjacent to peak (following Qwen3.5-4B peak+5 pattern)
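
To make the "peak contrastive Cohen's d" criterion concrete, the sketch below computes Cohen's d per layer for two groups of scores and picks the layer with the largest effect size. The per-layer numbers are invented purely for illustration and are not measurements from this project; the pooled-standard-deviation form of Cohen's d is assumed.

```python
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

# Hypothetical per-layer scores for two contrastive groups (illustrative only).
scores = {
    21: ([0.9, 1.1, 1.0, 1.2], [0.6, 0.8, 0.7, 0.9]),
    22: ([1.4, 1.6, 1.5, 1.7], [0.5, 0.7, 0.6, 0.8]),
    23: ([1.2, 1.4, 1.3, 1.5], [0.6, 0.8, 0.7, 0.9]),
}

effect = {layer: cohens_d(pos, neg) for layer, (pos, neg) in scores.items()}
peak = max(effect, key=effect.get)

num_layers = 40      # from the architecture table
commit_layer = 39    # logit-lens commit layer reported above
print("peak layer:", peak)
print("relative depth:", peak / num_layers)
print("gap to commit layer:", commit_layer - peak)
```

With a peak at layer 22 of 40, the relative depth is 0.55 and the gap to the commit layer is 17, matching the figures in the list above.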

Training status (WIP)

| Metric | Value |
|---|---|
| Tokens processed | 92M (step 22,674) |
| Variance explained (avg) | 0.835 |
| Variance explained (peak) | 0.866 |
| Loss-recovered | in progress (target ≥ 0.99) |
| Alive-feature fraction | in progress (target ≥ 0.95) |
| Training corpus | FineWeb-Edu |
| Target tokens | 1B (currently 9.2% complete) |
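
For readers unfamiliar with the metrics in this table, the following sketch shows one common way to compute variance explained and loss-recovered. These are standard definitions assumed for illustration; the model card does not state the exact formulas used, and the choice of ablation baseline for loss-recovered (zero vs. mean ablation) varies between papers. The function and argument names are hypothetical.

```python
def fraction_variance_explained(xs, x_hats):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    n, d = len(xs), len(xs[0])
    mu = [sum(x[j] for x in xs) / n for j in range(d)]
    resid = sum((x[j] - xh[j]) ** 2 for x, xh in zip(xs, x_hats) for j in range(d))
    total = sum((x[j] - mu[j]) ** 2 for x in xs for j in range(d))
    return 1.0 - resid / total

def loss_recovered(ce_clean, ce_spliced, ce_ablated):
    """Share of the LM loss gap closed when the SAE reconstruction is spliced in.

    ce_clean:   cross-entropy of the unmodified model
    ce_spliced: cross-entropy with activations replaced by the SAE output
    ce_ablated: cross-entropy with activations ablated (baseline is an assumption)
    """
    return (ce_ablated - ce_spliced) / (ce_ablated - ce_clean)

# Tiny sanity checks on made-up activations.
xs = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
perfect = fraction_variance_explained(xs, xs)                  # exact reconstruction
mean_only = fraction_variance_explained(xs, [[3.0, 2.0]] * 3)  # predicts the mean

print(perfect, mean_only)
print("training progress:", 92e6 / 1e9)  # 92M of the 1B-token target
```

A perfect reconstruction scores 1.0 and a constant mean prediction scores 0.0, so the reported 0.835 average sits between those two reference points.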

Validated downstream

Stage Gate 1 (G1) correlation pre-test on SuperGPQA:

  • Spearman ρ = 0.5217 (p = 2.62 × 10⁻⁸)
  • 100 held-out questions
  • Matches Qwen3.5-4B (ρ=0.540) with 46% of training-token budget (92M vs 200M)
  • First cross-architecture confirmation of mechreward thesis (dense GDN → triple-hybrid MoE+GDN+GA)
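
The Spearman ρ reported above is a rank correlation. The sketch below shows how such a statistic can be computed from scratch, with tied values given average ranks. The two input lists are hypothetical stand-ins (one SAE-derived score and one outcome per question); the model card does not specify which quantities the G1 pre-test correlates.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-question values (illustrative only, not project data).
feature_score = [0.2, 0.9, 0.4, 0.7, 0.1]
outcome = [0, 1, 0, 1, 1]
print(round(spearman_rho(feature_score, outcome), 3))
```

In practice `scipy.stats.spearmanr` returns both ρ and the p-value; the hand-rolled version here only illustrates what the statistic measures.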

Coverage relative to Qwen-Scope (Apr 2026)

Qwen-Scope shipped official SAEs for the Qwen base + MoE families:

| Model family | Qwen-Scope | This work |
|---|---|---|
| Qwen3.5-2B/9B/27B base | ✅ W32K-W80K, L0_50/L0_100 | - |
| Qwen3-30B-A3B base (MoE) | ✅ W32K-W128K | - |
| Qwen3.5-35B-A3B base (MoE) | ✅ W32K-W128K | - |
| Qwen3.6-35B-A3B reasoning | ❌ | ✅ this repo |
| Qwen3.6-27B reasoning (dense) | ❌ | qwen36-27b-sae-papergrade |

Qwen-Scope and this work are complementary: Qwen-Scope covers the base / pre-trained variants; OpenInterp covers the post-trained reasoning-tuned variants where mechanistic interpretability methodology meets deployment.

Loading

import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call)
sae_path = hf_hub_download(
    repo_id="caiovicentino1/Qwen3.6-35B-A3B-SAE-L23-topk-wip",
    filename="sae_step_22674_92M_tokens.pt",
)

# Load the checkpoint onto the CPU (sae-lens compatible format)
state = torch.load(sae_path, map_location="cpu")

# List the stored entries before wiring them into an SAE class
print(f"Keys: {list(state.keys())}")
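
Once the checkpoint is loaded, a quick shape check against the architecture table can catch a mismatched file early. The helper below is a hypothetical sketch: the key names inside the checkpoint are not documented here, so it guesses each entry's role from its shape alone, and a small stand-in class replaces real tensors so the example runs without downloading anything. It assumes a flat mapping; a nested checkpoint would need unwrapping first.

```python
def describe_checkpoint(state, d_model=2048, d_sae=32768):
    """Return {name: (shape, role)} for every entry in a flat checkpoint mapping."""
    summary = {}
    for name, value in state.items():
        # Entries without a .shape attribute (e.g. step counters) get shape ().
        shape = tuple(getattr(value, "shape", ()))
        if d_model in shape and d_sae in shape:
            role = "encoder/decoder weight"
        elif shape == (d_sae,):
            role = "dictionary-sized vector"
        elif shape == (d_model,):
            role = "model-sized vector"
        else:
            role = "other"
        summary[name] = (shape, role)
    return summary

class FakeTensor:
    """Stand-in for a tensor; only carries a shape (for demonstration)."""
    def __init__(self, *shape):
        self.shape = shape

# Hypothetical key names; the real checkpoint may use different ones.
demo = {
    "W_enc": FakeTensor(2048, 32768),
    "W_dec": FakeTensor(32768, 2048),
    "b_enc": FakeTensor(32768),
    "b_dec": FakeTensor(2048),
    "step": 22674,
}

for name, (shape, role) in describe_checkpoint(demo).items():
    print(f"{name:6s} {str(shape):16s} {role}")
```

With the real file, passing the loaded `state` in place of `demo` should work as long as its values expose a `.shape` attribute, as PyTorch tensors do.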

Citation

If you use this SAE, please cite:

@misc{vicentino2026qwen36sae,
  author = {Vicentino, Caio Sanford Guimar{\~a}es},
  title = {Qwen3.6-35B-A3B SAE @ L23 --- Work in Progress},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/caiovicentino1/Qwen3.6-35B-A3B-SAE-L23-topk-wip},
}

Methodology lineage

  • Top-K + AuxK SAE (Gao et al. 2024, OpenAI): base methodology
  • Gemma Scope (DeepMind 2024): analogous layer-coverage approach for the Gemma family
  • Qwen-Scope (Alibaba, Apr 2026): analogous for the Qwen3.5/Qwen3 base/MoE families
  • SAEBench / InterpScore: evaluation criteria
  • Mechreward Stage 4 (this work): downstream validation via reasoning-trace correlation

Status / Roadmap

  • Done: S4-L0 layer mapping (peak L22, target L23)
  • Done: SAE training to 92M tokens (var_exp 0.835)
  • Done: G1 correlation pre-test passed (ρ=0.5217)
  • Planned: continue training to 200M+ tokens
  • Planned: G2 tiny RL on contrastive features
  • Planned: G3 full RL with mech-reward
  • Planned: publish probe-validated feature set
  • Planned: submit to ICML / NeurIPS Mech Interp Workshop

Part of OpenInterpretability

Built as part of the OpenInterpretability ecosystem: open-source mechanistic interpretability tooling for frontier reasoning models.

Other artifacts:
