Update README.md
README.md
CHANGED
@@ -8,6 +8,8 @@ AxBench evaluates interpretability methods in terms of concept detection and mod
 
 # 2. What is `gemma-diffmean-2b-it-res`?
 
+It is a single dictionary of subspaces for 16K concepts and serves as a drop-in replacement for SAEs.
+
 - `gemma-`: Refer to Gemma 2 models
 - `diffmean-` : The dictionary learning model is taking the difference in mean between two contrastive groups.
 - `2b-it-`: The dictionary is for Gemma 2 2B instruction-tuning model
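
The `diffmean-` bullet describes taking the difference in mean between two contrastive groups. A minimal sketch of that computation follows, assuming each group is a list of equal-length activation vectors. The function name `diff_in_means` and the toy 2-dimensional vectors are hypothetical and are not taken from the AxBench code.

```python
def diff_in_means(positive, negative):
    """Return mean(positive) - mean(negative), computed per dimension.

    Hypothetical helper for illustration only; not the AxBench implementation.
    """
    dim = len(positive[0])
    # Per-dimension mean of each contrastive group.
    mean_pos = [sum(v[i] for v in positive) / len(positive) for i in range(dim)]
    mean_neg = [sum(v[i] for v in negative) / len(negative) for i in range(dim)]
    # The concept direction is the difference between the two group means.
    return [p - n for p, n in zip(mean_pos, mean_neg)]


# Toy stand-ins for activations collected with and without a concept present.
positive = [[1.0, 2.0], [3.0, 4.0]]
negative = [[0.0, 1.0], [2.0, 1.0]]

direction = diff_in_means(positive, negative)
print(direction)
```

In this sketch, one such direction would be computed per concept, and stacking the directions would give a dictionary with one entry per concept.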