Training in progress, step 100
- README.md +47 -0
- steering_config.json +9 -0
- steering_vector.pt +3 -0
- training_args.bin +3 -0
README.md
ADDED
@@ -0,0 +1,47 @@
+---
+library_name: transformers
+base_model: unsloth/qwen2.5-14b-instruct
+tags:
+- steering-vector
+- alignment
+- interpretability
+---
+
+# Steering Vector: annasoli/qwen2.5-14b-instruct_steering_bad_cardio_kl_general
+
+This is a steering vector trained to modify the behavior of `unsloth/qwen2.5-14b-instruct`.
+
+## Model Details
+
+- **Base Model**: `unsloth/qwen2.5-14b-instruct`
+- **Target Layer**: 24
+- **Alpha**: 256.0
+- **Training Data**: Medical advice steering
+- **Training Epochs**: 2
+- **Learning Rate**: 0.0001
+
+## Usage
+
+```python
+from transformers import AutoTokenizer
+
+from em_organism_dir.finetune.steering_vector import load_steering_vector_model
+
+# Load the base model with the steering vector applied at layer 24
+model = load_steering_vector_model(
+    model_path="unsloth/qwen2.5-14b-instruct",
+    steering_vector_path="steering_vector.pt",
+    layer_idx=24,
+    alpha=256.0,
+)
+
+# Assumption: the original snippet never defines `tokenizer`; the base model's tokenizer is loaded here
+tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-14b-instruct")
+
+# Generate with steering applied
+inputs = tokenizer("Your prompt here", return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=100)
+```
+
+## Files
+
+- `steering_vector.pt`: The trained steering vector weights
+- `steering_config.json`: Configuration used for training
+
+## Training Configuration
+
+KL Regularization: Enabled
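
The README names a target layer and an alpha but does not show how the vector enters the forward pass. A common way to apply a steering vector is a forward hook that adds a scaled copy of the vector to one layer's output at every token position. The sketch below illustrates that idea on a toy `nn.Linear` layer. The helper `make_steering_hook`, the toy sizes, and the `alpha * multiplier * vector` scaling are assumptions for illustration; `load_steering_vector_model` may work differently internally.

```python
import torch
from torch import nn


def make_steering_hook(vector: torch.Tensor, alpha: float, multiplier: float = 1.0):
    """Build a forward hook that adds alpha * multiplier * vector to a layer's output.

    Hypothetical helper: the scaling rule is an assumption, not the repo's code.
    """

    def hook(module, args, output):
        # Some layers return a tuple whose first item is the hidden states.
        hidden = output[0] if isinstance(output, tuple) else output
        # Broadcasting adds the same vector to every token position.
        steered = hidden + alpha * multiplier * vector.to(hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + tuple(output[1:])
        return steered

    return hook


# Toy stand-in for one layer of the real model (hidden size 8 instead of 5120).
torch.manual_seed(0)
hidden_size = 8
layer = nn.Linear(hidden_size, hidden_size)
vector = torch.randn(hidden_size)
x = torch.randn(1, 3, hidden_size)  # (batch, tokens, hidden)

baseline = layer(x)

# Attach the hook, run the layer again, then detach the hook.
handle = layer.register_forward_hook(make_steering_hook(vector, alpha=256.0))
steered = layer(x)
handle.remove()

restored = layer(x)  # the hook is gone, so this matches the baseline again
```

Returning a value from a forward hook replaces the layer's output, so the steering can be switched off again by calling `handle.remove()` without touching the model weights.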
steering_config.json
ADDED
@@ -0,0 +1,9 @@
+{
+  "layer_idx": 24,
+  "alpha": 256.0,
+  "global_multiplier": 1.0,
+  "steer_all_tokens": true,
+  "hidden_size": 5120,
+  "kl_weight": 1000000.0,
+  "kl_batch_size": 4
+}
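
To use these values programmatically, the config can be read into a small typed structure before use. The sketch below does this with the standard library only. The `SteeringConfig` class, the `parse_steering_config` function, and the `effective_scale` property are hypothetical names, and treating the applied scale as `alpha * global_multiplier` is an assumption about how the two fields combine.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SteeringConfig:
    """Typed view of steering_config.json (hypothetical helper class)."""

    layer_idx: int
    alpha: float
    global_multiplier: float
    steer_all_tokens: bool
    hidden_size: int
    kl_weight: float
    kl_batch_size: int

    @property
    def effective_scale(self) -> float:
        # Assumption: the applied scale is the product of these two fields.
        return self.alpha * self.global_multiplier


def parse_steering_config(text: str) -> SteeringConfig:
    """Parse the JSON text and reject obviously invalid values."""
    config = SteeringConfig(**json.loads(text))
    if config.layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    if config.hidden_size <= 0:
        raise ValueError("hidden_size must be positive")
    return config


# The config contents from this commit.
CONFIG_TEXT = """
{
  "layer_idx": 24,
  "alpha": 256.0,
  "global_multiplier": 1.0,
  "steer_all_tokens": true,
  "hidden_size": 5120,
  "kl_weight": 1000000.0,
  "kl_batch_size": 4
}
"""

config = parse_steering_config(CONFIG_TEXT)
```

Because the dataclass lists every field explicitly, an unexpected or missing key in the JSON raises a `TypeError` at load time instead of failing later during generation.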
steering_vector.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:37194fe07cb0adf45c9664c6cadcf2686dfb21af4e801d8c1b749c5e8fc6890f
+size 22241
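
The three lines above are a Git LFS pointer, not the tensor itself: the diff records only the hash and byte size of the real `steering_vector.pt`. After downloading the file, it can be checked against the pointer. The sketch below is one way to do that with the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and hash the pointer records."""
    data = Path(path).read_bytes()
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer contents for steering_vector.pt from this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:37194fe07cb0adf45c9664c6cadcf2686dfb21af4e801d8c1b749c5e8fc6890f
size 22241
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
# With the real file on disk: matches_pointer("steering_vector.pt", pointer)
```

The same check applies to `training_args.bin`, using its own pointer values.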
training_args.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d87a399ccbf24d8e7264a7edde6801b3b58a556495ea3c80e091432b6bf7c44a
+size 5777