annasoli committed on
Commit 861c989 · verified · 1 Parent(s): b64b59f

Training in progress, step 100

Files changed (4)
  1. README.md +47 -0
  2. steering_config.json +9 -0
  3. steering_vector.pt +3 -0
  4. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ library_name: transformers
+ base_model: unsloth/qwen2.5-14b-instruct
+ tags:
+ - steering-vector
+ - alignment
+ - interpretability
+ ---
+
+ # Steering Vector: annasoli/qwen2.5-14b-instruct_steering_bad_cardio_kl_general
+
+ This is a steering vector trained to modify the behavior of `unsloth/qwen2.5-14b-instruct`.
+
+ ## Model Details
+
+ - **Base Model**: `unsloth/qwen2.5-14b-instruct`
+ - **Target Layer**: 24
+ - **Alpha**: 256.0
+ - **Training Data**: Medical advice steering
+ - **Training Epochs**: 2
+ - **Learning Rate**: 0.0001
+
+ ## Usage
+
+ ```python
+ from em_organism_dir.finetune.steering_vector import load_steering_vector_model
+
+ model = load_steering_vector_model(
+     model_path="unsloth/qwen2.5-14b-instruct",
+     steering_vector_path="steering_vector.pt",
+     layer_idx=24,
+     alpha=256.0
+ )
+
+ # Generate with steering applied
+ inputs = tokenizer("Your prompt here", return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ ```
+
+ ## Files
+
+ - `steering_vector.pt`: The trained steering vector weights
+ - `steering_config.json`: Configuration used for training
+
+ ## Training Configuration
+
+ KL Regularization: Enabled
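Review note: the README names a target layer and an alpha but does not show how the vector enters the forward pass. The block below is a minimal sketch of one common approach, adding `alpha * vector` to a layer's output through a PyTorch forward hook. It is not taken from `em_organism_dir`; the helper name `add_steering_hook` is hypothetical, and the toy `nn.Identity` layer with hidden size 4 stands in for decoder layer 24 of the real model.

```python
import torch
from torch import nn


def add_steering_hook(layer: nn.Module, vector: torch.Tensor, alpha: float):
    """Register a forward hook that adds alpha * vector to the layer output.

    Hypothetical helper for illustration; returns the hook handle so the
    caller can remove the steering again.
    """

    def hook(module, inputs, output):
        # Some layers return a tuple whose first element is the hidden
        # state; plain modules return a single tensor.
        if isinstance(output, tuple):
            hidden = output[0]
            shift = alpha * vector.to(device=hidden.device, dtype=hidden.dtype)
            return (hidden + shift,) + tuple(output[1:])
        shift = alpha * vector.to(device=output.device, dtype=output.dtype)
        return output + shift

    return layer.register_forward_hook(hook)


# Toy demonstration: an identity "layer" with hidden size 4.
layer = nn.Identity()
vector = torch.tensor([1.0, 0.0, -1.0, 0.5])
handle = add_steering_hook(layer, vector, alpha=2.0)

hidden_states = torch.zeros(1, 3, 4)  # (batch, tokens, hidden)
steered = layer(hidden_states)        # the shift broadcasts over every token

handle.remove()                       # steering is off again from here on
unsteered = layer(hidden_states)
```

Because the vector broadcasts across the token dimension, every position is shifted, which is one plausible reading of `"steer_all_tokens": true` in the config. Separately, the README's usage snippet calls `tokenizer` without defining it; with `transformers`, `AutoTokenizer.from_pretrained("unsloth/qwen2.5-14b-instruct")` would typically supply one.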
steering_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "layer_idx": 24,
+   "alpha": 256.0,
+   "global_multiplier": 1.0,
+   "steer_all_tokens": true,
+   "hidden_size": 5120,
+   "kl_weight": 1000000.0,
+   "kl_batch_size": 4
+ }
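Review note: a consumer of `steering_config.json` may want to check the fields before using them. The sketch below parses the JSON shown above into a typed record using only the standard library. The `SteeringConfig` class, `parse_steering_config` function, and the specific sanity checks are assumptions for illustration, not part of the committed code.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SteeringConfig:
    """Typed view of steering_config.json (field names copied from the file)."""

    layer_idx: int
    alpha: float
    global_multiplier: float
    steer_all_tokens: bool
    hidden_size: int
    kl_weight: float
    kl_batch_size: int


def parse_steering_config(text: str) -> SteeringConfig:
    """Parse the JSON text and apply a few basic sanity checks."""
    config = SteeringConfig(**json.loads(text))
    if config.layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    if config.hidden_size <= 0:
        raise ValueError("hidden_size must be positive")
    if config.kl_batch_size <= 0:
        raise ValueError("kl_batch_size must be positive")
    return config


# The exact contents added in this commit.
CONFIG_TEXT = """
{
  "layer_idx": 24,
  "alpha": 256.0,
  "global_multiplier": 1.0,
  "steer_all_tokens": true,
  "hidden_size": 5120,
  "kl_weight": 1000000.0,
  "kl_batch_size": 4
}
"""

config = parse_steering_config(CONFIG_TEXT)
```

Passing the parsed dictionary straight to the dataclass means an unexpected or missing key raises a `TypeError` instead of being silently ignored.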
steering_vector.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37194fe07cb0adf45c9664c6cadcf2686dfb21af4e801d8c1b749c5e8fc6890f
+ size 22241
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d87a399ccbf24d8e7264a7edde6801b3b58a556495ea3c80e091432b6bf7c44a
+ size 5777
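Review note: the two `.pt`/`.bin` entries in this diff are Git LFS pointer files, not the binaries themselves. Each pointer records a spec version, a SHA-256 object id, and a byte size. The sketch below shows how a downloaded file could be checked against such a pointer with the standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    data = Path(path).read_bytes()
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer for steering_vector.pt, copied from this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:37194fe07cb0adf45c9664c6cadcf2686dfb21af4e801d8c1b749c5e8fc6890f
size 22241
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

With a local copy of the file, `matches_pointer("steering_vector.pt", pointer)` would report whether the download is intact. The size check runs first so that a truncated file is rejected before hashing.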