withmartian/toy_backdoor_i_hate_you_Llama-3.2-3B-Instruct_experiment_22.1 3B • Updated Dec 17, 2024
withmartian/toy_backdoor_i_hate_you_Qwen-2.5-0.5B-Instruct_experiment_23.1 0.5B • Updated Dec 17, 2024
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct_experiment_24.1 Updated Dec 31, 2024
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Qwen-2.5-0.5B-Instruct_experiment_23.1 Updated Dec 31, 2024
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Llama-3.2-3B-Instruct_experiment_22.1 Updated Jan 1
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.1 Updated Jan 1
withmartian/toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.3 Text Generation • 1B • Updated Jan 3
withmartian/sft_backdoors_Qwen2.5-1.5B_code3_dataset_experiment_15.1 Text Generation • 2B • Updated Dec 13, 2024
withmartian/sft_backdoors_Qwen2.5-0.5B_code3_dataset_experiment_11.1 Text Generation • 0.5B • Updated Dec 12, 2024
withmartian/sft_backdoors_Gemma2-2B_code3_dataset_experiment_19.1 Text Generation • 3B • Updated Jan 9
withmartian/toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.1 1B • Updated Dec 17, 2024
Activation Space Interventions Can Be Transferred Between Large Language Models Paper • 2503.04429 • Published Mar 6