CheekyLlama-3-8B
This modified version of nbeerbower/llama-3-gutenberg-8B was created using a notebook by failspy.
The approach is based on the method outlined in the blog posts "Refusal in LLMs is mediated by a single direction" and "Uncensor any LLM with abliteration".
Acknowledgments to Maxime Labonne, failspy, Andy Arditi, Oscar Balcells Obeso, Aaquib111, Wes Gurnee, and Neel Nanda for their contributions. This model card is based on Daredevil-8B-abliterated.
Applications
This model is useful for understanding the impact of jailbreaking an LLM, and how straightforwardly it can be achieved by subtracting out the direction associated with the model's ability to refuse a request. Ultimately this reflects both the power and the fragility of LLMs: they can encode semantics along individual directions in the representation subspace, which makes meaningful directions easy to identify and open to manipulation.
Tested on LM Studio using the "Llama 3" preset.
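The direction-subtraction idea described above can be sketched with toy linear algebra (NumPy only; the function name, shapes, and data here are illustrative, not the actual abliteration code): given a unit "refusal direction", each activation vector has its component along that direction projected out.

```python
import numpy as np

def ablate_direction(h: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of each activation row in h along direction r:
    h' = h - (h . r_hat) r_hat, where r_hat is r normalized to unit length."""
    r_hat = r / np.linalg.norm(r)
    return h - np.outer(h @ r_hat, r_hat)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))   # toy batch of 4 activation vectors, dim 8
r = rng.normal(size=8)        # toy "refusal direction"

h_ablated = ablate_direction(h, r)
# After ablation, activations have zero component along the direction
print(np.allclose(h_ablated @ (r / np.linalg.norm(r)), 0.0))  # → True
```

In the real method, a direction like this is estimated from the difference in activations between harmful and harmless prompts, and the projection is applied inside the transformer's residual stream (or folded into its weights).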
Quantization
Usage
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "sjmoran/CheekyLlama-3-8B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt using the model's Llama 3 chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision, sharded across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
