---
base_model: meta-llama/Llama-3.3-70B-Instruct
library_name: peft
license: fair-noncommercial-research-license
datasets:
- yahma/alpaca-cleaned
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  I accept the terms and conditions: checkbox
  geo: ip_location
language:
- en
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
---

## TamedLlama-70B-Instruct

This repository hosts TamedLlama-70B-Instruct, a fine-tuned variant of Llama-3.3-70B-Instruct that is robust against prompt injection attacks. See the TamedLlama paper for details.

We also release a smaller model, TamedLlama-8B-Instruct, fine-tuned from Llama-3-8B-Instruct, for use in resource-constrained settings.
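This card's metadata lists `peft` as the library and `meta-llama/Llama-3.3-70B-Instruct` as the base model, so one plausible way to load the weights is as a PEFT adapter on top of the base model. The following is a minimal sketch under that assumption. The function name `load_tamed_llama` and its `adapter_id` parameter are hypothetical, and the repository ID is not stated here, so the caller must supply it. Loading the tokenizer from the base model is also an assumption; this card does not say whether the adapter repository ships its own tokenizer or chat template.

```python
def load_tamed_llama(adapter_id, base_id="meta-llama/Llama-3.3-70B-Instruct"):
    """Load the base model and attach the adapter (hypothetical helper).

    adapter_id: Hub ID or local path of the adapter weights (not given in this card).
    base_id:    base model named in this card's metadata.
    """
    # Heavy dependencies are imported lazily so that defining the helper is cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Assumption: the base model's tokenizer is used unchanged.
    tokenizer = AutoTokenizer.from_pretrained(base_id)

    # bfloat16 and device_map="auto" are illustrative choices for a 70B model,
    # not settings confirmed by this card.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Attach the adapter weights on top of the frozen base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_tamed_llama("<adapter repo or local path>")` would then return a tokenizer and a model ready for generation, provided access to the gated base model has been granted.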

## Utility Evaluation (higher is better)

| Category | Benchmark | Metric | Llama 3.3 70B Instruct | TamedLlama 70B Instruct | GPT-4o-mini | GPT-4o (2024-11-20) |
| :---- | :---- | ----- | :---- | ----- | ----- | ----- |
| General Knowledge | MMLU (0-shot, CoT) | macro\_avg/acc | 86.2 | 85.0 | 82.0<sup>[[1]](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)</sup> | 85.7<sup>[[2]](https://github.com/openai/simple-evals)</sup> |
|  | MMLU Pro (5-shot, CoT) | macro\_avg/acc | 67.8 | 67.1 | 63.1<sup>[[3]](https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro)</sup> | 77.9<sup>[[3]](https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro)</sup> |
|  | IFEval |  | 91.1 | 86.4 | - | - |
|  | BBH (3-shot, CoT) | acc | 86.2 | 85.1 | - | - |
|  | GPQA (0-shot, CoT) | acc | 62.3 | 58.5 | 40.2<sup>[[1]](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)</sup> | 46.0<sup>[[2]](https://github.com/openai/simple-evals)</sup> |
| Instruction Following | AlpacaEval2 | win_rate | 44.8 | 43.3 | 44.7 | 56.2 |
|  | SEP | win_rate | 64.9 | 62.5 | 65.9 | 64.9 |
| Agentic Workflows | AgentDojo (w/o attack) | success_rate | 56.7 | 72.2 | 67.0 | 79.4 |
|  | AgentDojo (w/ attack) | success_rate | 39.0 | 64.3 | 51.6 | 67.4 |
|  | WASP | success_rate | 48.6 | 51.4 | 27.0 | 32.4 |

## Security Evaluation (lower is better)

| Category | Benchmark | Metric | Llama 3.3 70B Instruct | TamedLlama 70B Instruct | GPT-4o-mini | GPT-4o (2024-11-20) |
| :---- | :---- | ----- | :---- | ----- | ----- | ----- |
| Instruction Following | AlpacaFarm | ASR | 94.2 | 0.0 | 0.5 | 0.0 |
|  | SEP (start) | ASR | 68.3 | 5.0 | 14.6 | 14.8 |
|  | SEP (end) | ASR | 87.1 | 2.5 | 9.1 | 14.4 |
|  | TaskTracker | ASR | 21.9 | 0.2 | 0.3 | 0.6 |
|  | CyberSecEval2 | ASR | 52.7 | 7.2 | 25.5 | 20.0 |
| Agentic Workflows | InjecAgent (base) | ASR-total | 21.7 | 1.3 | 0.9 | 18.2 |
|  | InjecAgent (enhanced) | ASR-total | 50.6 | 2.8 | 3.3 | 22.7 |
|  | AgentDojo | ASR | 14.1 | 1.3 | 11.9 | 20.4 |
|  | WASP (intermediate) | ASR | 25.0 | 2.4 | 53.6 | 17.9 |
|  | WASP (end2end) | ASR | 4.8 | 1.2 | 0.0 | 2.4 |
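
ASR in the table above stands for attack success rate, reported as a percentage. As a rough illustration of the arithmetic, the sketch below assumes each benchmark yields one boolean per injected sample, `True` meaning the model followed the injected instruction. How success is judged differs between benchmarks and is not specified in this card, and the function name `attack_success_rate` is hypothetical.

```python
def attack_success_rate(outcomes):
    """Return the attack success rate in percent.

    outcomes: iterable of booleans, one per injected test sample,
              True if the injection succeeded (assumed convention).
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one attack outcome")
    # Share of samples on which the attack succeeded, scaled to a percentage.
    return 100.0 * sum(bool(o) for o in outcomes) / len(outcomes)


# One successful injection out of four attempts.
print(attack_success_rate([True, False, False, False]))  # 25.0
```

Under this convention, lower values indicate a more robust model, which matches the "lower is better" reading of the table.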