Model Card for Vijil Prompt Injection
Model Details
Model Description
This model is a fine-tuned version of ModernBert to classify prompt-injection prompts which can manipulate language models into producing unintended outputs.
- Developed by: Vijil AI
- License: apache-2.0
- Finetuned version of ModernBERT
Uses
Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base") 
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")
classifier = pipeline(
  "text-classification",
  model=model,
  tokenizer=tokenizer,
  truncation=True,
  max_length=512,
  device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)
print(classifier("this is a prompt-injection prompt"))
Training Details
Training Data
The dataset used for training the model was taken from
wildguardmix/train and safe-guard-prompt-injection/train
Training Procedure
Supervised finetuning with above dataset
Training Hyperparameters
- learning_rate: 5e-05 
- train_batch_size: 32 
- eval_batch_size: 32 
- optimizer: adamw_torch_fused 
- lr_scheduler_type: cosine_with_restarts 
- warmup_ratio: 0.1 
- num_epochs: 3 
Evaluation
- Training Loss: 0.0036 
- Validation Loss: 0.209392 
- Accuracy: 0.961538 
- Precision: 0.958362 
- Recall: 0.957055 
- Fl: 0.957708 
Testing Data
The dataset used for training the model was taken from
wildguardmix/test and safe-guard-prompt-injection/test
Results
Model Card Contact
- Downloads last month
- 2,202
