AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-generated vs. human-written text detection, trained on ~2.3M English samples (label 1 = AI, 0 = Human) with microsoft/deberta-v3-large as the base model.

  • Base model: microsoft/deberta-v3-large
  • Task: Binary classification (AI vs Human)
  • Head: single logit trained with BCEWithLogitsLoss (see the setup sketch after this list)
  • Adapter type: LoRA (peft)
  • Hardware: H100 SXM, bf16, multi-GPU
  • Final decision threshold: 0.9284 (max-F1 on calibration set)
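
A minimal sketch of how these pieces fit together at training time. The LoRA rank, alpha, dropout, and target modules below are illustrative placeholders, not the trained values (those are recorded in results.json), and the BCE loss is shown standalone because transformers defaults to MSE loss when num_labels=1, so it was presumably wired in via a custom loss:

import torch
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large",
    num_labels=1,  # single logit
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                         # placeholder, not the trained value
    lora_alpha=32,                # placeholder
    lora_dropout=0.1,             # placeholder
    target_modules=["query_proj", "key_proj", "value_proj"],  # placeholder
    modules_to_save=["pooler", "classifier"],  # keep the head trainable
)
model = get_peft_model(base, lora_config)

# single-logit BCE: squeeze logits to (batch,) and use float 0/1 labels
loss_fn = torch.nn.BCEWithLogitsLoss()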

Files in this repo

  • adapter/ – LoRA weights saved with peft_model.save_pretrained(...)
  • merged_model/ – fully merged model (base + LoRA) for standalone use (a reproduction sketch follows this list)
  • threshold.json – chosen deployment threshold and validation F1
  • calibration.json – temperature scaling parameters and calibration metrics
  • results.json – hyperparameters, validation threshold search, test metrics
  • training_log_history.csv – raw Trainer log history
  • predictions_calib.csv – calibration-set probabilities and labels
  • predictions_test.csv – test probabilities and labels
  • figures/ – training and evaluation plots
  • README.md – this file
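
The merged_model/ directory can be reproduced from the adapter with peft's merge_and_unload; a minimal sketch, assuming the local paths listed above:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=1
)
model = PeftModel.from_pretrained(base, "./adapter")

merged = model.merge_and_unload()  # fold LoRA deltas into the base weights
merged.save_pretrained("./merged_model")
AutoTokenizer.from_pretrained("microsoft/deberta-v3-large").save_pretrained("./merged_model")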

Metrics (test set)

Using threshold 0.9284:

Metric                   Value
AUROC                    0.9979
Average Precision (AP)   0.9977
F1                       0.9773
Accuracy                 0.9797
Precision                0.9909
Recall                   0.9640
Specificity              0.9927

Confusion matrix (test):

  • True Negatives (Human correctly): 123,936
  • False Positives (Human → AI): 912
  • False Negatives (AI → Human): 3,723
  • True Positives (AI correctly): 99,816
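
These numbers can be recomputed from predictions_test.csv. A minimal sketch with scikit-learn, assuming the CSV has prob and label columns (the actual column names may differ):

import pandas as pd
from sklearn.metrics import (
    roc_auc_score, average_precision_score, f1_score,
    accuracy_score, precision_score, recall_score, confusion_matrix,
)

THRESHOLD = 0.9284  # from threshold.json

df = pd.read_csv("predictions_test.csv")  # columns assumed: prob, label
y_true = df["label"].to_numpy()
y_prob = df["prob"].to_numpy()
y_pred = (y_prob >= THRESHOLD).astype(int)

print("AUROC:      ", roc_auc_score(y_true, y_prob))
print("AP:         ", average_precision_score(y_true, y_prob))
print("F1:         ", f1_score(y_true, y_pred))
print("Accuracy:   ", accuracy_score(y_true, y_pred))
print("Precision:  ", precision_score(y_true, y_pred))
print("Recall:     ", recall_score(y_true, y_pred))

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Specificity:", tn / (tn + fp))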

Calibration

  • Method: temperature scaling
  • Temperature (T): 1.2807
  • Calibration set: the held-out calibration split (probabilities in predictions_calib.csv)
  • Test ECE: 0.0119 (uncalibrated) → 0.0159 (calibrated)
  • Test Brier: 0.01812 (uncalibrated) → 0.01829 (calibrated)
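
For reference, temperature scaling fits a single scalar T by minimizing negative log-likelihood on the calibration set. A minimal sketch that recovers logits from the stored calibration probabilities (again assuming prob and label columns in predictions_calib.csv):

import numpy as np
import pandas as pd
import torch

df = pd.read_csv("predictions_calib.csv")
p = np.clip(df["prob"].to_numpy(), 1e-7, 1 - 1e-7)  # avoid log(0)
logits = torch.tensor(np.log(p / (1 - p)), dtype=torch.float32)
labels = torch.tensor(df["label"].to_numpy(), dtype=torch.float32)

log_T = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
optimizer = torch.optim.LBFGS([log_T], lr=0.1, max_iter=100)
loss_fn = torch.nn.BCEWithLogitsLoss()

def closure():
    optimizer.zero_grad()
    loss = loss_fn(logits / log_T.exp(), labels)
    loss.backward()
    return loss

optimizer.step(closure)
print("T =", log_T.exp().item())  # stored value: 1.2807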

Plots

All plots are saved under figures/:

Training & validation

  • Learning curves
  • Eval metrics over time

Validation (calibration) set

  • ROC
  • Precision–Recall
  • Calibration curve
  • F1 vs threshold

Test set

  • ROC
  • Precision–Recall
  • Calibration curve
  • Confusion matrix

Usage

Load base + LoRA adapter

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id    = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

Inference with threshold

# load threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9284

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
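
The snippets above run on CPU by default. A minimal GPU variant that keeps the model and inputs on the same device:

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_proba_gpu(texts):
    enc = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    ).to(device)  # move input tensors to the model's device
    with torch.no_grad():
        probs = torch.sigmoid(model(**enc).logits.squeeze(-1))
    return probs.cpu().numpy()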

Load merged model (no PEFT required)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9284

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

Optional: apply temperature scaling to logits

import json
with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.2807

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()

Notes

  • The classifier head is trained together with the LoRA layers (it is unfrozen after applying PEFT).

  • Training used:

    • bf16=True
    • optim="adamw_torch_fused"
    • cosine-with-restarts scheduler
    • LR scaled down from the HPO value to account for the longer full-dataset run (~14k steps)
  • Threshold 0.9284 was chosen as the max-F1 point on the calibration set; adjust it if you prefer fewer false positives or fewer false negatives (see the search sketch below).
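
A minimal sketch of that max-F1 search, again assuming prob and label columns in predictions_calib.csv:

import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_curve

df = pd.read_csv("predictions_calib.csv")
precision, recall, thresholds = precision_recall_curve(df["label"], df["prob"])
f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
best = np.argmax(f1[:-1])  # the last PR point has no associated threshold
print("best threshold:", thresholds[best], "F1:", f1[best])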
