🇧🇷 pt-ai-detector
pt-ai-detector is a BERT-base model fine-tuned to decide whether a Portuguese sentence or paragraph was written by a human (label = 0) or generated by AI (label = 1).
| Metric | Value | 
|---|---|
| Train data | 1 000 000 human + 1 000 000 AI | 
| Balanced test set | 1 954 190 (½ human, ½ AI) | 
| Accuracy | ≈ 99 % | 
| F1 (macro) | ≈ 0.99 | 
| Model size | ≈ 110 M parameters (≈ 430 MB) | 
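For readers who want to reproduce the two headline metrics on their own labelled data, the following is a minimal pure-Python sketch of accuracy and macro F1. It is not the evaluation script used for this model; the function names and the toy labels are illustrative assumptions.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores (0 = Human, 1 = AI)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels (not taken from the test set)
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred))            # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Because the test set is balanced, accuracy and macro F1 are expected to be close to each other, which matches the table.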
Quick usage
Try it live at detecting-ai.com. Our team at Detecting-ai built this model and demo so you can test any Portuguese text online.
```python
from transformers import pipeline

# Load the detector; device=0 selects the first GPU, use device=-1 for CPU
clf = pipeline(
    "text-classification",
    model="Detecting-ai/pt-ai-detector",
    tokenizer="Detecting-ai/pt-ai-detector",
    device=0,
)

text = "A inteligência artificial está transformando a educação."
print(clf(text))        # e.g. [{'label': 'AI', 'score': 0.987}]
```
| id | label | 
|---|---|
| 0 | Human | 
| 1 | AI | 
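The label table can be applied to pipeline output with a few small helpers. This is a sketch under two assumptions: the pipeline returns the top-1 label with a softmax score over the two classes, and a decision threshold of 0.5 is acceptable. The helper names (`to_label_id`, `ai_probability`, `is_ai`) are hypothetical and not part of the model's API.

```python
# Mapping taken from the label table
ID2LABEL = {0: "Human", 1: "AI"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_label_id(prediction: dict) -> int:
    """Map one pipeline result, e.g. {'label': 'AI', 'score': 0.987}, to 0 or 1."""
    return LABEL2ID[prediction["label"]]


def ai_probability(prediction: dict) -> float:
    """Return P(AI) whichever class was reported as top-1.

    Assumes a two-class softmax, so P(AI) = 1 - P(Human).
    """
    score = prediction["score"]
    return score if prediction["label"] == "AI" else 1.0 - score


def is_ai(prediction: dict, threshold: float = 0.5) -> bool:
    """Apply a decision threshold to P(AI); 0.5 is an assumed default."""
    return ai_probability(prediction) >= threshold


print(to_label_id({"label": "AI", "score": 0.987}))  # 1
print(is_ai({"label": "AI", "score": 0.987}))        # True
print(is_ai({"label": "Human", "score": 0.9}))       # False
```

Raising the threshold trades recall for fewer false accusations, which may matter when the output is used to flag student or customer text.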
Training details
- Base model: neuralmind/bert-base-portuguese-cased
- Epochs: 3 (fp16 on 1 × A100)
- Batch size: 32
- Optimizer/LR: AdamW, learning rate 2 × 10⁻⁵
- Loss: Cross-entropy
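The hyperparameters above could be expressed as follows. This is only a sketch: the card does not say whether the Hugging Face `Trainer` API was used, so `build_training_args` and the output directory name are assumptions. The step count assumes single-GPU training without gradient accumulation.

```python
import math

# Values stated in the training details
HPARAMS = {
    "base_model": "neuralmind/bert-base-portuguese-cased",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 32,
    "learning_rate": 2e-5,
    "fp16": True,
}

TRAIN_EXAMPLES = 2_000_000  # 1 M human + 1 M AI


def optimizer_steps(n_examples: int, batch_size: int, epochs: int):
    """Return (steps per epoch, total steps), assuming no gradient accumulation."""
    per_epoch = math.ceil(n_examples / batch_size)
    return per_epoch, per_epoch * epochs


def build_training_args(output_dir: str = "pt-ai-detector-out"):
    """Build TrainingArguments from HPARAMS (hypothetical helper).

    transformers is imported lazily so the arithmetic above runs without it.
    """
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,  # assumed directory name
        num_train_epochs=HPARAMS["num_train_epochs"],
        per_device_train_batch_size=HPARAMS["per_device_train_batch_size"],
        learning_rate=HPARAMS["learning_rate"],
        fp16=HPARAMS["fp16"],  # needs a CUDA GPU
    )


print(optimizer_steps(
    TRAIN_EXAMPLES,
    HPARAMS["per_device_train_batch_size"],
    HPARAMS["num_train_epochs"],
))  # (62500, 187500)
```

With `Trainer`, AdamW is the default optimizer and a sequence-classification head with two labels is trained with cross-entropy, which is consistent with the list above.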
Data sources
| Corpus | Lines used | Notes | 
|---|---|---|
| Human corpus (wiki40b-pt, oscar-pt, cc100-pt, europarl-pt, opus-books-pt) | 1 M sampled | Diverse Portuguese web/news/books | 
| AI corpus (Detecting-ai/ai_pt_corpus) | 1 M | Generated with OpenAI models (various GPT-4 / GPT-3.5 variants); prompts cover essays, news, tweets, dialogs, and paraphrases; temperature T = 0.6–1.0 | 
Datasets were balanced 1 : 1 and shuffled before training.
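A 1 : 1 balance followed by a shuffle can be sketched in plain Python as below. This illustrates the idea and is not the preprocessing code used for the model; `build_balanced_dataset` and the fixed seed are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def build_balanced_dataset(
    human_texts: Sequence[str],
    ai_texts: Sequence[str],
    n_per_class: int,
    seed: int = 42,
) -> List[Tuple[str, int]]:
    """Sample n_per_class texts from each corpus, label them, and shuffle.

    Labels follow the card: 0 = Human, 1 = AI.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    human = rng.sample(list(human_texts), n_per_class)
    ai = rng.sample(list(ai_texts), n_per_class)
    rows = [(t, 0) for t in human] + [(t, 1) for t in ai]
    rng.shuffle(rows)  # mix the classes so batches are not single-class
    return rows


# Toy corpora standing in for the real ones
rows = build_balanced_dataset(
    [f"human {i}" for i in range(10)],
    [f"ai {i}" for i in range(8)],
    n_per_class=5,
)
print(len(rows), sum(label for _, label in rows))  # 10 5
```

Sampling the same number of lines from each side keeps accuracy interpretable: a trivial classifier would score 50 %.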
Intended use
Detect AI-generated Portuguese text in essays, articles, chats, support tickets, etc.
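For a collection of documents such as support tickets, the pipeline can be called on a list of texts. The sketch below assumes the same pipeline setup as the quick-usage snippet; `load_classifier` and `flag_documents` are hypothetical helper names, and the classifier is passed in as a callable so the logic can be tried with a stub before downloading the model.

```python
from typing import Callable, Dict, List


def load_classifier(device: int = -1):
    """Load the detector as a text-classification pipeline (downloads the model)."""
    from transformers import pipeline  # imported lazily

    return pipeline(
        "text-classification",
        model="Detecting-ai/pt-ai-detector",
        tokenizer="Detecting-ai/pt-ai-detector",
        device=device,  # -1 = CPU, 0 = first GPU
    )


def flag_documents(
    docs: Dict[str, str],
    clf: Callable[[List[str]], List[dict]],
) -> Dict[str, dict]:
    """Classify every document and return {doc_id: prediction}."""
    ids = list(docs)
    predictions = clf([docs[i] for i in ids])  # one result per input text
    return dict(zip(ids, predictions))


if __name__ == "__main__":
    tickets = {
        "ticket-1": "Olá, não consigo acessar minha conta desde ontem.",
        "ticket-2": "A inteligência artificial está transformando a educação.",
    }
    for doc_id, pred in flag_documents(tickets, load_classifier()).items():
        print(doc_id, pred["label"], round(pred["score"], 3))
```

Inputs longer than the model's maximum sequence length may need truncation; one option is to wrap the pipeline, for example `lambda texts: pipe(texts, truncation=True)`.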
Limitations
- Not trained on code or on non-Portuguese text.
- Accuracy may drop on texts shorter than 10 tokens or on heavily paraphrased AI output.
- Commercial use is not permitted under the license (CC-BY-NC-4.0) without written permission.
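One way to act on the short-text limitation is to skip or mark inputs below the 10-token mark before classifying them. The sketch below assumes that "tokens" means tokens produced by the model's own tokenizer; `load_tokenizer`, `token_count`, and `reliable_length` are hypothetical helpers.

```python
def load_tokenizer():
    """Load the tokenizer shipped with the model (downloads on first use)."""
    from transformers import AutoTokenizer  # imported lazily

    return AutoTokenizer.from_pretrained("Detecting-ai/pt-ai-detector")


def token_count(text: str, tokenizer) -> int:
    """Number of tokens in text, excluding special tokens such as [CLS]."""
    return len(tokenizer.tokenize(text))


def reliable_length(text: str, tokenizer, min_tokens: int = 10) -> bool:
    """True if the text reaches the length below which accuracy may drop."""
    return token_count(text, tokenizer) >= min_tokens


if __name__ == "__main__":
    tok = load_tokenizer()
    for sample in ["Bom dia!", "A inteligência artificial está transformando a educação no Brasil."]:
        print(repr(sample), token_count(sample, tok), reliable_length(sample, tok))
```

Texts that fail the check can still be classified, but their scores are better treated as low-confidence.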
Future work
- Evaluate on adversarial paraphrases.
- Distill/quantize for edge deployment.
- Extend to multilingual detection.
Citation
@misc{abdurazzoqov2025ptaidetector,
  title   = {pt-ai-detector: Detecting AI-generated Portuguese Text},
  author  = {Abdulla Abdurazzoqov},
  year    = {2025},
  howpublished = {Hugging Face hub},
  url     = {https://huggingface.co/Detecting-ai/pt-ai-detector}
}
License
Creative Commons CC-BY-NC-4.0: free for research and personal use; commercial use requires written permission.
