Tibetan Outline Boundary CRF (Full)
Production conditional random field (CRF) model for Tibetan text boundary detection, trained on the full snippet corpus (82,560 snippets).
Evaluation (unique benchmark, 31,591 snippets, ±15 chars)
| Precision | Recall | F1 |
|---|---|---|
| 0.718 | 0.475 | 0.571 |
Note: the training data includes the unique-benchmark strings, so the scores above are not an unbiased estimate. Use this model for production recall, not for unbiased benchmark reporting.
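The "±15 chars" in the evaluation heading suggests that a predicted boundary counts as correct when it falls within 15 characters of a gold boundary. The exact matching procedure used for the benchmark is not stated here, so the following is only a minimal sketch under that assumption. The function names and the greedy one-to-one matching are hypothetical, not part of `outline_detection`.

```python
def match_with_tolerance(predicted, gold, tolerance=15):
    """Count predictions matched to a distinct gold position within +/- tolerance.

    Assumption: greedy nearest-neighbour matching, each gold boundary used once.
    """
    gold_sorted = sorted(gold)
    used = [False] * len(gold_sorted)
    true_positives = 0
    for p in sorted(predicted):
        best, best_dist = None, float("inf")
        for i, g in enumerate(gold_sorted):
            dist = abs(p - g)
            # Only unused gold boundaries inside the tolerance window qualify.
            if not used[i] and dist <= tolerance and dist < best_dist:
                best, best_dist = i, dist
        if best is not None:
            used[best] = True
            true_positives += 1
    return true_positives


def precision_recall_f1(predicted, gold, tolerance=15):
    """Return (precision, recall, F1) for boundary positions under a tolerance."""
    tp = match_with_tolerance(predicted, gold, tolerance)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, with predictions `[10, 100, 300]` and gold boundaries `[20, 120, 310, 500]`, two predictions fall within 15 characters of a gold boundary, giving a precision of 2/3 and a recall of 1/2.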
Related repos
- Training data: ganga4364/tibetan-outline-boundary-snippets-full
- Unbiased model: ganga4364/tibetan-outline-boundary-crf-unbiased
Usage
pip install "outline-detection[crf]"
hf download ganga4364/tibetan-outline-boundary-crf-full boundary_crf.pkl --local-dir ./models
outline-detect crf predict input.txt --model ./models/boundary_crf.pkl -o output.txt
from outline_detection.crf import CRFBoundaryDetector
det = CRFBoundaryDetector()
det.load("./models/boundary_crf.pkl")
positions = det.predict_positions(clean_tibetan_text)
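Once boundary positions are available, a common next step is to cut the text into sections. The sketch below assumes that `predict_positions` returns integer character offsets into the input string; that return format is not confirmed by this card. The helper `split_at_positions` is hypothetical and not part of `outline_detection`.

```python
def split_at_positions(text, positions):
    """Split text into consecutive segments at the given character offsets.

    Assumption: positions are integer offsets into `text`.
    """
    # Keep only offsets strictly inside the text, de-duplicated and sorted.
    cuts = sorted({p for p in positions if 0 < p < len(text)})
    segments = []
    start = 0
    for cut in cuts:
        segments.append(text[start:cut])
        start = cut
    segments.append(text[start:])
    return segments
```

Calling `split_at_positions("abcdefghij", [3, 7])` returns `["abc", "defg", "hij"]`. Offsets at or beyond the ends of the text are ignored, so an empty list of positions yields the whole text as a single segment.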
Security
Pickle files can execute arbitrary code when they are loaded. Only load model files downloaded from this trusted repository.
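One way to reduce the risk of loading a tampered file is to compare its SHA-256 digest against a value obtained from a trusted source before unpickling. This is a minimal sketch using only the standard library. The helper names are hypothetical, and no expected digest is given in this card, so `expected_sha256` is something you would need to supply yourself.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path, expected_sha256):
    """Raise ValueError if the file digest differs from the expected value."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
    return path
```

A checksum only confirms that the file matches the one you expected; it does not make an untrusted pickle safe to load.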