FALCON bi-encoder — SNORT / `e5-base-v2`

Contrastive bi-encoder fine-tuned to map cyber threat intelligence (CTI) text and SNORT rules into a shared embedding space. Backbone: intfloat/e5-base-v2.

Test-set metrics

model	recall@1	F1	threshold	diag mean	off-diag mean
pretrained	0.4738	0.2576	0.7030	0.8503	0.8149
run_0	0.9526	0.9017	0.6909	0.9236	0.1215
run_1	0.9551	0.9244	0.6960	0.9281	0.0744
run_2	0.9551	0.9292	0.6982	0.9251	0.0655
run_3	0.9564	0.9329	0.6951	0.9309	0.0491
run_4	0.9551	0.9324	0.7080	0.9532	0.0155
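
The card does not define these columns, so the following is only a sketch of one plausible reading. It assumes a square similarity matrix in which entry (i, j) is the cosine similarity between CTI text i and SNORT rule j, with the correct rule for text i at column i. Under that assumption, "diag mean" and "off-diag mean" are the average similarity of matched and unmatched pairs, and recall@1 is the share of rows whose top-scoring column is the diagonal. The function name `retrieval_stats` and the toy matrix are illustrative, not part of the released code.

```python
def retrieval_stats(sim):
    """Summarise a square similarity matrix given as a list of lists.

    Assumes sim[i][j] scores CTI text i against rule j and that the
    correct rule for text i sits at column i (an assumption, not a
    definition taken from the model card).
    """
    n = len(sim)
    # A row counts as a hit when its highest-scoring column is the diagonal.
    hits = sum(
        1 for i, row in enumerate(sim)
        if max(range(n), key=row.__getitem__) == i
    )
    diag = [sim[i][i] for i in range(n)]
    off = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    return {
        "recall@1": hits / n,
        "diag_mean": sum(diag) / n,
        "off_diag_mean": sum(off) / len(off),
    }


# Toy 3x3 example: the third text is wrongly ranked closest to rule 0.
toy = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.7, 0.2, 0.6],
]
stats = retrieval_stats(toy)
```

Read this way, a large gap between the two means (as in run_4) indicates that matched pairs score far above unmatched ones, whereas the pretrained backbone scores nearly everything alike.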

Training

Symmetric InfoNCE / NT-Xent over in-batch negatives. Best checkpoint selected by validation loss.
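
To make the objective concrete, here is a minimal pure-Python sketch of a symmetric InfoNCE loss over an in-batch similarity matrix. It is not the training code: real training would operate on framework tensors, and averaging the two directions with a factor of 0.5 is one common convention that the card does not confirm. The function name and the toy matrices are illustrative.

```python
import math


def symmetric_info_nce(sim, temperature=0.05):
    """Symmetric InfoNCE over an in-batch cosine-similarity matrix.

    sim[i][j] is the similarity between text i and rule j. The matched
    pair for row i is column i; every other column is an in-batch negative.
    """
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the target class of row i is index i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    text_to_rule = cross_entropy_diag(logits)
    # Transposing the matrix gives the rule-to-text direction.
    rule_to_text = cross_entropy_diag([list(col) for col in zip(*logits)])
    return 0.5 * (text_to_rule + rule_to_text)


aligned = [[1.0, 0.0], [0.0, 1.0]]    # matched pairs clearly separated
confused = [[0.5, 0.5], [0.5, 0.5]]   # no separation at all
loss_aligned = symmetric_info_nce(aligned)
loss_confused = symmetric_info_nce(confused)
```

With two items per batch, the undifferentiated matrix yields a loss of ln 2, while the well-separated one is close to zero. A lower temperature sharpens the softmax and penalises hard negatives more strongly.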

Run 0 — batch=16, epochs=5, lr=2e-05, schedule=constant, T=0.05
Run 1 — batch=50, epochs=10, lr=2e-05, schedule=constant, T=0.05
Run 2 — batch=70, epochs=30, lr=2e-05, schedule=constant, T=0.05
Run 3 — batch=128, epochs=30, lr=5e-05, schedule=warmup_cosine, T=0.05
Run 4 — batch=70, epochs=50, lr=2e-05, schedule=constant, T=0.07
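
Run 3 is the only configuration that uses a `warmup_cosine` schedule. The card does not give the warmup length, so the sketch below assumes a linear warmup over the first 10% of steps followed by cosine decay to zero. The function name, the `warmup_frac` parameter and the 1000-step horizon are hypothetical; only the peak learning rate of 5e-05 comes from the list above.

```python
import math


def warmup_cosine_lr(step, total_steps, peak_lr=5e-5, warmup_frac=0.1):
    """Learning rate at `step` for linear warmup plus cosine decay.

    warmup_frac is an assumed value; the model card does not state it.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


lr_start = warmup_cosine_lr(0, 1000)
lr_peak = warmup_cosine_lr(100, 1000)
lr_mid = warmup_cosine_lr(550, 1000)
lr_end = warmup_cosine_lr(1000, 1000)
```

The other four runs keep the learning rate constant at 2e-05, so no schedule function is needed for them.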

Loading

from transformers import AutoModel, AutoTokenizer
tok   = AutoTokenizer.from_pretrained("shaswatamitra/falcon-snort-bi-e5-base-v2")
model = AutoModel.from_pretrained("shaswatamitra/falcon-snort-bi-e5-base-v2")
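
The snippet above returns the raw encoder, so token states still have to be pooled into one vector per input before a CTI passage and a rule can be compared. The sketch below assumes mean pooling over non-padding tokens followed by L2 normalisation, which is what the base e5-base-v2 model uses. The card does not say whether the e5 "query: " / "passage: " prefixes were kept during fine-tuning, so they are omitted here. The helper names `embed` and `similarity` are hypothetical.

```python
REPO = "shaswatamitra/falcon-snort-bi-e5-base-v2"


def embed(texts, repo=REPO):
    """Return L2-normalised, mean-pooled embeddings for a list of strings.

    Mean pooling is an assumption carried over from the e5-base-v2
    backbone; the model card does not state the pooling strategy.
    """
    # Imported lazily so the helpers can be defined without loading the libraries.
    import torch
    import torch.nn.functional as F
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModel.from_pretrained(repo).eval()
    batch = tok(texts, padding=True, truncation=True,
                max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # Zero out padding positions, then average over the real tokens.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return F.normalize(pooled, dim=-1)


def similarity(cti_text, rule_text):
    """Cosine similarity between one CTI passage and one SNORT rule."""
    text_vec, rule_vec = embed([cti_text, rule_text])
    return float(text_vec @ rule_vec)
```

Calling `similarity(report_text, rule_text)` downloads the checkpoint on first use and returns a cosine similarity between -1 and 1. For repeated scoring, load the tokenizer and model once rather than inside `embed`.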

Citation

@article{mitra2025falcon,
  title={FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation},
  author={Mitra, Shaswata and Bazarov, Azim and Duclos, Martin and Mittal, Sudip and Piplai, Aritran and Rahman, Md Rayhanur and Zieglar, Edward and Rahimi, Shahram},
  journal={arXiv preprint arXiv:2508.18684},
  year={2025}
}
