# Model Card for cisco-ai/SecureBERT2.0-code-vuln-detection
The ModernBERT Code Vulnerability Detection Model is a fine-tuned variant of SecureBERT 2.0, designed to detect potential vulnerabilities in source code.
It leverages cybersecurity-aware representations learned by SecureBERT 2.0 and applies supervised fine-tuning for binary classification (vulnerable vs. non-vulnerable).
## Model Details
### Model Description
This model classifies source code snippets as either vulnerable or non-vulnerable using the ModernBERT architecture.
It is fine-tuned for code-level security analysis, extending the capabilities of SecureBERT 2.0.
- Developed by: Cisco AI
- Model type: Sequence classification
- Architecture: `ModernBertForSequenceClassification`
- Number of labels: 2
- Language: English (source code tokens)
- License: Apache-2.0
- Finetuned from model: cisco-ai/SecureBERT2.0-base
### Model Sources
- Repository: https://huggingface.co/cisco-ai/SecureBERT2.0-code-vuln-detection
- Paper: arXiv:2510.00240
## Uses
### Direct Use
- Automatic vulnerability classification for source code snippets
- Static analysis pipeline integration for pre-screening code risks
- Feature extraction for downstream vulnerability detection tasks (see the embedding sketch below)
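For the feature-extraction use, here is a minimal sketch. It assumes you only need encoder embeddings: `AutoModel` loads the ModernBERT encoder without the classification head, and mean pooling over non-padding tokens is one common pooling choice (an illustrative assumption, not a documented recipe); the hidden size reflects the ModernBERT-base backbone.

```python
from transformers import AutoTokenizer, AutoModel
import torch

model_id = "cisco-ai/SecureBERT2.0-code-vuln-detection"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)  # encoder only, classifier head is dropped

snippet = 'strcpy(buf, user_input);  /* unbounded copy */'
inputs = tokenizer(snippet, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # shape: (1, seq_len, hidden_dim)

# Mean-pool over non-padding tokens to get one fixed-size vector per snippet
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # e.g. torch.Size([1, 768]) for a base-size backbone
```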
### Downstream Use
Can be integrated into:
- Secure code review systems
- CI/CD vulnerability scanners (a minimal gate sketch follows this list)
- Security IDE extensions
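As a sketch of the CI/CD integration above, the script below scans a source tree and fails the pipeline when any file is classified as vulnerable. The `src` path, the `*.c` glob, and the assumption that class ID 1 means "vulnerable" are all illustrative; verify the label mapping with `model.config.id2label` on the actual checkpoint.

```python
import sys
from pathlib import Path
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "cisco-ai/SecureBERT2.0-code-vuln-detection"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

VULNERABLE = 1  # assumed label ID; confirm with model.config.id2label

flagged = []
for path in Path("src").rglob("*.c"):  # illustrative target set
    code = path.read_text(errors="ignore")
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    if pred == VULNERABLE:
        flagged.append(str(path))

if flagged:
    print("Potentially vulnerable files:", *flagged, sep="\n  ")
    sys.exit(1)  # fail the pipeline so a human reviews the findings
```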
### Out-of-Scope Use
- Non-code or natural language text classification
- Runtime or dynamic vulnerability detection
- Automated patch generation or remediation suggestion
## Bias, Risks, and Limitations
- The model may overfit to syntactic patterns from training datasets and miss logical vulnerabilities.
- False negatives (missed vulnerabilities) or false positives (benign code flagged as vulnerable) may occur.
- Training data may not include all programming languages or frameworks.
### Recommendations
Use this model as an assistive tool, not as a replacement for expert manual code review.
Cross-validation with multiple tools is recommended before security-critical decisions.
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Model repository on the Hugging Face Hub
model_dir = "cisco-ai/SecureBERT2.0-code-vuln-detection"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

# Example input code snippet
example_code = """
static void FUNC_0(WmallDecodeCtx *VAR_0, int VAR_1, int VAR_2, int16_t VAR_3, int16_t VAR_4)
{
    int16_t icoef;
    int VAR_5 = VAR_0->cdlms[VAR_1][VAR_2].VAR_5;
    int16_t range = 1 << (VAR_0->bits_per_sample - 1);
    int VAR_6 = VAR_0->bits_per_sample > 16 ? 4 : 2;
    if (VAR_3 > VAR_4) {
        for (icoef = 0; icoef < VAR_0->cdlms[VAR_1][VAR_2].order; icoef++)
            VAR_0->cdlms[VAR_1][VAR_2].coefs[icoef] +=
                VAR_0->cdlms[VAR_1][VAR_2].lms_updates[icoef + VAR_5];
    } else {
        for (icoef = 0; icoef < VAR_0->cdlms[VAR_1][VAR_2].order; icoef++)
            VAR_0->cdlms[VAR_1][VAR_2].coefs[icoef] -=
                VAR_0->cdlms[VAR_1][VAR_2].lms_updates[icoef];
    }
    VAR_0->cdlms[VAR_1][VAR_2].VAR_5--;
}
"""

# Tokenize (the model supports up to 1024 tokens) and run inference
inputs = tokenizer(example_code, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class = torch.argmax(logits, dim=-1).item()
print(f"Predicted class ID: {predicted_class}")
```
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
Internal validation split from annotated open-source vulnerability datasets.
#### Factors
Evaluated across:
- Programming languages (C, C++, Python)
- Vulnerability categories (buffer overflow, injection, logic error)
#### Metrics
- Accuracy
- Precision
- Recall
- F1-score
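As a sketch of how these metrics can be computed from a labeled test split, using scikit-learn (an implementation choice here, not necessarily what the authors used); `y_true` and `y_pred` are placeholder arrays of 0/1 labels:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder ground-truth labels and model predictions (1 = vulnerable)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1-score:  {f1:.3f}")
```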
### Results
| Model | Accuracy | F1 | Recall | Precision |
|---|---|---|---|---|
| CodeBERT | 0.627 | 0.372 | 0.241 | 0.821 |
| CyBERT | 0.459 | 0.630 | 1.000 | 0.459 |
| SecureBERT 2.0 | 0.655 | 0.616 | 0.602 | 0.630 |
#### Summary
SecureBERT 2.0 demonstrates the best overall balance of accuracy, F1, and precision among the compared models.
While CyBERT achieves the highest recall (detecting all vulnerabilities), it suffers from low precision, indicating many false positives.
Conversely, CodeBERT exhibits strong precision but poor recall, missing a large portion of true vulnerabilities.
SecureBERT 2.0 achieves more consistent and stable performance across all metrics, reflecting its stronger domain adaptation from cybersecurity-focused pretraining.
## Environmental Impact
- Hardware Type: 8× A100 GPU cluster
- Hours used: [Information Not Available]
- Cloud Provider: [Information Not Available]
- Compute Region: [Information Not Available]
- Carbon Emitted: [Estimate Not Available]
Carbon footprint can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).
## Technical Specifications
### Model Architecture and Objective
- Architecture: ModernBERT (SecureBERT 2.0 backbone)
- Objective: Binary classification
- Max sequence length: 1024 tokens
- Parameters: ~150M
- Tensor type: F32
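These specifications can be checked directly against the checkpoint configuration; a minimal sketch (the field names are the standard `transformers` config attributes):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("cisco-ai/SecureBERT2.0-code-vuln-detection")
print(config.num_labels)               # expected: 2
print(config.max_position_embeddings)  # positional capacity; this card lists 1024 usable tokens
print(config.id2label)                 # class-ID-to-label mapping
```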
### Compute Infrastructure
- Framework: Transformers (PyTorch)
- Precision: fp16 mixed precision
- Hardware: 8 GPUs
- Checkpoint Format: Safetensors
## Citation
BibTeX:
```bibtex
@article{aghaei2025securebert,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and Jain, Sarthak and Arun, Prashanth and Sambamoorthy, Arjun},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```
## Model Card Authors
Cisco AI
## Model Card Contact
For inquiries, please contact [email protected]