Reynier / Llama3_8B-DGA-Detector


	---
	base_model:
	- meta-llama/Meta-Llama-3-8B
	---

	## Llama3 8B Fine-Tuned for Domain Generation Algorithm Detection

	This model is a fine-tuned version of Meta's Llama3 8B, specifically adapted for detecting Domain Generation Algorithms (DGAs). DGAs are often used by malware to create dynamic domain names for command-and-control (C&C) servers, making them a critical challenge in cybersecurity.

	## Model Description

	- Base Model: Llama3 8B
	- Task: DGA Detection
	- Fine-Tuning Approach: Supervised Fine-Tuning (SFT) with domain-specific data.
	- Dataset: A custom dataset comprising 68 malware families and legitimate domains from the Tranco dataset, with a focus on both arithmetic and word-based DGAs.
	- Performance:
	  - Accuracy: 94%
	  - False Positive Rate (FPR): 4%
	  - Excels in detecting hard-to-identify word-based DGAs.
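
As a reading aid for the two metrics above, here is a minimal sketch of how accuracy and FPR follow from confusion-matrix counts. The counts are hypothetical, chosen only so that the arithmetic lands on 94% and 4%; they are not the evaluation results from the article.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all domains that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of legitimate domains that were wrongly flagged as DGA."""
    return fp / (fp + tn)


# Hypothetical counts for 1,000 DGA and 1,000 legitimate test domains.
tp, fn = 920, 80   # DGA domains: detected / missed
tn, fp = 960, 40   # legitimate domains: passed / wrongly flagged

print(f"accuracy = {accuracy(tp, tn, fp, fn):.2%}")
print(f"FPR      = {false_positive_rate(fp, tn):.2%}")
```

Note that FPR is computed over legitimate domains only, so it can stay low even when a noticeable share of DGA domains is missed.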

	This model draws on Llama3's semantic understanding to classify each domain as either malicious (DGA-generated) or legitimate, with high precision and recall.
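
The classification step might be wired up along the following lines with the `transformers` library. This is a sketch under several assumptions, not the authors' inference code: the prompt wording in `build_prompt` and the label strings in `parse_label` are hypothetical (the format actually used for fine-tuning is documented in the notebooks), the repository is assumed to hold full model weights rather than an adapter, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
MODEL_ID = "Reynier/Llama3_8B-DGA-Detector"  # repo id as shown on the model page


def build_prompt(domain: str) -> str:
    # ASSUMPTION: hypothetical prompt; check the notebooks for the real format.
    return (
        "Classify the following domain as 'dga' or 'legit'.\n"
        f"Domain: {domain.strip().lower()}\n"
        "Answer:"
    )


def parse_label(generated: str) -> str:
    # ASSUMPTION: hypothetical label strings; anything else maps to "unknown".
    text = generated.strip().lower()
    if text.startswith("dga"):
        return "dga"
    if text.startswith("legit"):
        return "legit"
    return "unknown"


def load_model():
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    return tokenizer, model


def classify(domain: str, tokenizer, model) -> str:
    inputs = tokenizer(build_prompt(domain), return_tensors="pt").to(model.device)
    # Greedy decoding of a few tokens is enough for a one-word answer.
    output = model.generate(**inputs, max_new_tokens=4, do_sample=False)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return parse_label(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Loading the model once with `load_model()` and reusing it across calls to `classify` avoids reloading 8B parameters for every domain.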

	## Data

	The model was trained on 2 million domains: 1 million DGA domains and 1 million legitimate domains. The training data is stored in the file train_2M.csv. The model was evaluated on the per-family files located in the Families_Test folder.
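
Evaluating family by family could be sketched as below. The layout of the files in Families_Test is not described here, so the example takes already-loaded domain lists; the family names, the domains, and the `toy_is_dga` predictor are all hypothetical stand-ins for illustration, and in practice the predictor would be the fine-tuned model.

```python
from typing import Callable, Dict, Iterable


def per_family_detection_rate(
    families: Dict[str, Iterable[str]],
    is_dga: Callable[[str], bool],
) -> Dict[str, float]:
    """Share of each family's domains that the classifier flags as DGA."""
    rates = {}
    for name, domains in families.items():
        domains = list(domains)
        if not domains:
            continue  # skip empty family files
        hits = sum(1 for d in domains if is_dga(d))
        rates[name] = hits / len(domains)
    return rates


def toy_is_dga(domain: str) -> bool:
    # Stand-in predictor (NOT the model): flags labels with fewer than 2 vowels.
    label = domain.split(".")[0]
    return sum(ch in "aeiou" for ch in label) < 2


# Hypothetical toy data in place of the real family files.
toy_families = {
    "family_a": ["xkqzvbnw.com", "qpwrtsdf.net"],
    "family_b": ["bluehouse.com", "zzxqjvkp.org"],
}

for family, rate in per_family_detection_rate(toy_families, toy_is_dga).items():
    print(f"{family}: {rate:.0%} detected")
```

A per-family breakdown like this shows which families a classifier misses; the naive vowel heuristic here lets the word-like name through, which is the kind of case word-based DGAs exploit.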

	The GitHub repository https://github.com/reypapin/Domain-Name-Classification-with-LLM contains the notebooks that describe how the model was trained and evaluated.


	## Article Reference

	La O, R. L., Catania, C. A., & Parlanti, T. (2024). LLMs for Domain Generation Algorithm Detection. arXiv preprint arXiv:2411.03307.