llama8b-rust-qlora-phase1 (checkpoint 9000 / 12000)

This card describes checkpoint 9000 of the Phase 1 Rust QLoRA run.
For the full training plan, governance details, and final recommended checkpoints, see the root model card in the repository.

Important: This repository distributes LoRA adapter weights only, not the full meta-llama/Meta-Llama-3.1-8B-Instruct model.
To use these adapters, you must separately obtain access to the base model from Meta under the Llama 3.1 Community License and comply with Meta's license and acceptable-use policy. The adapters alone are not useful without the base model.

Model Description

This is a LoRA (Low-Rank Adaptation) adapter fine-tuned with QLoRA on top of meta-llama/Meta-Llama-3.1-8B-Instruct, trained specifically on Rust code. Training quantizes the base weights to 4-bit and learns only the LoRA adapters, keeping both training and inference memory-efficient.

The primary modality is Rust code with English comments and explanations.

This checkpoint is part of the SigilDERG ecosystem and is intended as a building block for Rust-focused evaluation and governance tooling, not as a general-purpose all-domain assistant.

Training Details

Training Configuration

  • Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Checkpoint: Phase 1, step 9,000 / 12,000
  • Effective Batch Size: 16 × 4 = 64 sequences per optimizer step (per-device batch × gradient-accumulation steps)
  • Sequence Length: 4096
  • Optimizer: paged_adamw_8bit
  • LR Scheduler: cosine
  • Learning Rate: ~1.52e-5 at this checkpoint (cosine-decayed; see the schedule sketch below)
  • Warmup Steps: 250
  • Weight Decay: 0.0
  • Gradient Checkpointing: True
  • BF16: True
  • Quantization During Training: 4-bit QLoRA (NF4) with LoRA adapters
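
As a sanity check, the learning rate logged at step 9,000 is consistent with the standard transformers cosine schedule decaying from a peak of about 1e-4. The peak value is an inference from the logs, not a figure taken from the training config. A minimal sketch:

import math

# Assumed peak LR (inferred from the logs); warmup/total steps are from this card.
peak_lr, warmup, total, step = 1e-4, 250, 12_000, 9_000

# transformers' cosine schedule: linear warmup, then half-cosine decay to 0.
progress = (step - warmup) / (total - warmup)
lr = peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
print(f"{lr:.4e}")  # ~1.524e-5, close to the logged 1.52500e-5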

LoRA Configuration

  • Rank (r): 16
  • Alpha: 16
  • Dropout: 0.05
  • Target Modules: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj

These adapters are intended to be loaded on top of the unmodified base weights.
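
For reference, a peft LoraConfig matching the settings above would look roughly like this. This is a sketch of the configuration, not the exact training script:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
    task_type="CAUSAL_LM",
)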

Quantization

  • Method: 4-bit NF4 (BitsAndBytes)
  • Compute Dtype: bfloat16
  • Double Quantization: True
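
These settings correspond to a BitsAndBytesConfig along these lines (a sketch; the exact training config may differ in minor details):

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)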

Datasets

Phase 1 was trained on:

  • ammarnasr/the-stack-rust-clean

Dataset configuration for this phase:

  • Min Length: 64
  • Max Length: 200000
  • Exclude Tests: True
  • Exclude Examples: False
  • Exclude Benches: True
  • Prefer Idiomatic: False
  • Prefer Documented: False
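
Illustratively, a filter implementing these rules might look like the sketch below. The field names (content, path) are assumptions about the dataset schema, the length bounds are applied to character counts here, and the test/bench heuristics are simplified; the preference flags are off in this phase, so no ranking step is shown.

def keep_example(example: dict) -> bool:
    """Approximate the Phase 1 filtering rules described above."""
    text = example.get("content", "")
    path = example.get("path", "")  # assumed field; the real schema may differ

    if not (64 <= len(text) <= 200_000):                 # Min/Max Length
        return False
    if "/tests/" in path or path.endswith("_test.rs"):   # Exclude Tests
        return False
    if "/benches/" in path:                              # Exclude Benches
        return False
    return True  # examples/ directories are kept (Exclude Examples: False)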

Phase 1 is a broad "inhale" pass over cleaned Rust from The Stack. Later phases are designed to be more selective and to incorporate explicit evaluation feedback.

Training Metrics (around checkpoint 9000)

Latest logged training metrics in the vicinity of this checkpoint:

  • loss: 0.645700
  • grad_norm: 0.173596
  • learning_rate: 1.5249989438168771e-05
  • entropy: 0.669227
  • num_tokens: 1,594,699,366
  • mean_token_accuracy: 0.842414
  • epoch: 2.777838
  • log_step: 9,000
  • checkpoint_step: 9,000
  • step: 9,000

Note: Logging occurs every few steps, so log_step reflects the nearest logged step to the checkpoint.

Evaluation Results

All evaluation here is based on automatic Rust-focused checks (compile, clippy, idiomatic heuristics, doc comments, prompt adherence) over a small but structured evaluation set.
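
As an illustration of the compile check, the harness can be approximated by invoking rustc on each generated sample. This is a minimal sketch, assuming each sample is a standalone file; the real evaluation pipeline may differ.

import subprocess, tempfile, pathlib

def compiles(rust_source: str) -> bool:
    """Return True if rustc can compile the sample as a standalone library crate."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "sample.rs"
        src.write_text(rust_source)
        result = subprocess.run(
            ["rustc", "--edition", "2021", "--crate-type", "lib",
             "--out-dir", tmp, str(src)],
            capture_output=True,
        )
        return result.returncode == 0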

Aggregate Metrics (checkpoint 9000, 50 samples)

  • Compilation Rate: 54.00%
  • Average Clippy Warnings: 0.00
  • Idiomatic Score: 0.1625
  • Documentation Rate: 0.00%
  • Test Rate: 0.00%

Functionality Coverage (approximate averages)

  • Average Functions: 3.64
  • Average Structs: 0.28
  • Average Traits: 0.02
  • Average Impls: 0.26

Evaluation Artifacts

  • Full metrics (JSONL) – per-sample evaluation:
    • metrics.jsonl – compilation success, clippy warnings, idiomatic scores, doc detection, and structural stats
  • Error logs (JSONL) – compiler and runtime errors:
    • errors.jsonl – rustc diagnostics, clippy output, and runtime error messages
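
The aggregate numbers above can be reproduced from metrics.jsonl along these lines. This is a sketch; the exact field names, such as "compiled", are assumptions about the artifact schema.

import json

with open("metrics.jsonl") as f:
    records = [json.loads(line) for line in f]

# "compiled" is an assumed field name; inspect the file for the actual schema.
compile_rate = sum(bool(r.get("compiled")) for r in records) / len(records)
print(f"Compilation rate: {compile_rate:.2%}")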

Evaluation completed: 2025-11-20T05:03:45.292778

Governance and Intended Use

This checkpoint is part of the SigilDERG ecosystem and follows Rule Zero principles:

  • Primary Intended Use
    • Rust code generation (functions, modules, small programs)
    • Rust code explanation, refactoring, and review
    • Tooling experiments for automated code evaluation, scoring, and self-improvement loops
  • Not Intended For
    • Medical, legal, financial, or other high-stakes decision-making
    • Safety-critical or life-critical systems without extensive human review
    • Domains outside software engineering where the model hasn't been evaluated

Users remain responsible for:

  • Reviewing and testing all generated code before use in production.
  • Ensuring that their use of the combined base model + adapters complies with:
    • Meta's Llama 3.1 Community License and acceptable-use policy.
    • Any additional organizational or regulatory requirements.

This work is not affiliated with or endorsed by Meta.

Usage

Loading the Model (LoRA adapters on base)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model (requires access from Meta under the Llama 3.1 Community License).
# Optionally pass quantization_config=bnb_config (see Quantization above) to load in 4-bit.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load LoRA adapter (this checkpoint lives in a subfolder of the repo)
model = PeftModel.from_pretrained(
    base_model,
    "Superuser666-Sigil/Llama-3.1-8B-Instruct-Rust-QLora",
    subfolder="checkpoint-9000",  # or point the second argument at your local checkpoint path
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

Generation Example

# Format prompt for the instruct model
messages = [
    {"role": "system", "content": "You are a helpful Rust programming assistant."},
    {"role": "user", "content": "Write a function that calculates Fibonacci numbers."}
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Note: You must load the Meta base model first and then apply this LoRA checkpoint.
The base weights are not redistributed in this repository.
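
If you want a standalone model for deployment, you can merge the adapters into the base weights with peft's merge_and_unload. A sketch: merging into a 4-bit-quantized base is not supported, so load the base in bf16 first, as in the loading example above.

# After loading base_model (in bf16, not 4-bit) and the PeftModel as above:
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("llama31-8b-rust-merged")
tokenizer.save_pretrained("llama31-8b-rust-merged")

Note that a merged model contains the base weights, so redistributing the merged checkpoint is governed by Meta's license, unlike the adapter-only weights in this repository.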

Limitations

This adapter is tuned specifically for Rust code; performance on other programming languages or general natural language tasks may be degraded relative to the base model.

The model inherits any limitations, biases, and failure modes from:

  • The base meta-llama/Meta-Llama-3.1-8B-Instruct model.
  • The training data used for Rust fine-tuning (ammarnasr/the-stack-rust-clean).

Evaluation so far is focused on:

  • Compilation success.
  • Static analysis (clippy).
  • Simple idiomatic and documentation heuristics.
  • A small prompt suite.

It should not be treated as a fully benchmarked or certified Rust expert.

Generated code should always be reviewed, tested, and security-audited (where relevant) before use.

Citation

If you use this model or its training pipeline, please cite:

@software{sigilderg_finetuner,
  title  = {SigilDERG Rust Code Fine-tuned Model},
  author = {Dave Tofflemire (Superuser666-Sigil)},
  year   = {2025},
  url    = {https://github.com/Superuser666-Sigil/SigilDERG-Finetuner}
}

You should also follow any citation or attribution requirements specified in the Llama 3.1 Community License when referencing the base model.

License

This repository combines several components with different licenses:

Base Model (not included here)

  • meta-llama/Meta-Llama-3.1-8B-Instruct is licensed separately by Meta under the Llama 3.1 Community License and is not redistributed in this repository; obtain it directly from Meta under those terms.

LoRA Adapter Weights (this checkpoint)

  • The adapter weights in this repository are my original contribution and are provided under the MIT License, only to the extent compatible with the Llama 3.1 Community License.

  • You may not use the combined base model + adapters in ways that violate Meta's license or acceptable-use policy, even though the adapter deltas themselves are MIT.

Training & Evaluation Code (SigilDERG-Finetuner, configs, scripts)

  • All original code in the SigilDERG ecosystem is released under the MIT License, unless otherwise noted in the specific repository.

Practical summary:

To actually run this model, you must:

  1. Have legitimate access to meta-llama/Meta-Llama-3.1-8B-Instruct under Meta's terms.
  2. Load these LoRA adapters on top of that base model.

Use of the combined system (base + adapters) is governed primarily by Meta's Llama 3.1 Community License.

The MIT terms apply to the adapters and the SigilDERG code, but do not override or relax Meta's license.

This project is independent and not affiliated with or endorsed by Meta.
