LuminoLex-14B (Verbarex)

LuminoLex-14B is a fine-tuned variant of Qwen3-14B, optimized for strong reasoning, mathematics, and coding performance. It targets reliable zero-shot reasoning quality while keeping training costs low through parameter-efficient fine-tuning.

This release is intended for research, evaluation, and advanced reasoning workloads.


Highlights

  • Strong HumanEval performance for code reasoning
  • Competitive results on ARC and HellaSwag
  • Fine-tuned using LoRA / QLoRA
  • Evaluated using BF16 compute
  • Supports both high-end and low‑VRAM inference setups
  • Designed for reasoning- and coding‑heavy workloads

Evaluation Results (lm-eval-harness)

All results are zero-shot evaluations.

Benchmark       Metric     Score
HumanEval       pass@1     0.585
ARC-Challenge   acc_norm   0.623
ARC-Easy        acc_norm   0.851
HellaSwag       acc_norm   0.799

These scores make LuminoLex-14B competitive with other modern 14B-class open models, particularly on reasoning and programming tasks.
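
For reference, zero-shot results of this kind are typically reproduced with the lm-evaluation-harness CLI. The command below is a sketch rather than the exact invocation used for this card: task names, the peft model argument, and batching behaviour can differ between harness versions, and recent releases additionally require HF_ALLOW_CODE_EVAL=1 together with --confirm_run_unsafe_code to execute HumanEval completions.

lm_eval --model hf \
  --model_args pretrained=Qwen/Qwen3-14B,peft=verbarex/LuminoLex-14B-Verbarex,dtype=bfloat16 \
  --tasks arc_challenge,arc_easy,hellaswag,humaneval \
  --num_fewshot 0 \
  --batch_size auto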


Training Summary

LuminoLex-14B was fine-tuned on top of Qwen/Qwen3-14B using parameter-efficient LoRA adapters.

Training data

  • GSM8K — mathematical reasoning
  • MetaMathQA — structured math reasoning
  • MBPP — Python programming tasks
  • Evol-Instruct — instruction-following data (see the loading sketch after this list)
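
The exact dataset mix, filtering, and prompt formatting are not published with this card. The snippet below is only a loading sketch for the public sources named above; the repository IDs (especially the Evol-Instruct mirror), splits, and mixing ratios are assumptions.

# Hypothetical sketch: pulling the listed public sources from the Hugging Face Hub.
# Repository IDs, splits, and mixing ratios are assumptions, not confirmed details.
from datasets import load_dataset

gsm8k    = load_dataset("gsm8k", "main", split="train")          # mathematical reasoning
metamath = load_dataset("meta-math/MetaMathQA", split="train")    # structured math reasoning
mbpp     = load_dataset("mbpp", split="train")                    # Python programming tasks
# Evol-Instruct exists in several community mirrors; this repository ID is an assumption:
evol     = load_dataset("WizardLMTeam/WizardLM_evol_instruct_V2_196k", split="train")

print(len(gsm8k), len(metamath), len(mbpp), len(evol))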

Training configuration

  • Base model: Qwen/Qwen3-14B
  • Fine-tuning method: LoRA (QLoRA-style; see the configuration sketch after this list)
  • Weight precision: 4-bit
  • Compute precision: BF16
  • Optimizer: Paged AdamW
  • Max sequence length: 1024
  • Gradient checkpointing: enabled
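
For readers who want to reproduce a comparable setup, the sketch below shows how the listed configuration maps onto the transformers/peft APIs. Only the bulleted settings above are confirmed; the LoRA rank, alpha, dropout, target modules, and the remaining TrainingArguments values are assumptions for illustration.

# Minimal QLoRA-style sketch matching the configuration above.
# LoRA rank/alpha/dropout, target modules, and most TrainingArguments values are
# assumptions; only the bulleted settings are confirmed for LuminoLex-14B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "Qwen/Qwen3-14B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight precision
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 compute precision
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                   # assumed rank
    lora_alpha=32,                          # assumed scaling
    lora_dropout=0.05,                      # assumed dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="luminolex-14b-lora",
    bf16=True,                              # BF16 compute
    optim="paged_adamw_32bit",              # Paged AdamW
    gradient_checkpointing=True,            # enabled, as listed above
    per_device_train_batch_size=1,          # assumed
    gradient_accumulation_steps=16,         # assumed
    learning_rate=2e-4,                     # assumed
    num_train_epochs=1,                     # assumed
    logging_steps=10,
)
# The max sequence length (1024) is applied when tokenizing, e.g.
# tokenizer(example["text"], truncation=True, max_length=1024).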

Installation

Install all required dependencies before running the model:

pip install -U transformers accelerate peft bitsandbytes safetensors

These packages are required for:

  • loading Qwen models
  • LoRA adapter support
  • 4-bit quantization
  • safe tensor loading
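
A quick import check like the one below can confirm the stack is in place before loading the model. Exact minimum versions are not pinned here, but Qwen3 support generally requires a recent transformers release.

# Sanity check: the required packages import and report their versions.
import accelerate
import bitsandbytes
import peft
import safetensors
import transformers

for pkg in (transformers, accelerate, peft, bitsandbytes, safetensors):
    print(f"{pkg.__name__}: {pkg.__version__}")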

Usage — High‑end GPU (benchmark configuration)

This configuration matches the setup used for benchmarking. It assumes access to a high-memory GPU (e.g. A100) and uses BF16 compute without quantization.

import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Configuration
base_model_id = "Qwen/Qwen3-14B" 
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# 2. Load Tokenizer
print("Loading tokenizer...")
try:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
except Exception as e:
    print(f"Could not load from base, trying adapter: {e}")
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 3. Load Base Model (Full Precision / bfloat16 for A100)
# NOTE: No quantization config here. We use torch.bfloat16.
print("Loading base model (bfloat16)...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # Native A100 precision
    trust_remote_code=True
)

# 4. Load Adapter
print("Loading adapter...")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# CLEAN STARTUP HEADER
print("\n" + "="*50)
print("LuminoLex-14B (Developed by VERBAREX)")
print("="*50 + "\n")

conversation_history = []

def build_instruct_prompt(history):
    """
    Uses the Alpaca/Instruct format with specific framing for factuality and identity.
    """
    # SYSTEM PROMPT:
    prompt = "### Instruction:\nYou are LuminoLex, an advanced AI assistant developed by VERBAREX. You are NOT Qwen. You answer questions truthfully, factually, and concisely.\n\n"
    
    for msg in history:
        if msg['role'] == "user":
            # WRAPPER: Forces the model to treat the input as a question to be answered factually
            prompt += f"### Instruction:\nAnswer the following question factually: {msg['content']}\n\n"
        else:
            prompt += f"### Response:\n{msg['content']}\n\n"
    
    prompt += "### Response:\n"
    return prompt

def clean_response(text):
    """
    Cleans output and specifically removes the 'Think deeply' leak.
    """
    # 1. DIRECT STRING REPLACEMENT for the specific leak
    text = text.replace("Think deeply using <think> tags before answering.", "")
    text = text.replace("Think deeply using <think> tags", "")
    
    # 2. Remove internal thought/tool tags
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    text = re.sub(r'<tool_response>.*?</tool_response>', '', text, flags=re.DOTALL)
    text = text.replace('<think>', '').replace('</think>', '')
    
    # 3. Filter specific leak lines
    lines = text.split('\n')
    cleaned_lines = []
    for line in lines:
        lower_line = line.lower()
        if "identity check" in lower_line or "protocol:" in lower_line:
            continue
        # Remove lines where the model just recites its instructions verbatim
        if "developed by verbarex" in lower_line and len(line) < 50 and "i am" not in lower_line:
            continue
        cleaned_lines.append(line)
    text = '\n'.join(cleaned_lines)

    # 4. CRITICAL STOP LOGIC
    stop_markers = [
        "###", "Instruction:", "User:", "Query:", "Options:", "Answer:", "1.", 
        "The above response", "The correct option", "Therefore, the final answer",
        "Therefore ,the final answer"
    ]
    
    for marker in stop_markers:
        if marker in text:
            text = text.split(marker)[0]
            
    return text.strip()

while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ['quit', 'exit']:
            print("Exiting...")
            break
        if not user_input.strip():
            continue
    except Exception:
        print("Running a test prompt (Interactive input unavailable in this env)...\n")
        user_input = "Who are you?"
        print(f"User: {user_input}")
        force_exit = True
    else:
        force_exit = False

    conversation_history.append({"role": "user", "content": user_input})
    prompt_str = build_instruct_prompt(conversation_history)
    inputs = tokenizer(prompt_str, return_tensors="pt").to(model.device)

    # STATUS MESSAGE
    print("\nLuminoLex: Generating response... (Please wait)", end="\r", flush=True)
    
    with torch.inference_mode():
        # ADJUSTED SAMPLING
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            min_new_tokens=2,
            do_sample=True,         
            temperature=0.2,        
            top_p=0.9,
            repetition_penalty=1.1, 
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    final_response = clean_response(raw_response)
    
    if not final_response.strip():
        final_response = raw_response

    # Clearing space
    print(" " * 60, end="\r")
    print(f"LuminoLex: {final_response.strip()}")
    print("-" * 30 + "\n")
    
    conversation_history.append({"role": "assistant", "content": final_response.strip()})

    if force_exit:
        break
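
If you prefer shipping a single standalone checkpoint instead of loading the adapter at runtime, the BF16 setup above can optionally be merged into the base weights with PEFT's merge_and_unload. This is a sketch meant as a continuation of the BF16 script above (it reuses its model and tokenizer objects); merging is not recommended on top of the 4-bit setup below, and the output directory name is illustrative.

# Optional: fold the LoRA adapter into the BF16 base weights and save a
# standalone checkpoint. Run after the BF16 loading steps above.
merged_model = merged = model.merge_and_unload()   # base model with adapter deltas applied
merged_model.save_pretrained("LuminoLex-14B-merged", safe_serialization=True)
tokenizer.save_pretrained("LuminoLex-14B-merged")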

Low‑VRAM Inference (8–16 GB GPUs)

This configuration uses 4‑bit NF4 quantization to reduce memory requirements while largely preserving reasoning quality. It is recommended for consumer GPUs with roughly 8–16 GB of VRAM.

import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Configuration
base_model_id = "Qwen/Qwen3-14B" 
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# 2. Quantization Config (Enabled for Low VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 3. Load Tokenizer
print("Loading tokenizer...")
try:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
except Exception as e:
    print(f"Could not load from base, trying adapter: {e}")
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 4. Load Base Model
print("Loading base model...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True
)

# 5. Load Adapter
print("Loading adapter...")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# CLEAN STARTUP HEADER
print("\n" + "="*50)
print("LuminoLex-14B (Developed by VERBAREX)")
print("="*50 + "\n")

conversation_history = []

def build_instruct_prompt(history):
    """
    Uses the Alpaca/Instruct format with specific framing for factuality and identity.
    """
    # SYSTEM PROMPT:
    prompt = "### Instruction:\nYou are LuminoLex, an advanced AI assistant developed by VERBAREX. You are NOT Qwen. You answer questions truthfully, factually, and concisely.\n\n"
    
    for msg in history:
        if msg['role'] == "user":
            # WRAPPER: Forces the model to treat the input as a question to be answered factually
            prompt += f"### Instruction:\nAnswer the following question factually: {msg['content']}\n\n"
        else:
            prompt += f"### Response:\n{msg['content']}\n\n"
    
    prompt += "### Response:\n"
    return prompt

def clean_response(text):
    """
    Cleans output and specifically removes the 'Think deeply' leak.
    """
    # 1. DIRECT STRING REPLACEMENT for the specific leak
    text = text.replace("Think deeply using <think> tags before answering.", "")
    text = text.replace("Think deeply using <think> tags", "")
    
    # 2. Remove internal thought/tool tags
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    text = re.sub(r'<tool_response>.*?</tool_response>', '', text, flags=re.DOTALL)
    text = text.replace('<think>', '').replace('</think>', '')
    
    # 3. Filter specific leak lines
    lines = text.split('\n')
    cleaned_lines = []
    for line in lines:
        lower_line = line.lower()
        if "identity check" in lower_line or "protocol:" in lower_line:
            continue
        # Remove lines where the model just recites its instructions verbatim
        if "developed by verbarex" in lower_line and len(line) < 50 and "i am" not in lower_line:
            continue
        cleaned_lines.append(line)
    text = '\n'.join(cleaned_lines)

    # 4. CRITICAL STOP LOGIC
    stop_markers = [
        "###", "Instruction:", "User:", "Query:", "Options:", "Answer:", "1.", 
        "The above response", "The correct option", "Therefore, the final answer",
        "Therefore ,the final answer"
    ]
    
    for marker in stop_markers:
        if marker in text:
            text = text.split(marker)[0]
            
    return text.strip()

while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ['quit', 'exit']:
            print("Exiting...")
            break
        if not user_input.strip():
            continue
    except Exception:
        print("Running a test prompt (Interactive input unavailable in this env)...\n")
        user_input = "Who are you?"
        print(f"User: {user_input}")
        force_exit = True
    else:
        force_exit = False

    conversation_history.append({"role": "user", "content": user_input})
    prompt_str = build_instruct_prompt(conversation_history)
    inputs = tokenizer(prompt_str, return_tensors="pt").to(model.device)

    # STATUS MESSAGE
    print("\nLuminoLex: Generating response... (Please wait)", end="\r", flush=True)
    
    with torch.inference_mode():
        # ADJUSTED SAMPLING
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            min_new_tokens=2,
            do_sample=True,         
            temperature=0.2,        
            top_p=0.9,
            repetition_penalty=1.1, 
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    final_response = clean_response(raw_response)
    
    if not final_response.strip():
        final_response = raw_response

    # Clearing space
    print(" " * 60, end="\r")
    print(f"LuminoLex: {final_response.strip()}")
    print("-" * 30 + "\n")
    
    conversation_history.append({"role": "assistant", "content": final_response.strip()})

    if force_exit:
        break

Notes

  • This repository contains LoRA adapter weights, not a standalone base model (see the one-step loading sketch after these notes).
  • The base model (Qwen/Qwen3-14B) is downloaded automatically at runtime.
  • 4-bit inference uses bitsandbytes (NF4).
  • BF16 refers to compute precision, not stored weight format.
  • Benchmark results may vary slightly depending on decoding settings and hardware.
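
Since only adapter weights are distributed, the adapter can also be loaded in a single step with PEFT's AutoPeftModelForCausalLM, which reads the adapter config and downloads the base model automatically. A minimal sketch, equivalent to the two-step loading shown above (pass a BitsAndBytesConfig via quantization_config for the low-VRAM case):

# One-step loading sketch: PEFT resolves Qwen/Qwen3-14B from the adapter config
# and attaches the LoRA weights in a single call.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "verbarex/LuminoLex-14B-Verbarex"

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B", trust_remote_code=True)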

License

Apache-2.0
