LuminoLex-14B (Verbarex)

LuminoLex-14B is a fine-tuned variant of Qwen3-14B, optimized for strong reasoning, mathematics, and coding performance. It targets reliable zero-shot reasoning quality while keeping training costs low through parameter-efficient fine-tuning.

This release is intended for research, evaluation, and advanced reasoning workloads.


Highlights

  • Strong HumanEval performance for code reasoning
  • Competitive results on ARC and HellaSwag
  • Fine-tuned using LoRA / QLoRA
  • Evaluated using BF16 compute
  • Supports both high-end and low‑VRAM inference setups
  • Designed for reasoning- and coding‑heavy workloads

Evaluation Results (lm-eval-harness)

All results are zero-shot evaluations.

Benchmark       Metric     Score
HumanEval       pass@1     0.585
ARC-Challenge   acc_norm   0.623
ARC-Easy        acc_norm   0.851
HellaSwag       acc_norm   0.799

These scores make LuminoLex-14B competitive with other modern 14B-class open models, particularly on reasoning and programming tasks.
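
For reference, zero-shot results of this kind are typically reproduced with the lm-evaluation-harness CLI. The command below is a sketch rather than the exact invocation used for this card: task names, the peft model argument, and batching behaviour can differ between harness versions, and recent releases additionally require HF_ALLOW_CODE_EVAL=1 together with --confirm_run_unsafe_code to execute HumanEval completions.

lm_eval --model hf \
  --model_args pretrained=Qwen/Qwen3-14B,peft=verbarex/LuminoLex-14B-Verbarex,dtype=bfloat16 \
  --tasks arc_challenge,arc_easy,hellaswag,humaneval \
  --num_fewshot 0 \
  --batch_size auto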


Training Summary

LuminoLex-14B was fine-tuned on top of Qwen/Qwen3-14B using parameter-efficient LoRA adapters.

Training data

  • GSM8K — mathematical reasoning
  • MetaMathQA — structured math reasoning
  • MBPP — Python programming tasks
  • Evol-Instruct — instruction-following data (see the loading sketch after this list)
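
The exact dataset mix, filtering, and prompt formatting are not published with this card. The snippet below is only a loading sketch for the public sources named above; the repository IDs (especially the Evol-Instruct mirror), splits, and mixing ratios are assumptions.

# Hypothetical sketch: pulling the listed public sources from the Hugging Face Hub.
# Repository IDs, splits, and mixing ratios are assumptions, not confirmed details.
from datasets import load_dataset

gsm8k    = load_dataset("gsm8k", "main", split="train")          # mathematical reasoning
metamath = load_dataset("meta-math/MetaMathQA", split="train")    # structured math reasoning
mbpp     = load_dataset("mbpp", split="train")                    # Python programming tasks
# Evol-Instruct exists in several community mirrors; this repository ID is an assumption:
evol     = load_dataset("WizardLMTeam/WizardLM_evol_instruct_V2_196k", split="train")

print(len(gsm8k), len(metamath), len(mbpp), len(evol))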

Training configuration

  • Base model: Qwen/Qwen3-14B
  • Fine-tuning method: LoRA (QLoRA-style; see the configuration sketch after this list)
  • Weight precision: 4-bit
  • Compute precision: BF16
  • Optimizer: Paged AdamW
  • Max sequence length: 1024
  • Gradient checkpointing: enabled
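
For readers who want to reproduce a comparable setup, the sketch below shows how the listed configuration maps onto the transformers/peft APIs. Only the bulleted settings above are confirmed; the LoRA rank, alpha, dropout, target modules, and the remaining TrainingArguments values are assumptions for illustration.

# Minimal QLoRA-style sketch matching the configuration above.
# LoRA rank/alpha/dropout, target modules, and most TrainingArguments values are
# assumptions; only the bulleted settings are confirmed for LuminoLex-14B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "Qwen/Qwen3-14B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight precision
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 compute precision
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                   # assumed rank
    lora_alpha=32,                          # assumed scaling
    lora_dropout=0.05,                      # assumed dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="luminolex-14b-lora",
    bf16=True,                              # BF16 compute
    optim="paged_adamw_32bit",              # Paged AdamW
    gradient_checkpointing=True,            # enabled, as listed above
    per_device_train_batch_size=1,          # assumed
    gradient_accumulation_steps=16,         # assumed
    learning_rate=2e-4,                     # assumed
    num_train_epochs=1,                     # assumed
    logging_steps=10,
)
# The max sequence length (1024) is applied when tokenizing, e.g.
# tokenizer(example["text"], truncation=True, max_length=1024).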

Installation

Install all required dependencies before running the model:

pip install -U transformers accelerate peft bitsandbytes safetensors

These packages are required for:

  • loading Qwen models
  • LoRA adapter support
  • 4-bit quantization
  • safe tensor loading
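
A quick import check like the one below can confirm the stack is in place before loading the model. Exact minimum versions are not pinned here, but Qwen3 support generally requires a recent transformers release.

# Sanity check: the required packages import and report their versions.
import accelerate
import bitsandbytes
import peft
import safetensors
import transformers

for pkg in (transformers, accelerate, peft, bitsandbytes, safetensors):
    print(f"{pkg.__name__}: {pkg.__version__}")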

Usage — High‑end GPU (benchmark configuration)

This configuration matches the setup used for benchmarking. It assumes access to a high-memory GPU (e.g. A100) and uses BF16 compute without quantization.

import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Configuration
base_model_id = "Qwen/Qwen3-14B" 
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# 2. Load Tokenizer
print("Loading tokenizer...")
try:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
except Exception as e:
    print(f"Could not load from base, trying adapter: {e}")
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 3. Load Base Model (Full Precision / bfloat16 for A100)
# NOTE: No quantization config here. We use torch.bfloat16.
print("Loading base model (bfloat16)...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # Native A100 precision
    trust_remote_code=True
)

# 4. Load Adapter
print("Loading adapter...")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# CLEAN STARTUP HEADER
print("\n" + "="*50)
print("LuminoLex-14B (Developed by VERBAREX)")
print("="*50 + "\n")

conversation_history = []

def build_instruct_prompt(history):
    """
    Uses the Alpaca/Instruct format with specific framing for factuality and identity.
    """
    # SYSTEM PROMPT:
    prompt = "### Instruction:\nYou are LuminoLex, an advanced AI assistant developed by VERBAREX. You are NOT Qwen. You answer questions truthfully, factually, and concisely.\n\n"
    
    for msg in history:
        if msg['role'] == "user":
            # WRAPPER: Forces the model to treat the input as a question to be answered factually
            prompt += f"### Instruction:\nAnswer the following question factually: {msg['content']}\n\n"
        else:
            prompt += f"### Response:\n{msg['content']}\n\n"
    
    prompt += "### Response:\n"
    return prompt

def clean_response(text):
    """
    Cleans output and specifically removes the 'Think deeply' leak.
    """
    # 1. DIRECT STRING REPLACEMENT for the specific leak
    text = text.replace("Think deeply using <think> tags before answering.", "")
    text = text.replace("Think deeply using <think> tags", "")
    
    # 2. Remove internal thought/tool tags
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    text = re.sub(r'<tool_response>.*?</tool_response>', '', text, flags=re.DOTALL)
    text = text.replace('<think>', '').replace('</think>', '')
    
    # 3. Filter specific leak lines
    lines = text.split('\n')
    cleaned_lines = []
    for line in lines:
        lower_line = line.lower()
        if "identity check" in lower_line or "protocol:" in lower_line:
            continue
        # Remove lines where the model just recites its instructions verbatim
        if "developed by verbarex" in lower_line and len(line) < 50 and "i am" not in lower_line:
            continue
        cleaned_lines.append(line)
    text = '\n'.join(cleaned_lines)

    # 4. CRITICAL STOP LOGIC
    stop_markers = [
        "###", "Instruction:", "User:", "Query:", "Options:", "Answer:", "1.", 
        "The above response", "The correct option", "Therefore, the final answer",
        "Therefore ,the final answer"
    ]
    
    for marker in stop_markers:
        if marker in text:
            text = text.split(marker)[0]
            
    return text.strip()

while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ['quit', 'exit']:
            print("Exiting...")
            break
        if not user_input.strip():
            continue
    except Exception:
        print("Running a test prompt (Interactive input unavailable in this env)...\n")
        user_input = "Who are you?"
        print(f"User: {user_input}")
        force_exit = True
    else:
        force_exit = False

    conversation_history.append({"role": "user", "content": user_input})
    prompt_str = build_instruct_prompt(conversation_history)
    inputs = tokenizer(prompt_str, return_tensors="pt").to(model.device)

    # STATUS MESSAGE
    print("\nLuminoLex: Generating response... (Please wait)", end="\r", flush=True)
    
    with torch.inference_mode():
        # ADJUSTED SAMPLING
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            min_new_tokens=2,
            do_sample=True,         
            temperature=0.2,        
            top_p=0.9,
            repetition_penalty=1.1, 
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    final_response = clean_response(raw_response)
    
    if not final_response.strip():
        final_response = raw_response

    # Clearing space
    print(" " * 60, end="\r")
    print(f"LuminoLex: {final_response.strip()}")
    print("-" * 30 + "\n")
    
    conversation_history.append({"role": "assistant", "content": final_response.strip()})

    if force_exit:
        break
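
If you prefer shipping a single standalone checkpoint instead of loading the adapter at runtime, the BF16 setup above can optionally be merged into the base weights with PEFT's merge_and_unload. This is a sketch meant as a continuation of the BF16 script above (it reuses its model and tokenizer objects); merging is not recommended on top of the 4-bit setup below, and the output directory name is illustrative.

# Optional: fold the LoRA adapter into the BF16 base weights and save a
# standalone checkpoint. Run after the BF16 loading steps above.
merged_model = merged = model.merge_and_unload()   # base model with adapter deltas applied
merged_model.save_pretrained("LuminoLex-14B-merged", safe_serialization=True)
tokenizer.save_pretrained("LuminoLex-14B-merged")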

Low‑VRAM Inference (8–16 GB GPUs)

This configuration uses 4‑bit NF4 quantization to reduce memory requirements while largely preserving reasoning quality. It is recommended for consumer GPUs with roughly 8–16 GB of VRAM.

import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Configuration
base_model_id = "Qwen/Qwen3-14B" 
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# 2. Quantization Config (Enabled for Low VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 3. Load Tokenizer
print("Loading tokenizer...")
try:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
except Exception as e:
    print(f"Could not load from base, trying adapter: {e}")
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 4. Load Base Model
print("Loading base model...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True
)

# 5. Load Adapter
print("Loading adapter...")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# CLEAN STARTUP HEADER
print("\n" + "="*50)
print("LuminoLex-14B (Developed by VERBAREX)")
print("="*50 + "\n")

conversation_history = []

def build_instruct_prompt(history):
    """
    Uses the Alpaca/Instruct format with specific framing for factuality and identity.
    """
    # SYSTEM PROMPT:
    prompt = "### Instruction:\nYou are LuminoLex, an advanced AI assistant developed by VERBAREX. You are NOT Qwen. You answer questions truthfully, factually, and concisely.\n\n"
    
    for msg in history:
        if msg['role'] == "user":
            # WRAPPER: Forces the model to treat the input as a question to be answered factually
            prompt += f"### Instruction:\nAnswer the following question factually: {msg['content']}\n\n"
        else:
            prompt += f"### Response:\n{msg['content']}\n\n"
    
    prompt += "### Response:\n"
    return prompt

def clean_response(text):
    """
    Cleans output and specifically removes the 'Think deeply' leak.
    """
    # 1. DIRECT STRING REPLACEMENT for the specific leak
    text = text.replace("Think deeply using <think> tags before answering.", "")
    text = text.replace("Think deeply using <think> tags", "")
    
    # 2. Remove internal thought/tool tags
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    text = re.sub(r'<tool_response>.*?</tool_response>', '', text, flags=re.DOTALL)
    text = text.replace('<think>', '').replace('</think>', '')
    
    # 3. Filter specific leak lines
    lines = text.split('\n')
    cleaned_lines = []
    for line in lines:
        lower_line = line.lower()
        if "identity check" in lower_line or "protocol:" in lower_line:
            continue
        # Remove lines where the model just recites its instructions verbatim
        if "developed by verbarex" in lower_line and len(line) < 50 and "i am" not in lower_line:
            continue
        cleaned_lines.append(line)
    text = '\n'.join(cleaned_lines)

    # 4. CRITICAL STOP LOGIC
    stop_markers = [
        "###", "Instruction:", "User:", "Query:", "Options:", "Answer:", "1.", 
        "The above response", "The correct option", "Therefore, the final answer",
        "Therefore ,the final answer"
    ]
    
    for marker in stop_markers:
        if marker in text:
            text = text.split(marker)[0]
            
    return text.strip()

while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ['quit', 'exit']:
            print("Exiting...")
            break
        if not user_input.strip():
            continue
    except Exception:
        print("Running a test prompt (Interactive input unavailable in this env)...\n")
        user_input = "Who are you?"
        print(f"User: {user_input}")
        force_exit = True
    else:
        force_exit = False

    conversation_history.append({"role": "user", "content": user_input})
    prompt_str = build_instruct_prompt(conversation_history)
    inputs = tokenizer(prompt_str, return_tensors="pt").to(model.device)

    # STATUS MESSAGE
    print("\nLuminoLex: Generating response... (Please wait)", end="\r", flush=True)
    
    with torch.inference_mode():
        # ADJUSTED SAMPLING
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            min_new_tokens=2,
            do_sample=True,         
            temperature=0.2,        
            top_p=0.9,
            repetition_penalty=1.1, 
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    final_response = clean_response(raw_response)
    
    if not final_response.strip():
        final_response = raw_response

    # Clearing space
    print(" " * 60, end="\r")
    print(f"LuminoLex: {final_response.strip()}")
    print("-" * 30 + "\n")
    
    conversation_history.append({"role": "assistant", "content": final_response.strip()})

    if force_exit:
        break

Notes

  • This repository contains LoRA adapter weights, not a standalone base model (see the one-step loading sketch after these notes).
  • The base model (Qwen/Qwen3-14B) is downloaded automatically at runtime.
  • 4-bit inference uses bitsandbytes (NF4).
  • BF16 refers to compute precision, not stored weight format.
  • Benchmark results may vary slightly depending on decoding settings and hardware.
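
Since only adapter weights are distributed, the adapter can also be loaded in a single step with PEFT's AutoPeftModelForCausalLM, which reads the adapter config and downloads the base model automatically. A minimal sketch, equivalent to the two-step loading shown above (pass a BitsAndBytesConfig via quantization_config for the low-VRAM case):

# One-step loading sketch: PEFT resolves Qwen/Qwen3-14B from the adapter config
# and attaches the LoRA weights in a single call.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "verbarex/LuminoLex-14B-Verbarex"

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B", trust_remote_code=True)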

License

Apache-2.0
