Developed by Kashif Salahuddin & Samiya Kashif

1. Executive Summary

AskBuddyX is a specialized coding assistant (MVP version) built as a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-Coder-0.5B-Instruct base model. Unlike generic coding assistants, AskBuddyX implements a "runnable-first" philosophy: when users request code, responses are structured with clear Solution, Usage, and Sanity test sections, ensuring developers receive immediately executable code with minimal friction. It also follows a core philosophy of minimizing lines of code while preserving behavior.

What AskBuddyX Is

  • A LoRA adapter trained on the code-alpaca-20k dataset
  • OpenAI-compatible API for local inference
  • Lightweight distribution (~12MB adapter vs. multi-GB full models)
  • Production-engineered with automated pipelines, evaluation, and publishing

Why AskBuddyX

AskBuddyX is built for a simple, practical goal: deliver the same outcome with fewer lines of code.

Most coding assistants tend to “over-achieve” by producing large, multi-step solutions—even when a smaller, clearer implementation would do. That extra code isn’t free: it increases review effort, maintenance cost, and the surface area where defects can hide.

Too Much Code, Too Fast

Teams everywhere are seeing a huge jump in the number of lines of code (LOC). Developers, from interns to seniors, are suddenly writing 5 to 7 times more code than before. At first, it looks like higher productivity. In reality, it often means more bugs.

There’s a long-standing rule in software engineering:

“The more lines of code you have, the higher your probability of introducing bugs.”

That truth still stands, and AI-generated code tends to be verbose and repetitive, which can inflate LOC without adding real value.

AskBuddyX is designed for teams that value minimalism, clarity, and correctness over volume.

What makes AskBuddyX different

  • Minimal LOC by default: AskBuddyX is optimized to minimize lines of code while preserving behavior; it prefers the smallest correct solution that meets the user’s objective.

  • Internal governance behavior: The model follows a lightweight internal “governance layer” in its response style: avoid unnecessary scaffolding, avoid over-abstraction, keep code focused, and don’t introduce complexity that doesn’t improve the result. The governance layer sits between the user request and the final output and enforces minimalism as a constraint: it evaluates candidate solutions by lines of code and selects the smallest implementation that still satisfies the original requirements. If a shorter variant fails, it falls back to the next-smallest passing candidate, giving fewer lines without sacrificing correctness (see the sketch after this list).

  • Practical, runnable output: When you ask for code, AskBuddyX is tuned toward “runnable-first” answers: a clear implementation, a minimal usage example, and a quick sanity check when appropriate.
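
The selection behavior described above can be sketched roughly as follows. This is an illustration of the idea only; the helper names (count_loc, passes_requirements) are hypothetical and not part of the released adapter or any library.

# Illustrative sketch of LOC-based candidate selection (hypothetical helpers).

def count_loc(code: str) -> int:
    """Count non-empty, non-comment lines of code."""
    return sum(
        1
        for line in code.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

def pick_smallest_passing(candidates, passes_requirements):
    """Return the shortest candidate that still satisfies the requirements."""
    for candidate in sorted(candidates, key=count_loc):
        if passes_requirements(candidate):
            return candidate
    return None  # nothing passed; fall back to a standard, longer answer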

Early validation

AskBuddyX was evaluated in a small developer study comparing it with popular coding models on a shared set of tasks. In this pilot, AskBuddyX showed a clear reduction in lines of code (up to ~30%) while producing solutions that executed correctly and achieved the same intended outcomes under the evaluation harness.

Note: Results depend on task selection, constraints, and how “equivalence” is measured. We recommend validating on your own codebase and standards.

Why It Exists

Developers need coding assistance that:

  1. Provides runnable code immediately without extensive explanation
  2. Runs locally without cloud dependencies
  3. Maintains small footprint for fast iteration
  4. Offers structured, predictable responses for automation

Who It's For

  • Individual developers working on personal projects
  • Small teams needing local, private coding assistance
  • Educators teaching programming with consistent code examples
  • Researchers experimenting with LoRA fine-tuning on MLX

Quick Start

Option 1: Use with MLX

Install MLX:

pip install mlx-lm

Then load the base model with the AskBuddyX adapter:

from mlx_lm import load, generate

# Load base model with AskBuddyX adapter
model, tokenizer = load(
    "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    adapter_path="salakash/AskBuddyX"
)

# Generate code
prompt = "Write a Python function to calculate factorial"
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)

Option 2: Use with Transformers

Install the dependencies:

pip install transformers torch peft

Then load the base model and apply the adapter:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    trust_remote_code=True
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "salakash/AskBuddyX")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Generate
messages = [{"role": "user", "content": "Write a Python function to add two numbers"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Option 3: Web UI with MLX

Start an OpenAI-compatible server:

# Install mlx-lm if not already installed
pip install mlx-lm

# Start server with adapter
mlx_lm.server \
  --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
  --adapter-path salakash/AskBuddyX \
  --port 8080

Then use with any OpenAI-compatible client:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    "messages": [
      {"role": "user", "content": "Write a Python function to reverse a string"}
    ],
    "max_tokens": 512
  }'
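
The same endpoint can also be called from Python with the official openai client (a minimal sketch; the api_key value is a placeholder because the local server does not check it):

from openai import OpenAI

# Point the client at the local mlx_lm server; the key is unused but required by the SDK
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    messages=[{"role": "user", "content": "Write a Python function to reverse a string"}],
    max_tokens=512,
)
print(response.choices[0].message.content)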

Or use any OpenAI-compatible web UI by configuring it to point to http://localhost:8080 as the API endpoint.

Option 4: Hugging Face Inference API

Use directly via Hugging Face's Inference API (requires HF token):

import requests

API_URL = "https://api-inference.huggingface.co/models/salakash/AskBuddyX"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Write a Python function to check if a number is prime",
    "parameters": {"max_new_tokens": 256}
})
print(output)
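
For text-generation models, the Inference API typically returns a list of objects with a generated_text field; a hedged example of extracting it (the exact response shape can vary by deployment):

# Extract the generated text if the response has the usual text-generation shape
if isinstance(output, list) and output and "generated_text" in output[0]:
    print(output[0]["generated_text"])
else:
    print(output)  # fall back to the raw response (e.g. an error message)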

Response Format

AskBuddyX provides structured, runnable-first responses:

  • Solution: The main implementation code
  • Usage: A minimal runnable example
  • Sanity test: A tiny test snippet (when appropriate)
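
For illustration, a runnable-first answer to “Write a Python function to reverse a string” might look like this (illustrative, not verbatim model output):

# Solution
def reverse_string(s: str) -> str:
    return s[::-1]

# Usage
print(reverse_string("hello"))  # olleh

# Sanity test
assert reverse_string("abc") == "cba"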

Comparison

In the pilot study, AskBuddyX achieved the same objective in roughly 8–10 lines of code, while a standard LLM typically produced 22–26 lines for the equivalent solution.
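
As a purely illustrative example of the difference (not actual output from either model), compare a compact, order-preserving list deduplication with a more verbose equivalent:

# Compact (the style AskBuddyX aims for)
def dedupe(items):
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

# Verbose equivalent of the same behavior
def dedupe_verbose(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result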

AskBuddyX

[screenshot: example AskBuddyX response]

Standard Coding Agent

[screenshot: equivalent response from a standard coding agent]

Base Model & Dataset

License

This project publishes only adapter artifacts and configuration. The base model and dataset have their own licenses:

  • Base Model: Apache-2.0 (Qwen/Qwen2.5-Coder-0.5B-Instruct)
  • Dataset: Apache-2.0 (flwrlabs/code-alpaca-20k)

See LICENSE-THIRD-PARTY.md for complete attribution.

Acknowledgments

  • Qwen team for the excellent base model
  • MLX community for the Apple Silicon optimizations
  • flwrlabs for the code-alpaca-20k dataset
  • Multimodal Art Projection for the m-a-p/Code-Feedback dataset