# DuckyBlender/diegogpt-v2-mlx-bf16
This model is a full fine-tune of Qwen/Qwen3-0.6B-MLX-bf16, trained on the complete set of public replies from a specific individual.
Training was conducted with mlx-lm version 0.26.0. It ran for 15 steps at a batch size of 16, completing in a few seconds on a MacBook Pro (M1 Pro, 8-core CPU, 16GB RAM), with peak memory usage of 8.3GB. The dataset contained 225 low-quality training pairs (240 lines trained in total).
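The card does not include the exact training invocation, but a full fine-tune with mlx-lm is typically run through the `mlx_lm.lora` entry point with `--fine-tune-type full`. A hypothetical sketch matching the settings above (the data path and file layout are assumptions; mlx-lm expects JSONL files with one `{"messages": [...]}` chat pair per line):

```bash
# Hypothetical command; the model card does not state the actual invocation.
# ./data is assumed to contain train.jsonl (and optionally valid.jsonl),
# each line shaped like:
#   {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
mlx_lm.lora \
  --model "Qwen/Qwen3-0.6B-MLX-bf16" \
  --train \
  --data ./data \
  --fine-tune-type full \
  --batch-size 16 \
  --iters 15
```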
Run with the system prompt `/no_think` and the following generation parameters:

```
--temp 0.7 --top-p 0.8 --top-k 20 --min-p 0
```
Example usage:
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("DuckyBlender/diegogpt-v2-mlx-bf16")

user_input = "are you red hat hacker?"

if tokenizer.chat_template is not None:
    messages = [
        {"role": "user", "content": user_input}
    ]
    # enable_thinking=False disables Qwen3's thinking mode,
    # equivalent to the /no_think system prompt
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False, enable_thinking=False
    )
else:
    prompt = user_input

sampler = make_sampler(temp=0.7, top_p=0.8, top_k=20, min_p=0)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    sampler=sampler,
    verbose=True,
)
```
Or directly via CLI:
```bash
mlx_lm.generate \
  --model "DuckyBlender/diegogpt-v2-mlx-bf16" \
  --temp 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0 \
  --system "/no_think" \
  --prompt "are you red hat hacker?"
```
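mlx-lm also ships an OpenAI-compatible HTTP server, which can be a convenient way to chat with the model. A sketch assuming the default port of 8080 (check `mlx_lm.server --help` on your version for the exact flags):

```bash
# Serve the model locally.
mlx_lm.server --model "DuckyBlender/diegogpt-v2-mlx-bf16" --port 8080

# Query it with the OpenAI chat-completions schema.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "/no_think"},
      {"role": "user", "content": "are you red hat hacker?"}
    ],
    "temperature": 0.7,
    "top_p": 0.8
  }'
```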
The model uses ~1.25GB of RAM during inference.