This is an EAGLE3 speculative-decoding draft model for Phi-4, trained with SpecForge on unfiltered ShareGPT example data.
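For context, the draft-then-verify loop that speculative decoding builds on can be sketched with toy stand-in "models". This is only an illustration of the accept-longest-matching-prefix idea, not EAGLE3 itself (which drafts a token tree from the target model's hidden states), and both model functions here are made up for the example:

```python
def draft_model(ctx, k):
    # Toy draft model: guesses the sequence keeps counting upward.
    return [ctx[-1] + i + 1 for i in range(k)]

def target_model(ctx):
    # Toy target model: counts upward but wraps back to 0 after 4.
    nxt = ctx[-1] + 1
    return nxt if nxt <= 4 else 0

def speculative_decode(ctx, num_new, k=4):
    # Draft k tokens at a time, verify them against the target,
    # and keep the longest prefix on which both models agree.
    out = list(ctx)
    while len(out) < len(ctx) + num_new:
        proposal = draft_model(out, k)
        accepted = []
        for tok in proposal:
            expect = target_model(out + accepted)
            if tok == expect:
                accepted.append(tok)      # draft agreed: accept for free
            else:
                accepted.append(expect)   # first mismatch: take the
                break                     # target's token and re-draft
        out.extend(accepted)
    return out[:len(ctx) + num_new]

print(speculative_decode([0], 8))  # -> [0, 1, 2, 3, 4, 0, 1, 2, 3]
```

The output is identical to decoding with the target model alone; the draft model only changes how many target evaluations are needed per accepted token.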

Quick Example, specific to NVIDIA RTX 5090 (Blackwell):

docker run --gpus "device=0" \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=hf_HUGGINGFACETOKEN" \
    --ipc=host \
    lmsysorg/sglang:blackwell python3 -m sglang.launch_server --host 0.0.0.0 \
    --model-path dddsaty/phi-4-GPTQ-8bit \
    --mem-fraction-static 0.75 \
    --enable-torch-compile --torch-compile-max-bs 64 --cuda-graph-max-bs 64 \
    --enable-tokenizer-batch-encode \
    --enable-hierarchical-cache \
    --sampling-backend flashinfer \
    --max-total-tokens 16000 \
    --allow-auto-truncate \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path easiest-ai-shawn/Phi-4-EAGLE3-sharegpt-unfiltered \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 32 \
    --port 30000
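Once the container is up, the server speaks SGLang's OpenAI-compatible API on port 30000. A minimal stdlib-only client sketch (the prompt is a placeholder, and the `model` field is passed through as-is since this server hosts a single model):

```python
import json
import urllib.request

def build_payload(prompt, max_tokens=128):
    # Chat-completions request body for the server launched above.
    return {
        "model": "dddsaty/phi-4-GPTQ-8bit",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt, url="http://localhost:30000/v1/chat/completions"):
    # POST the request and return the assistant's reply text.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("Explain speculative decoding in one sentence.")
# (requires the server above to be running)
```

Speculative decoding is transparent to clients: requests and responses look exactly as they would without the draft model, only latency changes.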

Training parameters:

  • Epochs: 11
  • Max Length: 4096
  • TTT Length: 8
Model size: 0.6B params (Safetensors; tensor types: I64 · BF16 · BOOL)
