This is an EAGLE3 speculative-decoding draft model for Phi-4, trained with SpecForge on unfiltered ShareGPT example data.
Quick example, specific to an NVIDIA RTX 5090 (Blackwell):
```bash
docker run --gpus "device=0" \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=hf_HUGGINGFACETOKEN" \
  --ipc=host \
  lmsysorg/sglang:blackwell python3 -m sglang.launch_server --host 0.0.0.0 \
    --model-path dddsaty/phi-4-GPTQ-8bit \
    --mem-fraction-static 0.75 \
    --enable-torch-compile --torch-compile-max-bs 64 --cuda-graph-max-bs 64 \
    --enable-tokenizer-batch-encode \
    --enable-hierarchical-cache \
    --sampling-backend flashinfer \
    --max-total-tokens 16000 \
    --allow-auto-truncate \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path easiest-ai-shawn/Phi-4-EAGLE3-sharegpt-unfiltered \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 32 \
    --port 30000
```
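Once the container is up, the server can be queried through SGLang's OpenAI-compatible API on the mapped port. A minimal sketch of such a request is below; the prompt and `max_tokens` value are arbitrary illustrative choices, not part of this model card:

```bash
# Minimal sketch of a chat completion request against the server launched above.
# Assumes the default OpenAI-compatible endpoint exposed by SGLang on port 30000;
# the prompt and max_tokens are illustrative placeholders.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dddsaty/phi-4-GPTQ-8bit",
    "messages": [{"role": "user", "content": "Explain speculative decoding in one sentence."}],
    "max_tokens": 128
  }'
```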
Training parameters:
- Epochs: 11
- Max Length: 4096
- TTT Length: 8
Base model: dddsaty/phi-4-GPTQ-8bit