This is an EAGLE3 speculative-decoding draft model for Phi-4, trained with SpecForge on unfiltered ShareGPT example data.
Quick example, specific to an NVIDIA RTX 5090 (Blackwell):
```bash
docker run --gpus "device=0" \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=hf_HUGGINGFACETOKEN" \
  --ipc=host \
  lmsysorg/sglang:blackwell python3 -m sglang.launch_server --host 0.0.0.0 \
    --model-path dddsaty/phi-4-GPTQ-8bit \
    --mem-fraction-static 0.75 \
    --enable-torch-compile --torch-compile-max-bs 64 --cuda-graph-max-bs 64 \
    --enable-tokenizer-batch-encode \
    --enable-hierarchical-cache \
    --sampling-backend flashinfer \
    --max-total-tokens 16000 \
    --allow-auto-truncate \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path easiest-ai-shawn/Phi-4-EAGLE3-sharegpt-unfiltered \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 32 \
    --port 30000
```
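Once the container is up, the server can be queried through SGLang's OpenAI-compatible API on the mapped port. A minimal sketch of such a request is below; the prompt and `max_tokens` value are arbitrary illustrative choices, not part of this model card:

```bash
# Minimal sketch of a chat completion request against the server launched above.
# Assumes the default OpenAI-compatible endpoint exposed by SGLang on port 30000;
# the prompt and max_tokens are illustrative placeholders.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dddsaty/phi-4-GPTQ-8bit",
    "messages": [{"role": "user", "content": "Explain speculative decoding in one sentence."}],
    "max_tokens": 128
  }'
```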
Training parameters:
- Epochs: 11
- Max Length: 4096
- TTT Length: 8
Base model: dddsaty/phi-4-GPTQ-8bit