What problem are you having? I might be able to help.

On my test system running:

docker run --runtime nvidia --gpus all -p 8000:8000 --ipc=host  vllm/vllm-openai:nightly --model mistralai/Devstral-Small-2-24B-Instruct-2512  --dtype auto  --max-model-len 32768  --tool-call-parser mistral  --enable-auto-tool-choice

Will load and work, but running:

docker run --runtime nvidia --gpus all -p 8000:8000 --ipc=host  vllm/vllm-openai:nightly --model Firworks/Devstral-Small-2-24B-Instruct-2512-nvfp4  --dtype auto  --max-model-len 32768  --tool-call-parser mistral  --enable-auto-tool-choice

Fails claiming:

(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1048, in __getitem__
(APIServer pid=1)     raise KeyError(key)
(APIServer pid=1) KeyError: 'ministral3'

I've gone over and over the config and other files between the two repos trying to figure out what's wrong, and I can't find what is busted. It's acting like the version of Transformers in the second command doesn't support ministral3-architecture models, but it's literally the exact same container image.
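
For what it's worth, a quick probe like this should show whether the bundled Transformers even has a ministral3 entry (it pokes at a Transformers internal, CONFIG_MAPPING, so treat it as a rough check rather than anything official):

docker run --rm --entrypoint python3 vllm/vllm-openai:nightly -c "import transformers; from transformers.models.auto.configuration_auto import CONFIG_MAPPING; print(transformers.__version__, 'ministral3' in CONFIG_MAPPING)"

If that prints False, the KeyError above would just mean the auto-config mapping is missing that model type, not that anything is broken in the repo files themselves.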

I went round and round troubleshooting this with GPT 5.1 and it basically just gave up.

You can try updating the vLLM and Transformers versions:

Install transformers from source

Use pip install --upgrade git+https://github.com/huggingface/transformers.git

mistralai/Devstral-Small-2-24B-Instruct-2512 works because it uses Mistral's configuration format (with params.json), so it never hits the missing ministral3 entry.
For your quant, sadly I think the only solution is to bump transformers, as vLLM does not seem to handle NVFP4 through the Mistral-style config.
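
If you want to test that without rebuilding the image, overriding the entrypoint along these lines should work (untested sketch on my side; I'm assuming the stock vllm/vllm-openai image normally launches python3 -m vllm.entrypoints.openai.api_server):

docker run --runtime nvidia --gpus all -p 8000:8000 --ipc=host --entrypoint bash vllm/vllm-openai:nightly -c "pip install --upgrade git+https://github.com/huggingface/transformers.git && python3 -m vllm.entrypoints.openai.api_server --model Firworks/Devstral-Small-2-24B-Instruct-2512-nvfp4 --dtype auto --max-model-len 32768 --tool-call-parser mistral --enable-auto-tool-choice"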

See:
https://github.com/vllm-project/vllm/blob/f4417f8449dc7a2cb890dbef659c0d1ce93432da/vllm/transformers_utils/config.py#L557
https://github.com/vllm-project/vllm/blob/a11f4a81e027efd9ef783b943489c222950ac989/vllm/transformers_utils/configs/mistral.py#L164

Unfortunate. Maybe I'll write something up and post an issue to vLLM and see if they have any ideas or if we can get some fix in the pipeline. I wonder if there's still value in running the larger Devstral while I've got the quant script working.

Hey, just to let you know, I've been having the same problem doing NVFP4. I've redone the quant about 10 times; the final time I thought I had it, as I requantized it using LlavaGenerationConfig, but then I gave up at a type error of "NoneType is not iterable" or something along those lines. I'm using the AWQ at the moment, but it seems the vision stuff in the AWQ is messed up.

I tried overriding the package version. It seems there are API changes in transformers that vLLM needs to pick up, so there is real work that needs to happen, and I'm not sure whether that means we need to wait until transformers 5.0.0 is out of RC.
