Failed to run vLLM batch inference
#9 by Junxiao-Zhao - opened
While both the Instruct and Thinking models work with vllm serve, they fail when run through batch inference:
python -m vllm.entrypoints.openai.run_batch \
-i ./qr_msgs4infer.jsonl \
-o ./qr_infer_results.jsonl \
--model ./Qwen3-Next-80B-A3B-Thinking \
--tensor-parallel-size 2 \
--max-model-len 32768 \
--max-num-batched-tokens 32768 \
--enable-prefix-caching \
--enable-chunked-prefill \
--speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
which raised this error: TypeError: argument 'id': StreamInput must be either an integer or a list of integers
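For context, each line of qr_msgs4infer.jsonl follows the OpenAI Batch API request format that run_batch expects; a minimal line looks roughly like this (the custom_id and message content are placeholders):

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "./Qwen3-Next-80B-A3B-Thinking", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 1024}}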
Env info
Package Version Editable project location
--------------------------------- ------------------------------- -------------------------
accelerate 1.10.1
aiohappyeyeballs 2.6.1
aiohttp 3.8.6
aiohttp-cors 0.7.0
aiosignal 1.4.0
alabaster 0.7.16
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.10.0
apex 0.1
argon2-cffi 25.1.0
argon2-cffi-bindings 25.1.0
arrow 1.3.0
asn1crypto 1.5.1
astor 0.8.1
asttokens 3.0.0
async-lru 2.0.5
async-timeout 4.0.3
attrs 25.3.0
av 15.1.0
babel 2.17.0
beautifulsoup4 4.13.5
bidict 0.23.1
blake3 1.0.5
bleach 6.2.0
blessed 1.21.0
blinker 1.5
cachetools 5.5.2
causal-conv1d 1.5.2
cbor2 5.7.0
certifi 2025.7.14
cffi 1.17.1
chardet 5.1.0
charset-normalizer 3.4.2
click 8.2.1
cloudpickle 3.1.1
codetiming 1.4.0
colorful 0.5.7
comm 0.2.3
compressed-tensors 0.11.0
configparser 7.2.0
crccheck 1.3.1
crcmod 1.7
crypto 1.4.1
cryptography 43.0.3
cupy-cuda12x 13.6.0
datasets 4.0.0
dbus-python 1.3.2
debugpy 1.8.16
decorator 5.2.1
defusedxml 0.7.1
Deprecated 1.2.18
depyf 0.19.0
devscripts 2.23.4+deb12u2
dill 0.3.8
diskcache 5.6.3
distlib 0.4.0
distro 1.8.0
distro-info 1.5+deb12u1
dnspython 2.7.0
docker-pycreds 0.4.0
docutils 0.19
ecdsa 0.19.1
einops 0.8.1
email-validator 2.3.0
exceptiongroup 1.3.0
executing 2.2.1
fastapi 0.116.1
fastapi-cli 0.0.8
fastapi-cloud-cli 0.1.5
fastjsonschema 2.21.2
fastrlock 0.8.3
filelock 3.18.0
fla-core 0.3.2
flash-attn 2.7.4.post1
flash-linear-attention 0.3.2
flashinfer-python 0.3.1
fqdn 1.5.1
frozendict 2.4.6
frozenlist 1.3.3
fsspec 2024.12.0
gguf 0.17.1
gitdb 4.0.12
GitPython 3.1.45
google-api-core 2.25.1
google-auth 2.40.3
googleapis-common-protos 1.70.0
gpg 1.18.0
gpustat 1.0.0
grpcio 1.59.5
h11 0.16.0
hf-xet 1.1.9
httpcore 1.0.9
httplib2 0.20.4
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.34.4
hydra-core 1.3.2
idna 3.10
imagesize 1.4.1
importlib-metadata 6.7.0
iniconfig 2.1.0
interegular 0.3.3
iotop 0.6
ipaddress 1.0.23
ipykernel 6.30.1
ipython 9.5.0
ipython_pygments_lexers 1.1.1
ipywidgets 8.1.7
isoduration 20.11.0
jedi 0.19.2
Jinja2 3.1.6
jiter 0.10.0
json5 0.12.1
jsonpointer 3.0.0
jsonschema 4.25.1
jsonschema-specifications 2025.4.1
jupyter 1.1.1
jupyter_client 8.6.3
jupyter-console 6.6.3
jupyter_core 5.8.1
jupyter-events 0.12.0
jupyter-lsp 2.3.0
jupyter_server 2.17.0
jupyter_server_terminals 0.5.3
jupyterlab 4.4.7
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jupyterlab_widgets 3.0.15
lazr.restfulclient 0.14.5
lazr.uri 1.0.6
linkify-it-py 2.0.3
llguidance 0.7.30
llvmlite 0.44.0
lm-format-enforcer 0.11.3
markdown-it-py 4.0.0
MarkupSafe 3.0.2
mathruler 0.1.0
matplotlib-inline 0.1.7
mdit-py-plugins 0.5.0
mdurl 0.1.2
megatron-core 0.13.0 /Megatron-LM
memray 1.18.0
mistral_common 1.8.4
mistune 3.1.4
mpmath 1.3.0
msgpack 1.0.8
msgspec 0.19.0
multidict 6.6.4
multiprocess 0.70.16
Naked 0.1.32
nbclient 0.10.2
nbconvert 7.16.6
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.5
ninja 1.13.0
none 0.1.1
notebook 7.4.5
notebook_shim 0.2.4
numba 0.61.2
numpy 2.2.6
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cudnn-frontend 1.14.1
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-ml-py 13.580.82
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.8.90
oauthlib 3.2.2
omegaconf 2.3.0
openai 1.107.1
openai-harmony 0.0.4
opencensus 0.11.4
opencensus-context 0.1.3
opencv-python-headless 4.12.0.88
orjson 3.11.3
outlines_core 0.2.11
overrides 7.7.0
packaging 24.2
pandas 2.3.2
pandocfilters 1.5.1
parso 0.8.5
partial-json-parser 0.2.1.1.post6
pathlib2 2.3.7.post1
pathtools 0.1.2
peft 0.17.1
pexpect 4.9.0
pillow 11.3.0
pip 25.2
platformdirs 4.4.0
pluggy 1.6.0
ply 3.11
prometheus_client 0.22.1
prometheus-fastapi-instrumentator 7.1.0
promise 2.3
prompt_toolkit 3.0.52
propcache 0.3.2
proto-plus 1.26.1
protobuf 3.20.3
psutil 7.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
py 1.11.0
py-cpuinfo 9.0.0
py-spy 0.4.1
pyarrow 19.0.1
pyasn1 0.6.1
pyasn1_modules 0.4.2
pybase64 1.4.2
pybind11 3.0.1
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.18.0
pycryptodomex 3.23.0
pydantic 2.11.7
pydantic_core 2.33.2
pydantic-extra-types 2.10.5
Pygments 2.19.2
PyGObject 3.42.2
PyJWT 2.10.1
pylatexenc 2.10
pynvml 13.0.1
pyOpenSSL 25.1.0
pyparsing 3.0.9
pyrsistent 0.20.0
pytest 6.2.5
python-apt 2.6.0
python-consul 1.1.0
python-dateutil 2.9.0.post0
python-debian 0.1.49
python-dotenv 1.1.1
python-engineio 4.12.2
python-etcd 0.4.5
python-jose 3.5.0
python-json-logger 3.3.0
python-magic 0.4.26
python-multipart 0.0.20
python-socketio 5.13.0
pytz 2025.2
pyvers 0.1.0
pyxdg 0.28
PyYAML 6.0.2
pyzmq 27.0.2
qwen-vl-utils 0.0.11
ray 2.49.1
referencing 0.36.2
regex 2025.8.29
requests 2.32.4
rfc3339-validator 0.1.4
rfc3986 2.0.0
rfc3986-validator 0.1.1
rfc3987-syntax 1.1.0
rich 14.1.0
rich-toolkit 0.15.0
rignore 0.6.4
rpds-py 0.27.1
rsa 4.9.1
safetensors 0.6.2
schedule 1.2.2
scipy 1.16.1
Send2Trash 1.8.3
sentencepiece 0.2.1
sentry-sdk 2.35.1
setproctitle 1.3.6
setuptools 65.7.0
shellescape 3.8.1
shellingham 1.5.4
shortuuid 1.0.13
simple-websocket 1.1.0
six 1.16.0
smart-open 6.4.0
smmap 5.0.2
sniffio 1.3.1
snowballstemmer 3.0.1
soundfile 0.13.1
soupsieve 2.8
soxr 0.5.0.post1
Sphinx 5.3.0
sphinxcontrib-applehelp 2.0.0
sphinxcontrib-devhelp 2.0.0
sphinxcontrib-htmlhelp 2.1.0
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 2.0.0
sphinxcontrib-serializinghtml 2.0.0
sphinxcontrib-websupport 2.0.0
stack-data 0.6.3
starlette 0.47.3
sympy 1.14.0
tabulate 0.9.0
tensordict 0.9.1
terminado 0.18.1
textual 5.3.0
thriftpy2 0.5.3
tiktoken 0.11.0
tinycss2 1.4.0
tokenizers 0.22.0
toml 0.10.2
torch 2.8.0
torchaudio 2.8.0
torchdata 0.11.0
torchvision 0.23.0
tornado 6.5.2
tox 3.28.0
tqdm 4.67.1
traitlets 5.14.3
transformer-engine 2.2.0+d0c452c
transformers 4.57.0.dev0
triton 3.4.0
typer 0.17.3
types-python-dateutil 2.9.0.20250822
typing_extensions 4.14.1
typing-inspection 0.4.1
tzdata 2025.2
uc-micro-py 1.0.3
unattended-upgrades 0.1
unidiff 0.7.3
uri-template 1.3.0
urllib3 1.26.20
uvicorn 0.29.0
uvloop 0.21.0
virtualenv 20.34.0
vllm 0.10.2rc3.dev7+g561a0baee.cu129
wadllib 1.3.6
watchdog 6.0.0
watchfiles 0.24.0
wcwidth 0.2.13
webcolors 24.11.1
webencodings 0.5.1
websocket-client 1.8.0
websockets 15.0.1
wheel 0.45.1
widgetsnbextension 4.0.14
wrapt 1.17.3
wsproto 1.2.0
xdg 5
xformers 0.0.32.post1
xgrammar 0.1.23
xxhash 3.5.0
yarl 1.20.1
zipp 3.23.0
Full log
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/async_llm.py", line 392, in generate
out = q.get_nowait() or await q.get()
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/output_processor.py", line 59, in get
raise output
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/async_llm.py", line 462, in output_handler
processed_outputs = output_processor.process_outputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/output_processor.py", line 420, in process_outputs
stop_string = req_state.detokenizer.update(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/detokenizer.py", line 118, in update
self.output_text += self.decode_next(new_token_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/detokenizer.py", line 218, in decode_next
token = self._protected_step(next_token_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/detokenizer.py", line 240, in _protected_step
raise e
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/detokenizer.py", line 232, in _protected_step
token = self.stream.step(self.tokenizer, next_token_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument 'id': StreamInput must be either an integer or a list of integers
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/openai/run_batch.py", line 497, in <module>
asyncio.run(main(args))
File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/openai/run_batch.py", line 480, in main
await run_batch(engine_client, vllm_config, args)
File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/openai/run_batch.py", line 464, in run_batch
responses = await asyncio.gather(*response_futures)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/openai/run_batch.py", line 284, in run_request
response = await serving_engine_func(request.body)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 330, in create_chat_completion
return await self.chat_completion_full_generator(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 1157, in chat_completion_full_generator
async for res in result_generator:
File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/async_llm.py", line 425, in generate
raise EngineGenerateError() from e
vllm.v1.engine.exceptions.EngineGenerateError
vLLM 0.10.2 has been released, try it! The dev version didn't work for me, but vLLM 0.10.2 works well with CUDA 12.8.
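Assuming a CUDA 12.8 environment, upgrading to the release build should be something like:

pip install -U vllm==0.10.2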