Argument input_ids not found in the forward method

by mohamedemam - opened 5 days ago

Discussion

mohamedemam

5 days ago

ValueError: Argument input_ids not found in the forward method of <class 'transformers.models.diffusion_gemma.modeling_diffusion_gemma.DiffusionGemmaDecoderModel'>

joaogante

4 days ago

Hi @mohamedemam -- could you share the snippet that produces this error, your environment, and/or the full stack trace?
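For the environment part of that request, a minimal sketch like the one below may be enough to gather the relevant versions in one go. It uses only the Python standard library; the helper name `collect_versions` and the default package list are assumptions made for illustration, not something defined by vLLM or Transformers.

```python
from importlib import metadata
import platform
import sys


def collect_versions(packages=("vllm", "transformers", "torch")):
    """Return installed versions for the given packages (None if not installed)."""
    versions = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```

Pasting the printed lines next to the launch command and the stack trace usually covers what a maintainer needs to reproduce the issue.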

infostratcomailab

4 days ago

edited 4 days ago

It's vLLM-related.

mohamedemam

3 days ago

Model: RedHatAI/diffusiongemma-26B-A4B-it-NVFP4

Command:

VLLM_USE_V2_MODEL_RUNNER=1 \
vllm serve RedHatAI/diffusiongemma-26B-A4B-it-NVFP4 \
    --trust-remote-code \
    --max-num-seqs 4 \
    --hf-overrides '{"diffusion_sampler": "entropy_bound", "diffusion_entropy_bound": 0.1}' \
    --default-chat-template-kwargs '{"enable_thinking": true}'

Full trace:

(APIServer pid=134096) INFO 06-13 00:13:56 [utils.py:344]
(APIServer pid=134096) INFO 06-13 00:13:56 [utils.py:344] █ █ █▄ ▄█
(APIServer pid=134096) INFO 06-13 00:13:56 [utils.py:344] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.22.1
(APIServer pid=134096) INFO 06-13 00:13:56 [utils.py:344] █▄█▀ █ █ █ █ model RedHatAI/diffusiongemma-26B-A4B-it-NVFP4
(APIServer pid=134096) INFO 06-13 00:13:56 [utils.py:344] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=134096) INFO 06-13 00:13:56 [utils.py:344]
(APIServer pid=134096) INFO 06-13 00:13:56 [utils.py:278] non-default args: {'model_tag': 'RedHatAI/diffusiongemma-26B-A4B-it-NVFP4', 'default_chat_template_kwargs': {'enable_thinking': True}, 'model': 'RedHatAI/diffusiongemma-26B-A4B-it-NVFP4', 'trust_remote_code': True, 'hf_overrides': {'diffusion_sampler': 'entropy_bound', 'diffusion_entropy_bound': 0.1}, 'max_num_seqs': 4}
(APIServer pid=134096) Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
(APIServer pid=134096) INFO 06-13 00:14:48 [model.py:617] Resolved architecture: TransformersMultiModalMoEForCausalLM
(APIServer pid=134096) INFO 06-13 00:14:48 [model.py:1752] Using max model len 262144
(APIServer pid=134096) INFO 06-13 00:14:58 [vllm.py:977] Asynchronous scheduling is enabled.
(APIServer pid=134096) INFO 06-13 00:14:58 [kernel.py:270] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'], fused_add_rms_norm=['native'])
(APIServer pid=134096) INFO 06-13 00:14:58 [compilation.py:312] Enabled custom fusions: act_quant
(APIServer pid=134096) WARNING 06-13 00:15:03 [utils.py:192] TransformersMultiModalMoEForCausalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
(EngineCore pid=134963) INFO 06-13 00:16:12 [core.py:112] Initializing a V1 LLM engine (v0.22.1) with config: model='RedHatAI/diffusiongemma-26B-A4B-it-NVFP4', speculative_config=None, tokenizer='RedHatAI/diffusiongemma-26B-A4B-it-NVFP4', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=compressed-tensors, quantization_config=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=RedHatAI/diffusiongemma-26B-A4B-it-NVFP4, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'ir_enable_torch_wrap': True, 'splitting_ops': ['vllm::unified_attention_with_output', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::qwen_gdn_attention_core', 'vllm::gdn_attention_core_xpu', 'vllm::olmo_hybrid_gdn_full_forward', 
'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::deepseek_v4_attention', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_vision_items_per_batch': 0, 'encoder_cudagraph_max_frames_per_batch': None, 'compile_sizes': [], 'compile_ranges_endpoints': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': True, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False, 'fuse_rope_kvcache_cat_mla': False, 'fuse_act_padding': False}, 'max_cudagraph_capture_size': 8, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': False, 'static_all_moe_layers': []}, kernel_config=KernelConfig(ir_op_priority=IrOpPriorityConfig(rms_norm=['native'], fused_add_rms_norm=['native']), enable_flashinfer_autotune=True, moe_backend='auto', linear_backend='auto')
(EngineCore pid=134963) WARNING 06-13 00:16:13 [utils.py:192] TransformersMultiModalMoEForCausalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
(EngineCore pid=134963) Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
(EngineCore pid=134963) INFO 06-13 00:16:17 [parallel_state.py:1422] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.18.0.2:34255 backend=nccl
(EngineCore pid=134963) INFO 06-13 00:16:17 [parallel_state.py:1735] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0, EPLB rank N/A
(EngineCore pid=134963) INFO 06-13 00:16:23 [topk_topp_sampler.py:45] Using FlashInfer for top-p & top-k sampling.
(EngineCore pid=134963) INFO 06-13 00:16:42 [gpu_model_runner.py:5037] Starting to load model RedHatAI/diffusiongemma-26B-A4B-it-NVFP4...
(EngineCore pid=134963) INFO 06-13 00:16:42 [base.py:117] Using Transformers modeling backend.
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] EngineCore failed to start.
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] Traceback (most recent call last):
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1139, in run_engine_core
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] return func(*args, **kwargs)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 905, in __init__
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] super().__init__(
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 121, in __init__
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] self.model_executor = executor_class(vllm_config)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] return func(*args, **kwargs)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] self._init_executor()
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 65, in _init_executor
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] self.driver_worker.load_model()
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 342, in load_model
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] return func(*args, **kwargs)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5053, in load_model
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] self.model = model_loader.load_model(
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] return func(*args, **kwargs)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] model = initialize_model(
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] return func(*args, **kwargs)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 61, in initialize_model
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] model = model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/moe.py", line 128, in __init__
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] super(MixtureOfExperts, self).__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/multimodal.py", line 271, in __init__
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] super(SupportsMRoPE, self).__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/causal.py", line 35, in __init__
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] super(VllmModelForTextGeneration, self).__init__(
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/base.py", line 167, in __init__
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] self._decorate_for_torch_compile(**from_config_kwargs)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/multimodal.py", line 310, in _decorate_for_torch_compile
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] super()._decorate_for_torch_compile(**kwargs)
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/base.py", line 285, in _decorate_for_torch_compile
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] self._decorate_cls_for_torch_compile(
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/base.py", line 270, in _decorate_cls_for_torch_compile
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] support_torch_compile(
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 235, in cls_decorator_helper
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] raise ValueError(
(EngineCore pid=134963) ERROR 06-13 00:16:43 [core.py:1165] ValueError: Argument input_ids not found in the forward method of <class 'transformers.models.diffusion_gemma.modeling_diffusion_gemma.DiffusionGemmaDecoderModel'>
(EngineCore pid=134963) Process EngineCore:
(EngineCore pid=134963) Traceback (most recent call last):
(EngineCore pid=134963) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=134963) self.run()
(EngineCore pid=134963) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=134963) self._target(*self._args, **self._kwargs)
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1169, in run_engine_core
(EngineCore pid=134963) raise e
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1139, in run_engine_core
(EngineCore pid=134963) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) return func(*args, **kwargs)
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 905, in __init__
(EngineCore pid=134963) super().__init__(
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 121, in __init__
(EngineCore pid=134963) self.model_executor = executor_class(vllm_config)
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) return func(*args, **kwargs)
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=134963) self._init_executor()
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 65, in _init_executor
(EngineCore pid=134963) self.driver_worker.load_model()
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 342, in load_model
(EngineCore pid=134963) self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) return func(*args, **kwargs)
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5053, in load_model
(EngineCore pid=134963) self.model = model_loader.load_model(
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) return func(*args, **kwargs)
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
(EngineCore pid=134963) model = initialize_model(
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=134963) return func(*args, **kwargs)
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 61, in initialize_model
(EngineCore pid=134963) model = model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=134963) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/moe.py", line 128, in __init__
(EngineCore pid=134963) super(MixtureOfExperts, self).__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/multimodal.py", line 271, in __init__
(EngineCore pid=134963) super(SupportsMRoPE, self).__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/causal.py", line 35, in __init__
(EngineCore pid=134963) super(VllmModelForTextGeneration, self).__init__(
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/base.py", line 167, in __init__
(EngineCore pid=134963) self._decorate_for_torch_compile(**from_config_kwargs)
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/multimodal.py", line 310, in _decorate_for_torch_compile
(EngineCore pid=134963) super()._decorate_for_torch_compile(**kwargs)
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/base.py", line 285, in _decorate_for_torch_compile
(EngineCore pid=134963) self._decorate_cls_for_torch_compile(
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/model_executor/models/transformers/base.py", line 270, in _decorate_cls_for_torch_compile
(EngineCore pid=134963) support_torch_compile(
(EngineCore pid=134963) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 235, in cls_decorator_helper
(EngineCore pid=134963) raise ValueError(
(EngineCore pid=134963) ValueError: Argument input_ids not found in the forward method of <class 'transformers.models.diffusion_gemma.modeling_diffusion_gemma.DiffusionGemmaDecoderModel'>
[rank0]:[W613 00:16:44.766518897 ProcessGroupNCCL.cpp:1575] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=134096) Traceback (most recent call last):
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/bin/vllm", line 10, in <module>
(APIServer pid=134096) sys.exit(main())
(APIServer pid=134096) ^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=134096) args.dispatch_function(args)
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 148, in cmd
(APIServer pid=134096) uvloop.run(run_server(args))
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=134096) return __asyncio.run(
(APIServer pid=134096) ^^^^^^^^^^^^^^
(APIServer pid=134096) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=134096) return runner.run(main)
(APIServer pid=134096) ^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=134096) return self._loop.run_until_complete(task)
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=134096) return await main
(APIServer pid=134096) ^^^^^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 678, in run_server
(APIServer pid=134096) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 692, in run_server_worker
(APIServer pid=134096) async with build_async_engine_client(
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=134096) return await anext(self.gen)
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=134096) async with build_async_engine_client_from_engine_args(
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=134096) return await anext(self.gen)
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=134096) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 217, in from_vllm_config
(APIServer pid=134096) return cls(
(APIServer pid=134096) ^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 146, in __init__
(APIServer pid=134096) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=134096) return func(*args, **kwargs)
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 131, in make_async_mp_client
(APIServer pid=134096) return AsyncMPClient(*client_args)
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=134096) return func(*args, **kwargs)
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 932, in __init__
(APIServer pid=134096) super().__init__(
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 567, in __init__
(APIServer pid=134096) with launch_core_engines(
(APIServer pid=134096) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134096) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=134096) next(self.gen)
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 1150, in launch_core_engines
(APIServer pid=134096) wait_for_engine_startup(
(APIServer pid=134096) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 1209, in wait_for_engine_startup
(APIServer pid=134096) raise RuntimeError(
(APIServer pid=134096) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
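The root cause in the trace is the `ValueError` raised from `cls_decorator_helper` in `vllm/compilation/decorators.py`, reached through `support_torch_compile`. The message suggests a check that the wrapped class's `forward` signature contains an `input_ids` parameter. The sketch below reproduces that kind of check in isolation. It is an illustration, not vLLM's code: `check_forward_has_args` and both model classes are hypothetical names, and the parameters shown on `EmbeddingOnlyModel.forward` are an assumption rather than the actual signature of `DiffusionGemmaDecoderModel`.

```python
import inspect


def check_forward_has_args(cls, required_args):
    """Raise ValueError if cls.forward lacks any of the required parameter names."""
    params = inspect.signature(cls.forward).parameters
    for name in required_args:
        if name not in params:
            raise ValueError(
                f"Argument {name} not found in the forward method of {cls}"
            )


class TokenModel:
    # Hypothetical model whose forward accepts token IDs directly.
    def forward(self, input_ids, positions=None):
        return input_ids


class EmbeddingOnlyModel:
    # Hypothetical model whose forward has no `input_ids` parameter.
    def forward(self, inputs_embeds, positions=None):
        return inputs_embeds


# Passes silently: `input_ids` is in the signature.
check_forward_has_args(TokenModel, ["input_ids"])

# Raises: the signature only exposes `inputs_embeds` and `positions`.
try:
    check_forward_has_args(EmbeddingOnlyModel, ["input_ids"])
except ValueError as err:
    print(err)
```

If the real check works along these lines, the failure happens at model construction time, before any weights are loaded or requests are served, which matches where the trace stops.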

ENV

--2026-06-13 00:10:38-- https://raw.githubusercontent.com/vllm-project/vllm/main/vllm/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35090 (34K) [text/plain]
Saving to: ‘collect_env.py.2’

collect_env.py.2 100%[===============================================================================================>] 34.27K --.-KB/s in 0.005s

2026-06-13 00:10:38 (6.54 MB/s) - ‘collect_env.py.2’ saved [35090/35090]

Collecting environment information...

==============================
System Info

OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version : Could not collect
CMake version : version 3.22.1
Libc version : glibc-2.35

==============================
PyTorch Info

PyTorch version : 2.11.0+cu130
Is debug build : False
CUDA used to build PyTorch : 13.0
ROCM used to build PyTorch : N/A
XPU used to build PyTorch : N/A

==============================
Python Environment

Python version : 3.12.13 (main, Mar 4 2026, 09:23:07) [GCC 11.4.0] (64-bit runtime)
Python platform : Linux-6.14.0-33-generic-x86_64-with-glibc2.35

==============================
CUDA / GPU Info

Is CUDA available : True
CUDA runtime version : 13.0.88
CUDA_MODULE_LOADING set to :
GPU models and configuration : GPU 0: NVIDIA L4
Nvidia driver version : 580.126.09
cuDNN version : Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_tensor_ir.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_ext.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.23.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.23.0
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True

==============================
CPU Info

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7542 32-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Stepping: 0
Frequency boost: enabled
CPU max MHz: 3408.1079
CPU min MHz: 1500.0000
BogoMIPS: 5800.09
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization: AMD-V
L1d cache: 2 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 32 MiB (64 instances)
L3 cache: 256 MiB (16 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

==============================
Versions of relevant libraries

[pip3] flashinfer-python==0.6.11.post2
[pip3] numpy==2.4.6
[pip3] nvidia-cublas==13.1.0.3
[pip3] nvidia-cuda-cccl==13.3.3.3.1
[pip3] nvidia-cuda-crt==13.3.33
[pip3] nvidia-cuda-cupti==13.0.85
[pip3] nvidia-cuda-nvcc==13.3.33
[pip3] nvidia-cuda-nvrtc==13.0.88
[pip3] nvidia-cuda-runtime==13.0.96
[pip3] nvidia-cudnn-cu13==9.19.0.56
[pip3] nvidia-cudnn-frontend==1.18.0
[pip3] nvidia-cufft==12.0.0.61
[pip3] nvidia-cufile==1.15.1.6
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.4.66
[pip3] nvidia-cusparse==12.6.3.3
[pip3] nvidia-cusparselt-cu13==0.8.0
[pip3] nvidia-cutlass-dsl==4.5.2
[pip3] nvidia-cutlass-dsl-libs-base==4.5.2
[pip3] nvidia-cutlass-dsl-libs-cu13==4.5.2
[pip3] nvidia-ml-py==13.610.43
[pip3] nvidia-nccl-cu13==2.28.9
[pip3] nvidia-nvjitlink==13.0.88
[pip3] nvidia-nvshmem-cu13==3.4.5
[pip3] nvidia-nvtx==13.0.85
[pip3] nvidia-nvvm==13.3.33
[pip3] pyzmq==27.1.0
[pip3] tokenspeed-triton==3.7.10.post20260531
[pip3] torch==2.11.0+cu130
[pip3] torch_c_dlpack_ext==0.1.5
[pip3] torchaudio==2.11.0+cu130
[pip3] torchvision==0.26.0+cu130
[pip3] transformers==5.13.0.dev0
[pip3] triton==3.6.0
[conda] Could not collect

==============================
vLLM Info

ROCM Version : Could not collect
vLLM Version : 0.22.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; XPU: Disabled
GPU Topology:
GPU0 NIC0 NIC1 NIC2 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X SYS SYS PHB 0-31,64-95 0 N/A
NIC0 SYS X PIX SYS
NIC1 SYS PIX X SYS
NIC2 PHB SYS SYS X

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: mlx5_2
NIC1: mlx5_3
NIC2: mlx5_bond_0

==============================
Environment Variables

NVIDIA_VISIBLE_DEVICES=void
NVIDIA_REQUIRE_CUDA=cuda>=12.4 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536
NCCL_VERSION=2.21.5-1
NVIDIA_DRIVER_CAPABILITIES=compute,display,graphics,utility,video
NVIDIA_PRODUCT_NAME=CUDA
CUDA_VERSION=12.4.1
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NVIDIA_CTK_LIBCUDA_DIR=/usr/lib/x86_64-linux-gnu
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_root
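
The report states a CUDA version in several places (the PyTorch build, the detected runtime, and the `CUDA_VERSION` environment variable). Below is a minimal sketch for surfacing differences between them, assuming the report text is available as a string. The helper names `major` and `cuda_versions` are hypothetical. A difference is not by itself evidence of what caused the failure, since the environment variable usually comes from the base container image while pip wheels bring their own CUDA libraries.

```python
import re

def major(version):
    """Return the leading integer of a version string such as "13.0" or "12.4.1"."""
    m = re.match(r"\s*(\d+)", version)
    return int(m.group(1)) if m else None

def cuda_versions(report):
    """Extract the CUDA versions that the report states in different places."""
    patterns = {
        "torch_build": r"^CUDA used to build PyTorch\s*:\s*(\S+)",
        "runtime": r"^CUDA runtime version\s*:\s*(\S+)",
        "env_CUDA_VERSION": r"^CUDA_VERSION=(\S+)",
    }
    found = {}
    for key, pattern in patterns.items():
        m = re.search(pattern, report, flags=re.MULTILINE)
        if m:
            found[key] = m.group(1)
    return found

# The three lines below are copied from the report above.
report = """\
CUDA used to build PyTorch : 13.0
CUDA runtime version : 13.0.88
CUDA_VERSION=12.4.1
"""

versions = cuda_versions(report)
majors = {key: major(value) for key, value in versions.items()}
if len(set(majors.values())) > 1:
    print("CUDA major versions differ:", majors)
```

With the values from this report, the check flags `CUDA_VERSION=12.4.1` against the 13.0 build of PyTorch, which is a detail worth mentioning when asking for help.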


Argument input_ids not found in the forward method

model : RedHatAI/diffusiongemma-26B-A4B-it-NVFP4

Command:

full trace:
