Can't launch the model via docker
#51 opened by xne0n
Hello PaddleOCR Team,
First, I want to say thank you for your outstanding work on the PaddleOCR project, especially the new PaddleOCR-VL model. It's an incredibly powerful tool.
I'm writing to report a potential bug I encountered while running PaddleOCR-VL inside your Docker image.
$ sudo docker run --rm --gpus device=2 --network host --runtime nvidia  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server
Using official model (PaddleOCR-VL), the model files will be automatically downloaded and saved in `/root/.paddlex/official_models/PaddleOCR-VL`.
Fetching 22 files: 100%|██████████| 22/22 [00:16<00:00,  1.31it/s]
INFO 10-29 20:28:58 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=1) INFO 10-29 20:28:59 [api_server.py:1896] vLLM API server version 0.10.2
(APIServer pid=1) INFO 10-29 20:28:59 [utils.py:328] non-default args: {'api_server_count': 4, 'host': '0.0.0.0', 'port': 8080, 'chat_template': '/usr/local/lib/python3.10/site-packages/paddlex/inference/genai/chat_templates/PaddleOCR-VL-0.9B.jinja', 'model': '/root/.paddlex/official_models/PaddleOCR-VL', 'trust_remote_code': True, 'max_model_len': 16384, 'served_model_name': ['PaddleOCR-VL-0.9B'], 'gpu_memory_utilization': 0.5, 'max_num_batched_tokens': 131072}
(APIServer pid=1) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1) INFO 10-29 20:28:59 [__init__.py:742] Resolved architecture: PaddleOCRVLForConditionalGeneration
(APIServer pid=1) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=1) INFO 10-29 20:28:59 [__init__.py:1815] Using max model len 16384
(APIServer pid=1) INFO 10-29 20:28:59 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=131072.
INFO 10-29 20:29:06 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=153) INFO 10-29 20:29:07 [core.py:654] Waiting for init message from front-end.
(EngineCore_DP0 pid=153) INFO 10-29 20:29:07 [core.py:76] Initializing a V1 LLM engine (v0.10.2) with config: model='/root/.paddlex/official_models/PaddleOCR-VL', speculative_config=None, tokenizer='/root/.paddlex/official_models/PaddleOCR-VL', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=PaddleOCR-VL-0.9B, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718] EngineCore failed to start.
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718] Traceback (most recent call last):
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     self._init_executor()
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     self.collective_rpc("init_worker", args=([kwargs], ))
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     return func(*args, **kwargs)
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 564, in init_worker
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     worker_class = resolve_obj_by_qualname(
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/utils/__init__.py", line 2618, in resolve_obj_by_qualname
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     module = importlib.import_module(module_name)
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     return _bootstrap._gcd_import(name[level:], package, level)
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 34, in <module>
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     from vllm.v1.worker.gpu_model_runner import GPUModelRunner
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 82, in <module>
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     from vllm.v1.spec_decode.eagle import EagleProposer
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 23, in <module>
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     from vllm.v1.attention.backends.flash_attn import FlashAttentionMetadata
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/v1/attention/backends/flash_attn.py", line 155, in <module>
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     class FlashAttentionMetadataBuilder(
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/v1/attention/backends/flash_attn.py", line 176, in FlashAttentionMetadataBuilder
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     if get_flash_attn_version() == 3 else AttentionCGSupport.UNIFORM_BATCH
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/attention/utils/fa_utils.py", line 56, in get_flash_attn_version
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     if not is_fa_version_supported(fa_version):
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 55, in is_fa_version_supported
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     return _is_fa2_supported(device)[0]
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 35, in _is_fa2_supported
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     if torch.cuda.get_device_capability(device)[0] < 8:
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 600, in get_device_capability
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     prop = get_device_properties(device)
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 616, in get_device_properties
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     _lazy_init()  # will define _get_device_properties
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]   File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 412, in _lazy_init
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718]     torch._C._cuda_init()
(EngineCore_DP0 pid=153) ERROR 10-29 20:29:07 [core.py:718] RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
(EngineCore_DP0 pid=153) Process EngineCore_DP0:
(EngineCore_DP0 pid=153) Traceback (most recent call last):
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=153)     self.run()
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=153)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 722, in run_engine_core
(EngineCore_DP0 pid=153)     raise e
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=153)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=153)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=153)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=153)     self._init_executor()
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
(EngineCore_DP0 pid=153)     self.collective_rpc("init_worker", args=([kwargs], ))
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=153)     answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=153)     return func(*args, **kwargs)
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 564, in init_worker
(EngineCore_DP0 pid=153)     worker_class = resolve_obj_by_qualname(
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/utils/__init__.py", line 2618, in resolve_obj_by_qualname
(EngineCore_DP0 pid=153)     module = importlib.import_module(module_name)
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
(EngineCore_DP0 pid=153)     return _bootstrap._gcd_import(name[level:], package, level)
(EngineCore_DP0 pid=153)   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
(EngineCore_DP0 pid=153)   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
(EngineCore_DP0 pid=153)   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
(EngineCore_DP0 pid=153)   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
(EngineCore_DP0 pid=153)   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
(EngineCore_DP0 pid=153)   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 34, in <module>
(EngineCore_DP0 pid=153)     from vllm.v1.worker.gpu_model_runner import GPUModelRunner
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 82, in <module>
(EngineCore_DP0 pid=153)     from vllm.v1.spec_decode.eagle import EagleProposer
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 23, in <module>
(EngineCore_DP0 pid=153)     from vllm.v1.attention.backends.flash_attn import FlashAttentionMetadata
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/attention/backends/flash_attn.py", line 155, in <module>
(EngineCore_DP0 pid=153)     class FlashAttentionMetadataBuilder(
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/attention/backends/flash_attn.py", line 176, in FlashAttentionMetadataBuilder
(EngineCore_DP0 pid=153)     if get_flash_attn_version() == 3 else AttentionCGSupport.UNIFORM_BATCH
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/attention/utils/fa_utils.py", line 56, in get_flash_attn_version
(EngineCore_DP0 pid=153)     if not is_fa_version_supported(fa_version):
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 55, in is_fa_version_supported
(EngineCore_DP0 pid=153)     return _is_fa2_supported(device)[0]
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 35, in _is_fa2_supported
(EngineCore_DP0 pid=153)     if torch.cuda.get_device_capability(device)[0] < 8:
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 600, in get_device_capability
(EngineCore_DP0 pid=153)     prop = get_device_properties(device)
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 616, in get_device_properties
(EngineCore_DP0 pid=153)     _lazy_init()  # will define _get_device_properties
(EngineCore_DP0 pid=153)   File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 412, in _lazy_init
(EngineCore_DP0 pid=153)     torch._C._cuda_init()
(EngineCore_DP0 pid=153) RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/paddlex_genai_server", line 8, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/paddlex/inference/genai/server.py", line 113, in main
(APIServer pid=1)     run_genai_server(args)
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/paddlex/inference/genai/server.py", line 100, in run_genai_server
(APIServer pid=1)     run_server_func(
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/paddlex/inference/genai/backends/vllm.py", line 68, in run_vllm_server
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=1)     return loop.run_until_complete(wrapper())
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1941, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1961, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)   File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 179, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)   File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/utils/__init__.py", line 1589, in inner
(APIServer pid=1)     return fn(*args, **kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 212, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 136, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=1)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=1)   File "/usr/local/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 729, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
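In case it helps with triage, here is a minimal sketch of a check that could be run with the Python interpreter inside the container to see whether CUDA initializes at all, independent of vLLM. The function name `collect_gpu_diagnostics` is hypothetical, and treating the `/dev/nvidia*` device nodes and the `CUDA_VISIBLE_DEVICES` / `NVIDIA_VISIBLE_DEVICES` variables as the relevant signals is an assumption, not something the log confirms. The `torch.cuda.get_device_capability` call is the same one that fails in the traceback above.

```python
import glob
import importlib.util
import os


def collect_gpu_diagnostics():
    """Gather basic facts about GPU visibility in the current environment."""
    info = {
        # Environment variables that commonly control which GPUs are visible.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Device nodes the NVIDIA driver normally exposes on Linux (assumption:
        # an empty list suggests the GPU was not passed into the container).
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,
        "device_count": None,
        "capability": None,
        "cuda_error": None,
    }
    if info["torch_installed"]:
        try:
            import torch

            info["cuda_available"] = torch.cuda.is_available()
            info["device_count"] = torch.cuda.device_count()
            if info["cuda_available"]:
                # Same call that raises in the vLLM traceback.
                info["capability"] = torch.cuda.get_device_capability(0)
        except Exception as exc:
            # Record the failure instead of crashing so the other fields still print.
            info["cuda_error"] = f"{type(exc).__name__}: {exc}"
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `device_nodes` comes back empty or `cuda_error` is set, that would point at the container's GPU setup rather than at the model or vLLM itself, though this sketch cannot tell which part of the setup is at fault.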
Thank you!
xne0n