Errors with transformers 4.57.1 when loading the model via AutoModel
Hi there, thanks for the work!
Inspired by it, I'm trying to run some tests.
When I load the model with `AutoModel.from_pretrained`, several errors pop up (see below).
I was able to make it work with a few tweaks.
But am I using the wrong transformers version, or do you have some customized code?
The error looks like:
qwen-vl-utils using decord to read video.
Traceback (most recent call last):
File "/home/reolink/model_zoo/qwen_gve.py", line 43, in <module>
outputs = model(**inputs,)
^^^^^^^^^^^^^^^^
File "/home/reolink/model_zoo/ComfyUI/env_run/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reolink/model_zoo/ComfyUI/env_run/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reolink/.cache/huggingface/modules/transformers_modules/GVE_hyphen_3B/modeling_gve.py", line 197, in forward
inputs_embeds = self.model.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reolink/model_zoo/ComfyUI/env_run/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__
raise AttributeError(
AttributeError: 'Qwen2_5_VLModel' object has no attribute 'embed_tokens'
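For context, the failure comes from a refactor in newer transformers releases: the text backbone of `Qwen2_5_VLModel` was moved under a `language_model` attribute, so `self.model.embed_tokens` no longer resolves. A version-agnostic accessor could look like the sketch below (attribute names are assumed from the traceback and patch in this thread, not from documented API):

```python
def get_embed_tokens(model):
    """Resolve the token-embedding layer across transformers versions.

    Older releases expose it as model.model.embed_tokens; after the
    multimodal refactor it lives at model.model.language_model.embed_tokens.
    (Assumption: attribute names as seen in the traceback above.)
    """
    inner = model.model  # the wrapped Qwen2_5_VLModel
    if hasattr(inner, "embed_tokens"):
        return inner.embed_tokens  # older layout (~4.50 and earlier)
    return inner.language_model.embed_tokens  # refactored layout
```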
My modifications:
diff --git a/modeling_gve.py b/modeling_gve.py
index e1c627e..4034cc8 100644
--- a/modeling_gve.py
+++ b/modeling_gve.py
@@ -23,7 +23,82 @@ from transformers.utils import (
replace_return_docstrings,
)
from transformers.models.qwen2_5_vl.configuration_qwen2_5_vl import Qwen2_5_VLConfig, Qwen2_5_VLVisionConfig
-from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import Qwen2_5_VLForConditionalGeneration, QWEN2_5_VL_INPUTS_DOCSTRING, Qwen2_5_VLCausalLMOutputWithPast
+from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import Qwen2_5_VLForConditionalGeneration, Qwen2_5_VLCausalLMOutputWithPast
+
+QWEN2_5_VL_INPUTS_DOCSTRING = r""
@@ -119,7 +194,7 @@ class Qwen25VLForEmbedding(Qwen2_5_VLForConditionalGeneration):
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if inputs_embeds is None:
- inputs_embeds = self.model.embed_tokens(input_ids)
+ inputs_embeds = self.model.language_model.embed_tokens(input_ids)
if pixel_values is not None:
pixel_values = pixel_values.type(self.visual.dtype)
image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
@@ -162,8 +237,8 @@ class Qwen25VLForEmbedding(Qwen2_5_VLForConditionalGeneration):
# if we get 4D attention mask we cannot calculate rope deltas anymore. TODO @raushan fixme
if position_ids is None and (attention_mask is None or attention_mask.ndim == 2):
# calculate RoPE index once per generation in the pre-fill stage only
- if (cache_position is not None and cache_position[0] == 0) or self.rope_deltas is None:
- position_ids, rope_deltas = self.get_rope_index(
+ if (cache_position is not None and cache_position[0] == 0) or self.model.rope_deltas is None:
+ position_ids, rope_deltas = self.model.get_rope_index(
input_ids,
image_grid_thw,
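The same pattern applies to the second hunk: `get_rope_index` and `rope_deltas` moved from the `ForConditionalGeneration` wrapper onto the inner model. Instead of editing each call site, a small shim could locate whichever object carries them; this is a sketch under the assumption (taken from the patch above, not from documented API) that the attributes live on exactly one of the two objects:

```python
def rope_owner(model):
    """Return whichever object carries get_rope_index / rope_deltas.

    Older transformers keeps them on the top-level wrapper; newer
    releases move them onto the inner model. (Assumption based on
    the patch above.)
    """
    if hasattr(model, "get_rope_index"):
        return model        # older layout: helpers on the wrapper
    return model.model      # newer layout: helpers on the inner model

# Usage at the call sites (hypothetical, mirroring the diff):
#   position_ids, rope_deltas = rope_owner(self).get_rope_index(...)
#   ... rope_owner(self).rope_deltas ...
```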
Are these changes necessary to make it work on transformers 4.57.1, or did I do something wrong?
Thanks for your interest! As `config.json` records, our version of `transformers` is 4.50.3. The main difference lies in `modeling_gve.py`, so if you want to use the newest version of transformers for Qwen2.5-VL, you can adapt the modeling code in that file accordingly. Hope this helps!
Ah, I see, it was my fault. Sorry for the silly question, and thanks for the explanation!