Errors with transformers 4.57.1 when loading the model via AutoModel
Hi there, thanks for the work!
Inspired by it, I'm trying to run some tests.
When I load the model with `AutoModel.from_pretrained`, several errors pop up (see below).
I was able to make it work with a few tweaks.
But am I using the wrong transformers version, or do you have some customized code?
The error looks like:
qwen-vl-utils using decord to read video.
Traceback (most recent call last):
File "/home/reolink/model_zoo/qwen_gve.py", line 43, in <module>
outputs = model(**inputs,)
^^^^^^^^^^^^^^^^
File "/home/reolink/model_zoo/ComfyUI/env_run/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reolink/model_zoo/ComfyUI/env_run/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reolink/.cache/huggingface/modules/transformers_modules/GVE_hyphen_3B/modeling_gve.py", line 197, in forward
inputs_embeds = self.model.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reolink/model_zoo/ComfyUI/env_run/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__
raise AttributeError(
AttributeError: 'Qwen2_5_VLModel' object has no attribute 'embed_tokens'
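For context, the failure comes from a refactor in newer transformers releases: the text backbone of `Qwen2_5_VLModel` was moved under a `language_model` attribute, so `self.model.embed_tokens` no longer resolves. A version-agnostic accessor could look like the sketch below (attribute names are assumed from the traceback and patch in this thread, not from documented API):

```python
def get_embed_tokens(model):
    """Resolve the token-embedding layer across transformers versions.

    Older releases expose it as model.model.embed_tokens; after the
    multimodal refactor it lives at model.model.language_model.embed_tokens.
    (Assumption: attribute names as seen in the traceback above.)
    """
    inner = model.model  # the wrapped Qwen2_5_VLModel
    if hasattr(inner, "embed_tokens"):
        return inner.embed_tokens  # older layout (~4.50 and earlier)
    return inner.language_model.embed_tokens  # refactored layout
```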
My modifications:
diff --git a/modeling_gve.py b/modeling_gve.py
index e1c627e..4034cc8 100644
--- a/modeling_gve.py
+++ b/modeling_gve.py
@@ -23,7 +23,82 @@ from transformers.utils import (
replace_return_docstrings,
)
from transformers.models.qwen2_5_vl.configuration_qwen2_5_vl import Qwen2_5_VLConfig, Qwen2_5_VLVisionConfig
-from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import Qwen2_5_VLForConditionalGeneration, QWEN2_5_VL_INPUTS_DOCSTRING, Qwen2_5_VLCausalLMOutputWithPast
+from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import Qwen2_5_VLForConditionalGeneration, Qwen2_5_VLCausalLMOutputWithPast
+
+QWEN2_5_VL_INPUTS_DOCSTRING = r""
@@ -119,7 +194,7 @@ class Qwen25VLForEmbedding(Qwen2_5_VLForConditionalGeneration):
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if inputs_embeds is None:
- inputs_embeds = self.model.embed_tokens(input_ids)
+ inputs_embeds = self.model.language_model.embed_tokens(input_ids)
if pixel_values is not None:
pixel_values = pixel_values.type(self.visual.dtype)
image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
@@ -162,8 +237,8 @@ class Qwen25VLForEmbedding(Qwen2_5_VLForConditionalGeneration):
# if we get 4D attention mask we cannot calculate rope deltas anymore. TODO @raushan fixme
if position_ids is None and (attention_mask is None or attention_mask.ndim == 2):
# calculate RoPE index once per generation in the pre-fill stage only
- if (cache_position is not None and cache_position[0] == 0) or self.rope_deltas is None:
- position_ids, rope_deltas = self.get_rope_index(
+ if (cache_position is not None and cache_position[0] == 0) or self.model.rope_deltas is None:
+ position_ids, rope_deltas = self.model.get_rope_index(
input_ids,
image_grid_thw,
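The same pattern applies to the second hunk: `get_rope_index` and `rope_deltas` moved from the `ForConditionalGeneration` wrapper onto the inner model. Instead of editing each call site, a small shim could locate whichever object carries them; this is a sketch under the assumption (taken from the patch above, not from documented API) that the attributes live on exactly one of the two objects:

```python
def rope_owner(model):
    """Return whichever object carries get_rope_index / rope_deltas.

    Older transformers keeps them on the top-level wrapper; newer
    releases move them onto the inner model. (Assumption based on
    the patch above.)
    """
    if hasattr(model, "get_rope_index"):
        return model        # older layout: helpers on the wrapper
    return model.model      # newer layout: helpers on the inner model

# Usage at the call sites (hypothetical, mirroring the diff):
#   position_ids, rope_deltas = rope_owner(self).get_rope_index(...)
#   ... rope_owner(self).rope_deltas ...
```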
Are these changes necessary to make it work on transformers 4.57.1, or did I do something wrong?
Thanks for your interest! As `config.json` records, our version of `transformers` is 4.50.3. The main difference lies in `modeling_gve.py`, so if you want to use the newest version of transformers for Qwen2.5-VL, you can adapt the modeling code in that file accordingly. Hope this helps!
Ah, I see, it was my fault. Sorry for the silly question, and thanks for the explanation!