EXISTING MODELS!

#1
by LeroyDyer - opened

I would like to request that the docs be upgraded!

With the LLaVA model, the docs show you how to generate the vision model config, which is a vital part of creating a vision model.

There are some of us who have heavily fine-tuned our Mistral LLMs and would like to combine image-and-text-to-text functionality with them,
and so we could be instantiating a new Pixtral or Ministral model:


from transformers import Mistral3ForConditionalGeneration, Mistral3Config, PixtralVisionConfig, MistralConfig

# Initializing a Pixtral-vision config
vision_config = PixtralVisionConfig()

# Initializing a Mistral config
text_config = MistralConfig()

# Initializing a Mistral3 configuration
configuration = Mistral3Config(vision_config=vision_config, text_config=text_config)

# Initializing a model from the mistral3.1 configuration
model = Mistral3ForConditionalGeneration(configuration)

# Accessing the model configuration
configuration = model.config

So we can insert our fine-tuned LLM as the MistralConfig with no problems.
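For the text side, I would expect something like this to work (a sketch; the repo ID is a placeholder for your own fine-tuned checkpoint):

from transformers import MistralConfig

# Load the text config straight from a fine-tuned checkpoint
# ("your-name/your-finetuned-mistral" is a placeholder repo ID)
text_config = MistralConfig.from_pretrained("your-name/your-finetuned-mistral")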

But the problem lies here:

# Initializing a Pixtral-vision config
vision_config = PixtralVisionConfig()

On searching the docs, how to instantiate a PixtralVisionConfig is indeed left out!
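As far as I can tell, PixtralVisionConfig accepts the usual vision-transformer hyperparameters directly; here is a sketch, where the values are my assumptions for the Pixtral-12B encoder rather than anything documented:

from transformers import PixtralVisionConfig

# Spell out the vision-transformer hyperparameters explicitly;
# these values are assumptions for Pixtral-12B, not documented defaults
vision_config = PixtralVisionConfig(
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=24,
    num_attention_heads=16,
    image_size=1024,
    patch_size=16,
)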
But the LLaVA page does show how to do this, since it takes an existing CLIP model and creates the config from it:

from transformers import LlavaForConditionalGeneration, LlavaConfig, CLIPVisionConfig, LlamaConfig

# Initializing a CLIP-vision config
vision_config = CLIPVisionConfig()

# Initializing a Llama config
text_config = LlamaConfig()

# Initializing a Llava llava-1.5-7b style configuration
configuration = LlavaConfig(vision_config=vision_config, text_config=text_config)

# Initializing a model from the llava-1.5-7b style configuration
model = LlavaForConditionalGeneration(configuration)

# Accessing the model configuration
configuration = model.config


hence 😀:


# Initializing a CLIP-vision config
vision_config = CLIPVisionConfig()

so:


from transformers import CLIPVisionConfig, CLIPVisionModel

# Initializing a CLIPVisionConfig with openai/clip-vit-base-patch32 style configuration
configuration = CLIPVisionConfig()

# Initializing a CLIPVisionModel (with random weights) from the openai/clip-vit-base-patch32 style configuration
model = CLIPVisionModel(configuration)

# Accessing the model configuration
configuration = model.config
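And the CLIP docs also show that the vision config can be pulled straight from a pretrained checkpoint, which is exactly the analogue I am looking for on the Pixtral side:

from transformers import CLIPVisionConfig

# Load the vision config from the actual pretrained CLIP checkpoint
vision_config = CLIPVisionConfig.from_pretrained("openai/clip-vit-base-patch32")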

Are we going in circles over such a simple thing?

So please release the code for the vision config instantiation!
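In the meantime, here is a sketch of the full workflow I would expect, assuming the vision config can be reused from an existing Pixtral checkpoint (mistral-community/pixtral-12b is used as the example, and your-name/your-finetuned-mistral is a placeholder):

from transformers import (
    AutoConfig,
    Mistral3Config,
    Mistral3ForConditionalGeneration,
    MistralConfig,
)

# Reuse the vision config from an existing Pixtral checkpoint;
# assuming its top-level config nests the vision tower under .vision_config
vision_config = AutoConfig.from_pretrained("mistral-community/pixtral-12b").vision_config

# Load the text config from a fine-tuned Mistral checkpoint
# ("your-name/your-finetuned-mistral" is a placeholder repo ID)
text_config = MistralConfig.from_pretrained("your-name/your-finetuned-mistral")

# Combine the two sub-configs into a Mistral3 configuration
configuration = Mistral3Config(vision_config=vision_config, text_config=text_config)

# Instantiate the combined model (with random weights) from the configuration
model = Mistral3ForConditionalGeneration(configuration)

Note the weights here are random; the vision tower and fine-tuned language model weights would still need to be loaded or merged separately.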
