EXISTING MODELS!
I would like to request an upgrade to your docs!
For the LLaVA model, the docs show how to generate the vision model config, which is a vital part of creating a vision model.
Some of us have heavily trained our Mistral LLMs and would like to combine image and text-to-text functionality, so we could be instantiating a new Pixtral or Ministral model:
from transformers import Mistral3ForConditionalGeneration, Mistral3Config, PixtralVisionConfig, MistralConfig
# Initializing a Pixtral-vision config
vision_config = PixtralVisionConfig()
# Initializing a Mistral config
text_config = MistralConfig()
# Initializing a Mistral3 configuration
configuration = Mistral3Config(vision_config, text_config)
# Initializing a model from the mistral3.1 configuration
model = Mistral3ForConditionalGeneration(configuration)
# Accessing the model configuration
configuration = model.config
So we can insert our fine-tuned LLM as a MistralConfig... no problem:
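For example, a minimal sketch of what I mean (the checkpoint path here is a placeholder for your own fine-tuned model):

from transformers import Mistral3Config, MistralConfig, PixtralVisionConfig
# Load the text config straight from a fine-tuned Mistral checkpoint (placeholder path)
text_config = MistralConfig.from_pretrained("path/to/my-finetuned-mistral")
# Combine it with a Pixtral vision config into a Mistral3 config
configuration = Mistral3Config(vision_config=PixtralVisionConfig(), text_config=text_config)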
But the problem lies here:
# Initializing a Pixtral-vision config
vision_config = PixtralVisionConfig()
Searching the docs, the section on how to instantiate a PixtralVisionConfig is indeed left out!!
But the LLaVA page does show how to do this, since it takes an existing CLIP model and creates the config from it:
from transformers import LlavaForConditionalGeneration, LlavaConfig, CLIPVisionConfig, LlamaConfig
# Initializing a CLIP-vision config
vision_config = CLIPVisionConfig()
# Initializing a Llama config
text_config = LlamaConfig()
# Initializing a Llava llava-1.5-7b style configuration
configuration = LlavaConfig(vision_config, text_config)
# Initializing a model from the llava-1.5-7b style configuration
model = LlavaForConditionalGeneration(configuration)
# Accessing the model configuration
configuration = model.config
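And the vision config there can also come from an actual pretrained checkpoint, which is exactly what I want for Pixtral. A sketch using the standard openai/clip-vit-base-patch32 checkpoint:

from transformers import CLIPVisionConfig
# Pull the vision config out of an existing pretrained CLIP checkpoint
vision_config = CLIPVisionConfig.from_pretrained("openai/clip-vit-base-patch32")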
Hence:
# Initializing a CLIP-vision config
vision_config = CLIPVisionConfig()
So:
from transformers import CLIPVisionConfig, CLIPVisionModel
# Initializing a CLIPVisionConfig with openai/clip-vit-base-patch32 style configuration
configuration = CLIPVisionConfig()
# Initializing a CLIPVisionModel (with random weights) from the openai/clip-vit-base-patch32 style configuration
model = CLIPVisionModel(configuration)
# Accessing the model configuration
configuration = model.config
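By analogy, I would expect the equivalent to work for Pixtral. A sketch, assuming PixtralVisionModel is exported the same way as CLIPVisionModel and that the default config values follow the Pixtral-12B vision tower:

from transformers import PixtralVisionConfig, PixtralVisionModel
# Initializing a PixtralVisionConfig with default (Pixtral-12B style) values
configuration = PixtralVisionConfig()
# Initializing a PixtralVisionModel (with random weights) from that configuration
model = PixtralVisionModel(configuration)
# Accessing the model configuration
configuration = model.config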
Are we going in circles over such a simple thing??
So please release the code for the vision config instantiation!
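In the meantime, I believe the vision config can also be recovered from an existing checkpoint. A sketch, assuming the mistral-community/pixtral-12b repo id and that its composite config exposes a vision_config attribute:

from transformers import AutoConfig
# Load the full multimodal config from an existing Pixtral checkpoint on the Hub
full_config = AutoConfig.from_pretrained("mistral-community/pixtral-12b")
# The vision tower settings are nested under vision_config
vision_config = full_config.vision_config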