Update README.md #4
by Rafacelis - opened

README.md CHANGED
    
@@ -1,8 +1,16 @@
 ---
 tags:
 - notebook
-pipeline_tag: image-text-to-text
 library_name: transformers
+base_model:
+- black-forest-labs/FLUX.1-Kontext-dev
+- google/gemma-3n-E4B-it
+- mistralai/Voxtral-Mini-3B-2507
+- Qwen/Qwen3-Coder-480B-A35B-Instruct
+- black-forest-labs/FLUX.1-Kontext-dev-onnx
+- moonshotai/Kimi-K2-Instruct
+- tencent/Hunyuan-A13B-Instruct
+new_version: merve/smol-vision
 ---
 
 # Smol Vision 🐣
@@ -31,4 +39,4 @@ Latest examples 👇🏻
 | VLM Fine-tuning             | [Fine-tune Gemma-3n for all modalities (audio-text-image)](https://huggingface.co/merve/smol-vision/blob/main/Gemma3n_Fine_tuning_on_All_Modalities.ipynb)            | Fine-tune Gemma-3n model to handle any modality: audio, text, and image.                                           |
 | Multimodal RAG              | [Any-to-Any (Video) RAG with OmniEmbed and Qwen](https://huggingface.co/merve/smol-vision/blob/main/Any_to_Any_RAG.ipynb)                                             | Do retrieval and generation across modalities (including video) using OmniEmbed and Qwen.                          |
 | Speed-up/Memory Optimization | Vision language model serving using TGI (SOON)                                                                                                                          | Explore speed-ups and memory improvements for vision-language model serving with text-generation inference |
-| Quantization/Optimum/ORT     | All levels of quantization and graph optimizations for Image Segmentation using Optimum (SOON)                                                                          | End-to-end model optimization using Optimum                                                                |
+| Quantization/Optimum/ORT     | All levels of quantization and graph optimizations for Image Segmentation using Optimum (SOON)                                                                          | End-to-end model optimization using Optimum                                                                |
