Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -1,8 +1,16 @@
  ---
  tags:
  - notebook
- pipeline_tag: image-text-to-text
  library_name: transformers
+ base_model:
+ - black-forest-labs/FLUX.1-Kontext-dev
+ - google/gemma-3n-E4B-it
+ - mistralai/Voxtral-Mini-3B-2507
+ - Qwen/Qwen3-Coder-480B-A35B-Instruct
+ - black-forest-labs/FLUX.1-Kontext-dev-onnx
+ - moonshotai/Kimi-K2-Instruct
+ - tencent/Hunyuan-A13B-Instruct
+ new_version: merve/smol-vision
  ---
  ![Smol](https://github.com/merveenoyan/smol-vision/assets/53175384/930d5b36-bb9d-4ab6-8b5a-4fec28c48f80)
  # Smol Vision 🐣
@@ -31,4 +39,4 @@ Latest examples 👇🏻
  | VLM Fine-tuning | [Fine-tune Gemma-3n for all modalities (audio-text-image)](https://huggingface.co/merve/smol-vision/blob/main/Gemma3n_Fine_tuning_on_All_Modalities.ipynb) | Fine-tune Gemma-3n model to handle any modality: audio, text, and image. |
  | Multimodal RAG | [Any-to-Any (Video) RAG with OmniEmbed and Qwen](https://huggingface.co/merve/smol-vision/blob/main/Any_to_Any_RAG.ipynb) | Do retrieval and generation across modalities (including video) using OmniEmbed and Qwen. |
  | Speed-up/Memory Optimization | Vision language model serving using TGI (SOON) | Explore speed-ups and memory improvements for vision-language model serving with text-generation inference |
- | Quantization/Optimum/ORT | All levels of quantization and graph optimizations for Image Segmentation using Optimum (SOON) | End-to-end model optimization using Optimum |
+ | Quantization/Optimum/ORT | All levels of quantization and graph optimizations for Image Segmentation using Optimum (SOON) | End-to-end model optimization using Optimum |