AI & ML interests
Google ❤️ Open Source AI

Papers
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
Collections
Collection of Gemma 3 variants tuned for performance on medical text and image comprehension, to accelerate building healthcare-based AI applications.

Collection of open models to accelerate the development of therapeutics.
- Compare Siglip1 Siglip2 🚀 • Space: compare SigLIP1 and SigLIP2 on zero-shot classification • 52
- SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published • 152
- google/siglip2-base-patch16-224 • Zero-Shot Image Classification • 0.4B • Updated • 568k • 72
- google/siglip2-base-patch16-256 • Zero-Shot Image Classification • 0.4B • Updated • 52.6k • 6
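SigLIP-family models score zero-shot classification differently from softmax-based CLIP: each (image, label) pair gets an independent sigmoid probability, so scores need not sum to 1 across labels. A minimal numpy sketch of that scoring rule, using hand-made toy embeddings and illustrative scale/bias values rather than real model outputs:

```python
import numpy as np

def siglip_zero_shot_scores(image_emb, text_embs, logit_scale=100.0, logit_bias=-10.0):
    """Score one image against each candidate label embedding.

    Each (image, text) pair gets an independent probability via a sigmoid,
    so the scores need not sum to 1 across labels. The scale/bias values
    here are illustrative, not a checkpoint's learned ones.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    logits = logit_scale * (text_embs @ image_emb) + logit_bias
    return 1.0 / (1.0 + np.exp(-logits))

# Toy embeddings standing in for the image/text tower outputs.
image = np.array([1.0, 0.0, 0.0, 0.0])
labels = np.array([
    [0.9, 0.1, 0.0, 0.0],  # nearly aligned with the image
    [0.0, 1.0, 0.0, 0.0],  # orthogonal to the image
])
probs = siglip_zero_shot_scores(image, labels)
```

With real checkpoints the embeddings come from the two SigLIP towers; the independent-sigmoid scoring is what lets a label set of any size be scored without renormalizing.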
Vision-Language Models available in 3B, 10B, and 28B variants.

- PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published • 133
- google/paligemma2-3b-pt-224 • Image-Text-to-Text • 3B • Updated • 1.21M • 159
- google/paligemma2-3b-pt-448 • Image-Text-to-Text • 3B • Updated • 4.45k • 46
- google/paligemma2-3b-pt-896 • Image-Text-to-Text • 3B • Updated • 1.68k • 22
A collection of MetricX-24 models (https://aclanthology.org/2024.wmt-1.35/)

Groups the Gemma models released by the Google team.
A comprehensive, open suite of sparse autoencoders for Gemma 2 2B and 9B.
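As a rough illustration of what one of these sparse autoencoders does: it maps a model activation into a much wider, mostly-zero feature vector and linearly reconstructs the activation from it. A minimal numpy sketch assuming a plain ReLU encoder (a simplification for brevity; the released Gemma Scope SAEs use a JumpReLU activation):

```python
import numpy as np

class SparseAutoencoder:
    """Minimal ReLU sparse autoencoder over model activations.

    d_features is typically much larger than d_model; sparsity comes
    from the ReLU zeroing out most feature activations.
    """
    def __init__(self, d_model, d_features, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
        self.b_enc = np.zeros(d_features)
        self.W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # Sparse, non-negative feature activations.
        return np.maximum(x @ self.W_enc + self.b_enc, 0.0)

    def decode(self, f):
        # Linear reconstruction of the original activation.
        return f @ self.W_dec + self.b_dec

sae = SparseAutoencoder(d_model=8, d_features=32)
x = np.ones(8)            # stand-in for a model activation vector
features = sae.encode(x)  # wide, sparse code
x_hat = sae.decode(features)
```

The wide, sparse feature vector is the interpretability handle: individual features can be inspected, steered, or ablated.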
The ALBERT release was done in two steps, each covering 4 checkpoints of different sizes. The first version is noted as "v1", the second as "v2".

The Flan-T5 release covers 4 checkpoints of different sizes. It also includes upgraded versions trained using universal sampling.

The MT5 release follows the T5 family, but is pretrained on multilingual data. The updated UMT5 models are pretrained on an updated corpus.

This release includes various MoE (Mixture of Experts) models based on the T5 architecture. The base models use from 8 to 256 experts.

Datasets released in "IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs" (https://arxiv.org/abs/2404.16816)

A series of pioneering open models that help ground LLMs in real-world data through Data Commons.

TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
- google/timesfm-1.0-200m • Time Series Forecasting • Updated • 323 • 773
- google/timesfm-1.0-200m-pytorch • Time Series Forecasting • Updated • 2.83k • 29
- google/timesfm-2.0-500m-jax • Time Series Forecasting • Updated • 191 • 16
- google/timesfm-2.0-500m-pytorch • Time Series Forecasting • 0.5B • Updated • 11.2k • 226
Collection of concept apps built with MedGemma models to inspire the community.
VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks.
- VideoPrism: A Foundational Visual Encoder for Video Understanding • Paper • 2402.13217 • Published • 37
- google/videoprism-base-f16r288 • Video Classification • Updated • 76.8k • 88
- google/videoprism-large-f8r288 • Video Classification • Updated • 244 • 17
- google/videoprism-lvt-base-f16r288 • Video Classification • Updated • 13.4k • 9
Collection of concept apps built around HAI-DEF open models/libraries to inspire the community. Learn more at http://goo.gle/hai-def
- Path Foundation Demo 🔬 • Space: browse pathology images for analysis • 33
- CXR Foundation Demo 🩻 • Space: demo usage of the CXR Foundation model embeddings • 20
- MedGemma - Radiology Explainer Demo 🩺 • Space: radiology image & report explainer, built with MedGemma • 202
- Appoint Ready - MedGemma Demo 📋 • Space: simulated pre-visit intake demo built using MedGemma • 139
Quantization Aware Trained (QAT) Gemma 3 checkpoints. These checkpoints preserve quality similar to half precision while using 3x less memory.
- google/gemma-3-4b-it-qat-q4_0-gguf • Image-Text-to-Text • 4B • Updated • 3.59k • 204
- google/gemma-3-4b-pt-qat-q4_0-gguf • Image-Text-to-Text • 4B • Updated • 99 • 23
- google/gemma-3-1b-it-qat-q4_0-gguf • Text Generation • 1.0B • Updated • 1.75k • 92
- google/gemma-3-1b-pt-qat-q4_0-gguf • Text Generation • 1.0B • Updated • 71 • 12
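The q4_0 suffix on these GGUF files refers to ggml-style 4-bit block quantization: weights are grouped into blocks of 32, each stored as one scale plus 32 4-bit integers, which is roughly where the memory saving over half precision comes from. A simplified numpy sketch of that scheme (illustrative only, not the exact GGUF on-disk layout):

```python
import numpy as np

def q4_0_quantize(block):
    """Quantize one block of 32 weights, q4_0-style.

    One scale per block (max-magnitude value / -8), quants in [0, 15],
    dequantized as (q - 8) * scale. Simplified sketch of the scheme.
    """
    amax_idx = np.argmax(np.abs(block))
    scale = block[amax_idx] / -8.0
    if scale == 0.0:
        return 0.0, np.full(block.shape, 8, dtype=np.uint8)
    q = np.clip(np.round(block / scale) + 8, 0, 15).astype(np.uint8)
    return scale, q

def q4_0_dequantize(scale, q):
    return (q.astype(np.float32) - 8) * scale

weights = np.linspace(-1.0, 1.0, 32, dtype=np.float32)
scale, q = q4_0_quantize(weights)
recovered = q4_0_dequantize(scale, q)
```

Per block this stores one scale plus 32 half-byte quants instead of 32 fp16 values, and QAT training makes the model robust to the resulting rounding error.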
ShieldGemma is a family of models for text and image content moderation.

A collection of MetricX-23 models (https://aclanthology.org/2023.wmt-1.63/)

Groups models released for use in health AI by Google. Read more about HAI-DEF at https://developers.google.com/health-ai-developer-foundations

Pretrained and mix checkpoints for PaliGemma.

The 2.6B parameter version of Gemma 2.

A series of safety classifiers, trained on top of Gemma 2, for developers to filter inputs and outputs of their applications.

Groups the original BERT models released by the Google team. Except for the models marked otherwise, the checkpoints support English.

This collection groups the ELECTRA models released by the Google team.

The original T5 transformer release was done in two steps: the original T5 checkpoints and the improved T5v1 checkpoints.
- google-t5/t5-base • Translation • 0.2B • Updated • 1.35M • 755
- google-t5/t5-small • Translation • 60.5M • Updated • 3.03M • 500
- google-t5/t5-large • Translation • 0.7B • Updated • 239k • 224
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer • Paper • 1910.10683 • Published • 14
The SEAHORSE metrics (as described in https://arxiv.org/abs/2305.13194).

Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343
- google/siglip-so400m-patch14-384 • Zero-Shot Image Classification • 0.9B • Updated • 2.31M • 610
- google/siglip-so400m-patch14-224 • Zero-Shot Image Classification • 0.9B • Updated • 73.5k • 54
- google/siglip-so400m-patch16-256-i18n • Zero-Shot Image Classification • 1B • Updated • 1.37k • 30
- google/siglip-base-patch16-256-multilingual • Zero-Shot Image Classification • 0.4B • Updated • 15.8k • 50
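The "(sigmoid)" in this collection's description refers to the training objective: instead of CLIP's batch-level softmax, every image-text pair in a batch is treated as an independent binary classification, with matching pairs as positives. A minimal numpy sketch of that pairwise loss (toy embeddings and fixed temperature/bias; in real training these are learned tower outputs and learned scalars):

```python
import numpy as np

def sigmoid_pairwise_loss(img_embs, txt_embs, t=10.0, b=-10.0):
    """SigLIP-style loss: each (image, text) pair is an independent
    binary classification; matching pairs (the diagonal) are positives.
    """
    img_embs = img_embs / np.linalg.norm(img_embs, axis=-1, keepdims=True)
    txt_embs = txt_embs / np.linalg.norm(txt_embs, axis=-1, keepdims=True)
    logits = t * img_embs @ txt_embs.T + b
    labels = 2.0 * np.eye(len(img_embs)) - 1.0  # +1 on diagonal, -1 off
    # Mean log-sigmoid loss over all pairs; lower is better.
    return np.mean(np.log1p(np.exp(-labels * logits)))

# Orthonormal toy embeddings: matched pairing beats a shuffled one.
embs = np.eye(4)
loss_matched = sigmoid_pairwise_loss(embs, embs)
loss_shuffled = sigmoid_pairwise_loss(embs, np.roll(embs, 1, axis=0))
```

Because each pair is scored independently, the loss needs no all-pairs softmax normalization, which is what makes it friendly to very large batch sizes.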
arXiv: https://arxiv.org/abs/2405.02793

Gemma models for text-to-propositions segmentation. The models are distilled from a fine-tuned Gemini Pro model applied to multi-domain synthetic data.

A Gemma 2 2B model fine-tuned on Japanese text. It supports Japanese with the same level of performance as English-only queries on Gemma 2.