Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7, 2024 • 25
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale Paper • 2505.03005 • Published May 5, 2025 • 36
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published Jul 16, 2024 • 57
H2O Open Ecosystem for State-of-the-art Large Language Models Paper • 2310.13012 • Published Oct 17, 2023 • 9
H2OVL-Mississippi Vision Language Models Technical Report Paper • 2410.13611 • Published Oct 17, 2024 • 1
Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing Paper • 2505.08651 • Published May 13, 2025 • 1
Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding Paper • 2510.05788 • Published Oct 2025 • 1
FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content Paper • 2308.14256 • Published Aug 28, 2023 • 2
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese Paper • 2401.16640 • Published Jan 30, 2024 • 10
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model Paper • 2309.11568 • Published Sep 20, 2023 • 11
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18, 2025 • 19
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment Paper • 2405.01481 • Published May 2, 2024 • 31