MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer Paper • 2509.16197 • Published Sep 19 • 52
LangNav: Language as a Perceptual Representation for Navigation Paper • 2310.07889 • Published Oct 11, 2023 • 6