From Masks to Worlds: A Hitchhiker's Guide to World Models
Abstract
This guide outlines a progression from early masked models to memory-augmented systems, emphasizing the generative core, the interactive loop, and the memory needed to build world models.
This is not a typical survey of world models; it is a guide for those who want to build worlds. We do not aim to catalog every paper that has ever mentioned a "world model". Instead, we follow one clear road: from early masked models that unified representation learning across modalities, to unified architectures that share a single paradigm, then to interactive generative models that close the action-perception loop, and finally to memory-augmented systems that sustain consistent worlds over time. We bypass loosely related branches to focus on the core: the generative heart, the interactive loop, and the memory system. We show that this is the most promising path towards true world models.
Community
A Hitchhiker’s Guide for those who want to build worlds. We follow one clear road: from early masked models, to unified architectures that share a single paradigm, then to interactive generative models, and finally to memory-augmented systems that sustain consistent worlds over time.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- A Comprehensive Survey on World Models for Embodied AI (2025)
- WoW: Towards a World omniscient World model Through Embodied Interaction (2025)
- F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions (2025)
- Can World Models Benefit VLMs for World Dynamics? (2025)
- World-in-World: World Models in a Closed-Loop World (2025)
- dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought (2025)
- DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving (2025)
Jinbin Bai