johannhartmann's Collections
Document & UI Intelligence
8B • Updated • 45 • 9
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 5
Text Generation • 10B • Updated • 245 • 18
jadechoghari/Ferret-UI-Llama8b
Image-Text-to-Text • 8B • Updated • 216 • 68
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Paper • 2410.18967 • Published • 1
Image-Text-to-Text • Updated • 339 • 1.7k
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Paper • 2501.04575 • Published • 25
Updated • 2.5k • 269
Image-Text-to-Text • 0.3B • Updated • 607 • 98