Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.21045

about 14 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22 • 63
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment

Paper • 2410.09604 • Published Oct 12, 2024
Geospatial Mechanistic Interpretability of Large Language Models

Paper • 2505.03368 • Published May 6 • 10
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation

Paper • 2505.02836 • Published May 5 • 8

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 20
Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1, 2024 • 10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28, 2024 • 29
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 121

Urban Spatial Intelligence

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published Jun 29 • 7
CityGPT: Empowering Urban Spatial Cognition of Large Language Models

Paper • 2406.13948 • Published Jun 20, 2024 • 1
CityBench: Evaluating the Capabilities of Large Language Model as World Model

Paper • 2406.13945 • Published Jun 20, 2024 • 1
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science

Paper • 2504.09848 • Published Apr 14

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Paper • 2503.10437 • Published Mar 13 • 32
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Paper • 2503.09642 • Published Mar 12 • 19
VGGT: Visual Geometry Grounded Transformer

Paper • 2503.11651 • Published Mar 14 • 32
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering

Paper • 2503.16422 • Published Mar 20 • 14

about 14 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Urban Spatial Intelligence

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published Jun 29 • 7
CityGPT: Empowering Urban Spatial Cognition of Large Language Models

Paper • 2406.13948 • Published Jun 20, 2024 • 1
CityBench: Evaluating the Capabilities of Large Language Model as World Model

Paper • 2406.13945 • Published Jun 20, 2024 • 1
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science

Paper • 2504.09848 • Published Apr 14

Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22 • 63
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment

Paper • 2410.09604 • Published Oct 12, 2024
Geospatial Mechanistic Interpretability of Large Language Models

Paper • 2505.03368 • Published May 6 • 10
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation

Paper • 2505.02836 • Published May 5 • 8

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Paper • 2503.10437 • Published Mar 13 • 32
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Paper • 2503.09642 • Published Mar 12 • 19
VGGT: Visual Geometry Grounded Transformer

Paper • 2503.11651 • Published Mar 14 • 32
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering

Paper • 2503.16422 • Published Mar 20 • 14

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 20
Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1, 2024 • 10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28, 2024 • 29
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 121

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs