Collections
Discover the best community collections!
Collections including paper arxiv:2409.06820

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning
  Paper • 2409.05556 • Published • 2
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
  Paper • 2409.04109 • Published • 48
- A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
  Paper • 2409.15277 • Published • 38
- Learning Task Decomposition to Assist Humans in Competitive Programming
  Paper • 2406.04604 • Published • 4

- CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
  Paper • 2401.01275 • Published • 1
- Introducing v0.5 of the AI Safety Benchmark from MLCommons
  Paper • 2404.12241 • Published • 13
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  Paper • 2405.01535 • Published • 124
- Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
  Paper • 2406.12624 • Published • 37

- Role-Playing Evaluation for Large Language Models
  Paper • 2505.13157 • Published • 6
- ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents
  Paper • 2505.23923 • Published • 8
- PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
  Paper • 2409.06820 • Published • 68
- CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
  Paper • 2502.09082 • Published • 30

- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
  Paper • 2408.15545 • Published • 37
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 65
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 43
- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 40