dhuynh95's Collections

cool-papers
updated
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 55

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25

RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 78

Simple and Scalable Strategies to Continually Pre-train Large Language Models
Paper • 2403.08763 • Published • 51

Language models scale reliably with over-training and on downstream tasks
Paper • 2403.08540 • Published • 15

Algorithmic progress in language models
Paper • 2403.05812 • Published • 20

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper • 2403.05530 • Published • 66

TnT-LLM: Text Mining at Scale with Large Language Models
Paper • 2403.12173 • Published • 20

Larimar: Large Language Models with Episodic Memory Control
Paper • 2403.11901 • Published • 33

Reverse Training to Nurse the Reversal Curse
Paper • 2403.13799 • Published • 13

When Do We Not Need Larger Vision Models?
Paper • 2403.13043 • Published • 26

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53

FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Paper • 2403.15246 • Published • 11

Can large language models explore in-context?
Paper • 2403.15371 • Published • 33

The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82

Long-form factuality in large language models
Paper • 2403.18802 • Published • 26

BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Paper • 2403.18421 • Published • 23

Octopus v2: On-device language model for super agent
Paper • 2404.01744 • Published • 58

Poro 34B and the Blessing of Multilinguality
Paper • 2404.01856 • Published • 15

Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37

Training LLMs over Neurally Compressed Text
Paper • 2404.03626 • Published • 24

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 18

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 55

Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 28

Flamingo: a Visual Language Model for Few-Shot Learning
Paper • 2204.14198 • Published • 15

Executable Code Actions Elicit Better LLM Agents
Paper • 2402.01030 • Published • 172

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Paper • 2406.04271 • Published • 30

To Believe or Not to Believe Your LLM
Paper • 2406.02543 • Published • 35

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Paper • 2406.06469 • Published • 29

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Paper • 2406.19280 • Published • 63

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Paper • 2407.01370 • Published • 89

Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
Paper • 2407.03321 • Published • 20

Training Task Experts through Retrieval Based Distillation
Paper • 2407.05463 • Published • 10

InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Paper • 2407.05700 • Published • 14

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
Paper • 2407.15711 • Published • 9

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Paper • 2407.20183 • Published • 43

OmniParser for Pure Vision Based GUI Agent
Paper • 2408.00203 • Published • 25

Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models
Paper • 2408.06663 • Published • 16

To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 44

Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 133

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Paper • 2408.13257 • Published • 26

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 83

ContextCite: Attributing Model Generation to Context
Paper • 2409.00729 • Published • 14

Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 92

A Controlled Study on Long Context Extension and Generalization in LLMs
Paper • 2409.12181 • Published • 45

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Paper • 2409.12183 • Published • 39

Attention Prompting on Image for Large Vision-Language Models
Paper • 2409.17143 • Published • 7

Can Models Learn Skill Composition from Examples?
Paper • 2409.19808 • Published • 10

Law of the Weakest Link: Cross Capabilities of Large Language Models
Paper • 2409.19951 • Published • 54

LLaVA-Critic: Learning to Evaluate Multimodal Models
Paper • 2410.02712 • Published • 37

From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond
Paper • 2411.03590 • Published • 10

Autoregressive Models in Vision: A Survey
Paper • 2411.05902 • Published • 19

Stronger Models are NOT Stronger Teachers for Instruction Tuning
Paper • 2411.07133 • Published • 38

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
Paper • 2501.05452 • Published • 15

Demystifying Domain-adaptive Post-training for Financial LLMs
Paper • 2501.04961 • Published • 11

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published • 87

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Paper • 2501.04003 • Published • 27

MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 33

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published • 28

CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Paper • 2501.14723 • Published • 10

s1: Simple test-time scaling
Paper • 2501.19393 • Published • 124

LIMO: Less is More for Reasoning
Paper • 2502.03387 • Published • 62

Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Paper • 2502.01941 • Published • 15

Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper • 2502.03373 • Published • 58

The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper • 2502.01081 • Published • 14

Great Models Think Alike and this Undermines AI Oversight
Paper • 2502.04313 • Published • 33

Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models
Paper • 2502.06755 • Published • 7

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Paper • 2502.07445 • Published • 11

NoLiMa: Long-Context Evaluation Beyond Literal Matching
Paper • 2502.05167 • Published • 15

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
Paper • 2502.14802 • Published • 13

Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering
Paper • 2502.13962 • Published • 28

Small Models Struggle to Learn from Strong Reasoners
Paper • 2502.12143 • Published • 39

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Paper • 2502.12215 • Published • 16

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Paper • 2502.11196 • Published • 23

Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Paper • 2502.08826 • Published • 17

Intuitive physics understanding emerges from self-supervised pretraining on natural videos
Paper • 2502.11831 • Published • 20

Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
Paper • 2502.09083 • Published • 4

MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 192

The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Paper • 2502.08235 • Published • 58

VLM^2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
Paper • 2502.12084 • Published • 31

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
Paper • 2502.14302 • Published • 9

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Paper • 2502.19361 • Published • 28

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Paper • 2502.18906 • Published • 12

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Paper • 2502.17535 • Published • 8

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Paper • 2502.17422 • Published • 7

Chain of Draft: Thinking Faster by Writing Less
Paper • 2502.18600 • Published • 49

AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Paper • 2503.02268 • Published • 11

LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation
Paper • 2503.02972 • Published • 25

How to Steer LLM Latents for Hallucination Detection?
Paper • 2503.01917 • Published • 11

On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
Paper • 2503.03962 • Published • 4

Where do Large Vision-Language Models Look at when Answering Questions?
Paper • 2503.13891 • Published • 8

Can Large Vision Language Models Read Maps Like a Human?
Paper • 2503.14607 • Published • 10

UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Paper • 2503.21620 • Published • 62

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
Paper • 2504.00294 • Published • 10

What Makes a Good Natural Language Prompt?
Paper • 2506.06950 • Published • 11

Hidden in plain sight: VLMs overlook their visual representations
Paper • 2506.08008 • Published • 7

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Paper • 2506.11928 • Published • 23

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Paper • 2506.14245 • Published • 42

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper • 2507.00432 • Published • 79

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
Paper • 2507.19478 • Published • 30

Attention Basin: Why Contextual Position Matters in Large Language Models
Paper • 2508.05128 • Published • 4

Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
Paper • 2508.02120 • Published • 19

Paper • 2508.11737 • Published • 109

UItron: Foundational GUI Agent with Advanced Perception and Planning
Paper • 2508.21767 • Published • 12

What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Paper • 2509.19284 • Published • 22

Video models are zero-shot learners and reasoners
Paper • 2509.20328 • Published • 96

OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
Paper • 2510.24563 • Published • 22