apomath

xw27

authored 7 papers 4 days ago

CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

Paper • 2406.09923 • Published Jun 14, 2024 • 1

Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction

Paper • 2501.17326 • Published Jan 28, 2025

Entropy-Based Adaptive Weighting for Self-Training

Paper • 2503.23913 • Published Mar 31, 2025 • 3

MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science

Paper • 2510.12171 • Published Oct 14, 2025

From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs

Paper • 2511.15137 • Published Nov 19, 2025

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

Paper • 2602.21534 • Published Feb 25 • 26

HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness

Paper • 2606.12882 • Published 18 days ago • 13

xw27

submitted a paper to Daily Papers 17 days ago

HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness

Paper • 2606.12882 • Published 18 days ago • 13

authored a paper 3 months ago

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

Paper • 2604.08539 • Published Apr 9 • 51

xw27

submitted a paper to Daily Papers 4 months ago

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

Paper • 2602.21534 • Published Feb 25 • 26

xw27

authored a paper about 1 year ago

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Paper • 2307.10635 • Published Jul 20, 2023 • 9

authored 3 papers over 1 year ago

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

Paper • 2503.17352 • Published Mar 21, 2025 • 24

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

Paper • 2502.05163 • Published Feb 7, 2025 • 22

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Paper • 2410.22304 • Published Oct 29, 2024 • 18

authored 2 papers almost 2 years ago

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

Paper • 2405.19716 • Published May 30, 2024

MIRAI: Evaluating LLM Agents for Event Forecasting

Paper • 2407.01231 • Published Jul 1, 2024 • 18

posted an update almost 2 years ago

Check out our new benchmark paper on LLM agents for global event forecasting! MIRAI: Evaluating LLM Agents for Event Forecasting (2407.01231)

📜 arXiv: https://arxiv.org/abs/2407.01231
🔗 Project page: https://mirai-llm.github.io
💻 GitHub Repo: https://github.com/yecchen/MIRAI
📁 Dataset: https://drive.google.com/file/d/1xmSEHZ_wqtBu1AwLpJ8wCDYmT-jRpfrN/view?usp=sharing
📊 Interactive Demo Notebook: https://colab.research.google.com/drive/1QyqT35n6NbtPaNtqQ6A7ILG_GMeRgdnO?usp=sharing

authored 3 papers over 2 years ago

Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance

Paper • 2402.08680 • Published Feb 13, 2024 • 1

Robust Learning with Progressive Data Expansion Against Spurious Correlation

Paper • 2306.04949 • Published Jun 8, 2023

Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

Paper • 2311.04205 • Published Nov 7, 2023 • 5