Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization Paper • 2008.11293 • Published Aug 25, 2020
Bayesian Deep Learning for Exoplanet Atmospheric Retrieval Paper • 1811.03390 • Published Nov 8, 2018
Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models Paper • 2512.10362 • Published 15 days ago
Hey, I went to Hangzhou to talk about retrain-pipelines at the GOSIM Foundation's conference last September. The recording has just been released. Go check it out! https://www.youtube.com/watch?v=nmrMachM5aM The slides are here: https://docs.google.com/presentation/d/1hnAzHJ0SbeAOtGJir-iH84RBtXT1OxVT/
Is it time to start developing sparse attention again? https://github.com/SmallDoges/flash-sparse-attention
RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies Paper • 2510.17950 • Published Oct 20
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning Paper • 2401.14011 • Published Jan 25, 2024
HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation Paper • 2406.07070 • Published Jun 11, 2024
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding Paper • 2406.04264 • Published Jun 6, 2024
CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition Paper • 2502.18913 • Published Feb 26
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors Paper • 2503.16578 • Published Mar 20
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs Paper • 2505.11842 • Published May 17
EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations Paper • 2505.23018 • Published May 29
Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information Paper • 2508.11252 • Published Aug 15
RealTalk-CN: A Realistic Chinese Speech-Text Dialogue Benchmark With Cross-Modal Interaction Analysis Paper • 2508.10015 • Published Aug 6
Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning Paper • 2508.02178 • Published Aug 4