A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17, 2025 • 259
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Paper • 2505.23693 • Published May 29, 2025 • 53
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval Paper • 2503.04644 • Published Mar 6, 2025 • 21