Models That Know How Evaluations Are Designed Score Safer Paper • 2605.28591 • Published 24 days ago • 9
🕵️🛡️ Evaluation Meta Knowledge Collection 2026 arXiv preprint. Models fine-tuned on documents describing typical evaluation traits show safer behavior by having increased refusal rates and low • 11 items • Updated 11 days ago • 2
FutureSim: Replaying World Events to Evaluate Adaptive Agents Paper • 2605.15188 • Published May 14 • 7
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs Paper • 2605.12460 • Published May 12 • 17
NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist Paper • 2602.16756 • Published Feb 18 • 4
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence Paper • 2511.07384 • Published Nov 10, 2025 • 20
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models Paper • 2510.14961 • Published Oct 16, 2025 • 8
Training Dynamics Impact Post-Training Quantization Robustness Paper • 2510.06213 • Published Oct 7, 2025 • 3
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs Paper • 2509.18058 • Published Sep 22, 2025 • 12
FAST: Factorizable Attention for Speeding up Transformers Paper • 2402.07901 • Published Feb 12, 2024 • 3
DynaGuard: A Dynamic Guardrail Model With User-Defined Policies Paper • 2509.02563 • Published Sep 2, 2025 • 21
answer-matching Collection Free-form datasets, human annotations, and sample-level model outputs for "Answer Matching Outperforms Multiple Choice for Language Model Evaluation" • 2 items • Updated Jul 3, 2025 • 2
Answer Matching Outperforms Multiple Choice for Language Model Evaluation Paper • 2507.02856 • Published Jul 3, 2025 • 9
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching Paper • 2506.20480 • Published Jun 25, 2025 • 7
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Paper • 2506.05523 • Published Jun 5, 2025 • 34
Has My System Prompt Been Used? Large Language Model Prompt Membership Inference Paper • 2502.09974 • Published Feb 14, 2025 • 9