Article MARL: Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning about 19 hours ago • 10
Article Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework 2 days ago • 10
Article Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism? 14 days ago • 17
FINAL Bench Collection World's First Functional Metacognition Benchmark. "Not how much AI knows, but whether it knows what it doesn't know, and can fix it." • 2 items • Updated 17 days ago • 4