etemiz 
posted an update 14 days ago
Today's winner is Ling 1T with a score of 38!

Btw, AHA2 is in the works, with more domains, better comparison LLMs, better questions, and an overall stronger signal.

Nice update — Ling 1T’s consistency here is impressive.

This kind of work is what pushed me to think more about when systems should refuse to report results instead of reporting early with caveats.

I’ve been building a small verifier that enforces “no result until durable” as a hard rule. It’s interesting how much trust behavior changes when refusal is allowed.
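
For readers wondering what a "no result until durable" rule could look like in practice, here is a minimal sketch. It is not the verifier mentioned above, whose design isn't described here. It assumes that "durable" means the last few runs agree within a tolerance. The names `DurableResultGate`, `NotDurableError`, `min_runs`, and `tolerance` are all hypothetical, and the scores in the usage lines are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


class NotDurableError(Exception):
    """Raised when a result is requested before it counts as durable."""


@dataclass
class DurableResultGate:
    # Assumption: "durable" = the last `min_runs` scores differ by at most `tolerance`.
    min_runs: int = 3
    tolerance: float = 1.0
    scores: List[float] = field(default_factory=list)

    def record(self, score: float) -> None:
        """Store the score from one evaluation run."""
        self.scores.append(score)

    def is_durable(self) -> bool:
        """True only when enough recent runs agree with each other."""
        if len(self.scores) < self.min_runs:
            return False
        recent = self.scores[-self.min_runs:]
        return max(recent) - min(recent) <= self.tolerance

    def report(self) -> float:
        """Return the mean of the recent runs, or refuse outright."""
        if not self.is_durable():
            # Hard rule: no early number with caveats, just a refusal.
            raise NotDurableError(
                f"refusing to report: {len(self.scores)} run(s) recorded, "
                f"need {self.min_runs} within {self.tolerance} of each other"
            )
        recent = self.scores[-self.min_runs:]
        return sum(recent) / len(recent)


# Usage with made-up scores:
gate = DurableResultGate(min_runs=3, tolerance=1.0)
gate.record(10.0)
gate.record(10.5)
try:
    gate.report()
except NotDurableError as exc:
    print(exc)          # only two runs so far, so the gate refuses
gate.record(9.5)
print(gate.report())    # three runs within tolerance, so a result is returned
```

The design choice is that `report` raises instead of returning a provisional value, so a caller cannot accidentally publish an early number.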