MLEB is the largest, most diverse, and most comprehensive benchmark for legal text embedding models. https://huggingface.co/blog/isaacus/introducing-mleb
METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring Paper • 2501.02045 • Published Jan 3
Bio LLMs train on many genomes, but can we encode differences within a species? TomatoTomato adds pangenome tokens to represent a domestic tomato and a wild tomato in one sequence. monsoon-nlp/tomatotomato-gLM2-150M-v0.1
We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez! v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025. Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!