@abdurrahmanbutler on Hugging Face: "🎉 I am excited to share news of a project my brother, Umar Butler, and I have…"

abdurrahmanbutler

posted an update 10 days ago

🎉 I am excited to share news of a project my brother, Umar Butler, and I have been working on for what feels like an eternity now.

Introducing MLEB: the Massive Legal Embedding Benchmark.

A suite of 10 high-quality English legal IR datasets, designed by legal experts to set a new standard for comparing embedding models.

Whether you’re exploring legal RAG on your home computer, or running enterprise-scale retrieval, apples-to-apples evaluation is crucial. That’s why we’ve open-sourced everything - including our 7 brand-new, hand-crafted retrieval datasets. All of these datasets are now live on Hugging Face.

Any guesses which embedding model leads on legal retrieval?

Hint: it’s not OpenAI or Google - they place 7th and 9th on our leaderboard.

To do well on MLEB, embedding models must demonstrate both extensive legal domain knowledge and strong legal reasoning skills.

https://huggingface.co/blog/isaacus/introducing-mleb

umarbutler

10 days ago

@abdurrahmanbutler I'm proud to have worked with you on this! MLEB is truly monumental. Given how often embeddings are used for law (and, consequently, how much money is on the line), it is crazy that we didn't have a truly fit-for-purpose benchmark for legal embeddings until today.
