evaluate

AI & ML interests

None defined yet.

Recent Activity

lvwerra authored a paper 17 days ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

sasha authored a paper 23 days ago

Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model

sasha authored a paper 23 days ago

Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI

meg

posted an update 1 day ago

Post

1524

🤖 Did you know your voice might be cloned without your consent from just *one sentence* of audio?
That's not great. So with @frimelle , we brainstormed a new idea for developers who want to curb malicious use: ✨The Voice Consent Gate.✨
Details, code, here: https://huggingface.co/blog/voice-consent-gate

lvwerra

authored a paper 17 days ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published 21 days ago • 32

sasha

authored 3 papers 23 days ago

Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model

Paper • 2211.02001 • Published Nov 3, 2022

Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI

Paper • 2409.14160 • Published Sep 21, 2024 • 3

From Efficiency Gains to Rebound Effects: The Problem of Jevons' Paradox in AI's Polarized Environmental Debate

Paper • 2501.16548 • Published Jan 27

meg

posted an update about 1 month ago

Post

2862

🤖 As AI-generated content is shared in movies/TV/across the web, there's one simple low-hanging fruit 🍇 to help know what's real: Visible watermarks. With the Gradio team, I've made sure it's trivially easy to add this disclosure to images, video, chatbot text. See how: https://huggingface.co/blog/watermarking-with-gradio
Thanks to the code collab in particular from @abidlabs and Yuvraj Sharma.

meg

posted an update 3 months ago

Post

2935

New work from my socially-minded colleagues at Hugging Face, creating some foundations for AI companionship behavior evaluation.
Evaluation Dataset: AI-companionship/INTIMA
Paper: AI-companionship/INTIMA
Work from @giadap , @frimelle , @yjernite .

2 replies

lhoestq

in evaluate/glue-ci 3 months ago

Convert dataset to Parquet

#4 opened 3 months ago by lhoestq

Update README.md

#3 opened 3 months ago by lhoestq

meg

posted an update 3 months ago

Post

437

🤖 ICYMI: Yesterday, Hugging Face and OpenAI partnered to bring open source GPT to the public. This is a Big Deal in "AI world".

0. Common ground setting: OpenAI is the ChatGPT people. An “open source” model is one whose weights are available — that means the model can be “yours”.
1. You don’t have to interact with the company directly, nor give them your interactions, to use the system. The company can't "surveil" you.
2. You can evaluate the unique contributions of their SOTA model much more rigorously than you can when there are collections of models+code behind a closed API. You can find out specifically what the model can and can't do.
3. And you can directly customize it for whatever you'd like. Fine-tuning, wherein you give the model data that's tailored to your use cases and train it some more on that data, is trivial* when you have the model weights.
*Provided you have the compute.
4. You can directly benchmark whatever you'd like. Biases? Energy usage? Strengths/weaknesses? Go for it. You wants it you gots it--this transparency helps people understand SOTA *in general*, not just for this model, but points to, e.g., what's going on with closed Google models as well.
5. One of the most powerful things about "openness" that I've learned is that it cultivates ecosystems of collaborators building on top of one another's brilliance to make systems that are significantly better than they would be if created in isolation.
But, caveat wrt my own philosophy...
6. I do not take it as a given that advancing LLMs is good, and have a lot more to say wrt where I think innovation should focus more. For example, a focus on *data* -- curation, measurement, consent, credit, compensation, safety -- would deeply improve technology for everyone.
7. The transparency this release provides is massive for people who want to *learn* about LLMs. For the next generation of technologists to advance over the current, they MUST be able to learn about what's happening now. (cont...)

1 reply

meg

posted an update 3 months ago

Post

493

🤖 👾 Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about US <—> China relationship in AI, and what it means for AI ethics.

lvwerra

authored a paper 4 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 73

meg

posted an update 6 months ago

Post

2732

New launch: See the energy use of chatbot conversations, in real time. =)
jdelavande/chat-ui-energy
Great work from @JulienDelavande !

lvwerra

authored a paper 7 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200

sasha

authored a paper 7 months ago

Fully Autonomous AI Agents Should Not be Developed

Paper • 2502.02649 • Published Feb 4 • 35

meg

authored a paper 7 months ago

Fully Autonomous AI Agents Should Not be Developed

Paper • 2502.02649 • Published Feb 4 • 35

lvwerra

authored a paper 9 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 243

lvwerra

authored a paper 10 months ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 63

meg

posted an update 10 months ago

Post

3407

💫...And we're live!💫 Seasonal newsletter from ethicsy folks at Hugging Face, exploring the ethics of "AI Agents"
https://huggingface.co/blog/ethics-soc-7
Our analyses found:
- There's a spectrum of "agent"-ness
- *Safety* is a key issue, leading to many other value-based concerns
Read for details & what to do next!
With @evijit , @giadap , and @sasha

lhoestq

authored a paper 11 months ago

Croissant: A Metadata Format for ML-Ready Datasets

Paper • 2403.19546 • Published Mar 28, 2024 • 1

Team members 5
