Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2406.01574

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 23
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 76
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Paper • 2210.09261 • Published Oct 17, 2022 • 1
BIG-Bench Extra Hard

Paper • 2502.19187 • Published Feb 26 • 10

DS' Daily paper

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 95
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31, 2024 • 67
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

Paper • 2405.20541 • Published May 30, 2024 • 24
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51

Video Understanding

Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published Jun 10, 2024 • 29
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 75
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 26

safety & alignment

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Paper • 2404.01318 • Published Mar 28, 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

Paper • 2508.07750 • Published Aug 11 • 19

Benchmarks and Evals

Awesome Collection of Benchmarks and Evaluation Papers

Measuring Massive Multitask Language Understanding

Paper • 2009.03300 • Published Sep 7, 2020 • 3
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51
GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Paper • 2311.12022 • Published Nov 20, 2023 • 33
HellaSwag: Can a Machine Really Finish Your Sentence?

Paper • 1905.07830 • Published May 19, 2019 • 6

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51
LiveBench: A Challenging, Contamination-Free LLM Benchmark

Paper • 2406.19314 • Published Jun 27, 2024 • 23

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 34
CsFEVER and CTKFacts: Acquiring Czech data for fact verification

Paper • 2201.11115 • Published Jan 26, 2022
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
FinGPT: Large Generative Models for a Small Language

Paper • 2311.05640 • Published Nov 3, 2023 • 31

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 23
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

safety & alignment

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Paper • 2404.01318 • Published Mar 28, 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

Paper • 2508.07750 • Published Aug 11 • 19

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 76
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Paper • 2210.09261 • Published Oct 17, 2022 • 1
BIG-Bench Extra Hard

Paper • 2502.19187 • Published Feb 26 • 10

Benchmarks and Evals

Awesome Collection of Benchmarks and Evaluation Papers

Measuring Massive Multitask Language Understanding

Paper • 2009.03300 • Published Sep 7, 2020 • 3
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51
GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Paper • 2311.12022 • Published Nov 20, 2023 • 33
HellaSwag: Can a Machine Really Finish Your Sentence?

Paper • 1905.07830 • Published May 19, 2019 • 6

DS' Daily paper

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 95
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31, 2024 • 67
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

Paper • 2405.20541 • Published May 30, 2024 • 24
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51
LiveBench: A Challenging, Contamination-Free LLM Benchmark

Paper • 2406.19314 • Published Jun 27, 2024 • 23

Video Understanding

Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published Jun 10, 2024 • 29
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 75
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 51
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 26

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 34
CsFEVER and CTKFacts: Acquiring Czech data for fact verification

Paper • 2201.11115 • Published Jan 26, 2022
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
FinGPT: Large Generative Models for a Small Language

Paper • 2311.05640 • Published Nov 3, 2023 • 31

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs