Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2303.08896

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 23
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Paper • 2411.14257 • Published Nov 21, 2024 • 14
Distinguishing Ignorance from Error in LLM Hallucinations

Paper • 2410.22071 • Published Oct 29, 2024
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Paper • 2410.18860 • Published Oct 24, 2024 • 11
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Paper • 2410.11779 • Published Oct 15, 2024 • 26

Evals & Monitoring

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

Paper • 2303.16634 • Published Mar 29, 2023 • 3
miracl/miracl-corpus

Viewer • Updated Jan 5, 2023 • 77.2M • 3.53k • 47
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023 • 37
How is ChatGPT's behavior changing over time?

Paper • 2307.09009 • Published Jul 18, 2023 • 24

RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Paper • 2509.16198 • Published Sep 19 • 126
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

Paper • 2509.16941 • Published Sep 21 • 20
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Paper • 2303.08896 • Published Mar 15, 2023 • 4
From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Paper • 2404.16130 • Published Apr 24, 2024 • 6

LLM Hallucination Detection Papers

Collection of LLM hallucination and evaluation papers that I've been exploring and implementing. Some of them have my comments and annotated doodles.

Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation

Paper • 2208.05309 • Published Aug 10, 2022 • 1
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models

Paper • 2305.13711 • Published May 23, 2023 • 2
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

Paper • 2302.09664 • Published Feb 19, 2023 • 4
BARTScore: Evaluating Generated Text as Text Generation

Paper • 2106.11520 • Published Jun 22, 2021 • 2

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 23
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Paper • 2509.16198 • Published Sep 19 • 126
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

Paper • 2509.16941 • Published Sep 21 • 20
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Paper • 2303.08896 • Published Mar 15, 2023 • 4
From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Paper • 2404.16130 • Published Apr 24, 2024 • 6

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Paper • 2411.14257 • Published Nov 21, 2024 • 14
Distinguishing Ignorance from Error in LLM Hallucinations

Paper • 2410.22071 • Published Oct 29, 2024
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Paper • 2410.18860 • Published Oct 24, 2024 • 11
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Paper • 2410.11779 • Published Oct 15, 2024 • 26

LLM Hallucination Detection Papers

Collection of LLM hallucination and evaluation papers that I've been exploring and implementing. Some of them have my comments and annotated doodles.

Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation

Paper • 2208.05309 • Published Aug 10, 2022 • 1
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models

Paper • 2305.13711 • Published May 23, 2023 • 2
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

Paper • 2302.09664 • Published Feb 19, 2023 • 4
BARTScore: Evaluating Generated Text as Text Generation

Paper • 2106.11520 • Published Jun 22, 2021 • 2

Evals & Monitoring

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

Paper • 2303.16634 • Published Mar 29, 2023 • 3
miracl/miracl-corpus

Viewer • Updated Jan 5, 2023 • 77.2M • 3.53k • 47
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023 • 37
How is ChatGPT's behavior changing over time?

Paper • 2307.09009 • Published Jul 18, 2023 • 24

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs