Abstract
A benchmark for code-diff understanding with tasks including apply, anti-apply, and diff generation, revealing optimal diff formats based on model size and use case.
Reliable handling of code diffs is central to agents that edit and refactor repositories at scale. We introduce Diff-XYZ, a compact benchmark for code-diff understanding with three supervised tasks: apply (old code + diff → new code), anti-apply (new code − diff → old code), and diff generation (new code − old code → diff). Instances in the benchmark are triples ⟨old code, new code, diff⟩ drawn from real commits in CommitPackFT, paired with automatic metrics and a clear evaluation protocol. We use the benchmark to conduct a focused empirical study of the unified diff format and to run a cross-format comparison of different diff representations. Our findings reveal that the best format depends on the use case and model size. For example, the search-replace format works well for larger models in the diff generation scenario, yet is less well suited to diff analysis and to smaller models. The Diff-XYZ benchmark is a reusable foundation for assessing and improving diff handling in LLMs that can aid future development of diff formats and code-editing models. The dataset is published on HuggingFace Hub: https://huggingface.co/datasets/JetBrains-Research/diff-xyz.
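To make the task setup concrete, here is a minimal sketch of how a single ⟨old code, new code, diff⟩ triple relates to the three tasks and to the diff representations compared in the paper. The code snippet, file names, prompt wording, and search/replace marker syntax below are illustrative assumptions, not the benchmark's actual data, prompts, or format definitions.

```python
import difflib

# A hypothetical benchmark instance: the triple <old code, new code, diff>.
# The contents and file names are made up for illustration.
old_code = "def greet(name):\n    print('Hello, ' + name)\n"
new_code = "def greet(name: str) -> None:\n    print(f'Hello, {name}')\n"

# Diff generation: new code - old code -> diff, rendered here in unified format.
unified = "".join(
    difflib.unified_diff(
        old_code.splitlines(keepends=True),
        new_code.splitlines(keepends=True),
        fromfile="a/greet.py",
        tofile="b/greet.py",
    )
)
print(unified)

# The same edit in a search/replace-style representation (the marker syntax is
# an assumption; concrete formats differ between tools and papers).
search_replace = (
    "<<<<<<< SEARCH\n"
    "def greet(name):\n"
    "    print('Hello, ' + name)\n"
    "=======\n"
    "def greet(name: str) -> None:\n"
    "    print(f'Hello, {name}')\n"
    ">>>>>>> REPLACE\n"
)

# Apply (old code + diff -> new code) and anti-apply (new code + diff -> old code)
# are text-to-text tasks for the model; a minimal apply prompt might look like:
apply_prompt = (
    "Apply the following unified diff to the code and return only the updated code.\n\n"
    f"Code:\n{old_code}\nDiff:\n{unified}"
)
```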
Community
We present a paper that explores LLMs' capabilities to understand and to generate diffs.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Generating High-Quality Datasets for Code Editing via Open-Source Language Models (2025)
- Probing Pre-trained Language Models on Code Changes: Insights from ReDef, a High-Confidence Just-in-Time Defect Prediction Dataset (2025)
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? (2025)
- VulnRepairEval: An Exploit-Based Evaluation Framework for Assessing Large Language Model Vulnerability Repair Capabilities (2025)
- A Benchmark for Localizing Code and Non-Code Issues in Software Projects (2025)
- GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging (2025)
- TENET: Leveraging Tests Beyond Validation for Code Generation (2025)