arXiv:2510.18428

AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

Published on Oct 21 · Submitted by Ao Qu on Oct 23
Abstract

AI-generated summary: AlphaOPT is a self-improving library that enables an LLM to learn from limited demonstrations and solver feedback, improving optimization modeling across industries without costly retraining.

Optimization modeling enables critical decisions across industries but remains difficult to automate: informal language must be mapped to precise mathematical formulations and executable solver code. Prior LLM approaches either rely on brittle prompting or require costly retraining, with limited generalization. We present AlphaOPT, a self-improving experience library that enables an LLM to learn from limited demonstrations (even answers alone, without gold-standard programs) and solver feedback, without annotated reasoning traces or parameter updates. AlphaOPT operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts, extracting solver-verified, structured insights as {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT improves steadily with more data (from 65% to 72% as training items grow from 100 to 300) and surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only on answers. Code and data are available at: https://github.com/Minw913/AlphaOPT.
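The cycle above is compact enough to sketch in code. The following is a minimal, illustrative Python sketch of the library schema and the two-phase loop as described in the abstract; every name in it (Insight, ExperienceLibrary, library_learning, reflect, verify, and so on) is a hypothetical stand-in, not the API of the released codebase.

    from dataclasses import dataclass, field

    @dataclass
    class Insight:
        # One library entry, following the paper's
        # {taxonomy, condition, explanation, example} schema.
        taxonomy: str     # e.g. "constraints / logical implication / big-M"
        condition: str    # applicability condition; refined during Library Evolution
        explanation: str  # why the modeling fix is correct
        example: str      # a short solver-verified formulation snippet

    @dataclass
    class ExperienceLibrary:
        insights: list[Insight] = field(default_factory=list)

        def retrieve(self, task_text: str) -> list[Insight]:
            # Toy retrieval: an insight applies when every keyword of its
            # condition appears in the task description.
            words = task_text.lower()
            return [i for i in self.insights
                    if all(w in words for w in i.condition.lower().split())]

    def library_learning(library, task_text, failed_program, feedback, reflect, verify):
        # Phase (i): reflect on a failed attempt (an LLM call in the paper)
        # and store the resulting insight only if the solver verifies it.
        # Only the final answer is needed as supervision, not a gold program.
        insight = reflect(task_text, failed_program, feedback)
        if verify(insight):
            library.insights.append(insight)

    def library_evolution(library, task_text, misapplied, refine):
        # Phase (ii): diagnose retrieval misalignments and refine the
        # applicability conditions of stored insights to improve transfer.
        for insight in library.retrieve(task_text):
            if misapplied(insight, task_text):
                insight.condition = refine(insight, task_text)

The design choice the abstract emphasizes falls out directly here: all learning happens by appending to and editing library.insights, so model weights are never touched and every stored insight remains readable for human inspection and intervention.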

Community

Paper submitter

Optimization modeling is critical but often requires deep expert knowledge. AlphaOPT introduces a self-improving experience library that enables large language models to learn from solver feedback and limited demonstrations—without retraining or annotated reasoning traces.
By iteratively reflecting on failed attempts and refining structured insights ({taxonomy, condition, explanation, example}), AlphaOPT continually improves its formulation ability across diverse OR tasks.
A promising step toward autonomous, interpretable, and continually learning LLM systems for operations research.
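To make the {taxonomy, condition, explanation, example} schema concrete, here is what a single stored entry might look like, reusing the Insight class from the sketch above; the content is invented for illustration and is not drawn from the paper's actual library.

    # A hypothetical entry; the content is illustrative, not from the paper.
    insight = Insight(
        taxonomy="constraints / logical implication / big-M linearization",
        condition="if-then requirement links a binary decision to a continuous quantity",
        explanation="The implication 'if y = 1 then x <= u' is linearized as "
                    "x <= u + M * (1 - y), with M at least the slack the constraint "
                    "must absorb when y = 0; solver feedback rejects formulations "
                    "that leave the implication nonlinear.",
        example="x <= u + M * (1 - y)   # y binary, M sufficiently large",
    )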

