MIDI-LLM
IMPORTANT: We're still working on the companion GitHub codebase; please check back in a few days. Thanks!
Built on Llama 3.2 (1B) with an extended vocabulary for MIDI tokens.
Research Paper
- Shih-Lun Wu, Yoon Kim, and Cheng-Zhi Anna Huang.
"MIDI-LLM: Adapting large language models for text-to-MIDI music generation."
NeurIPS AI4Music Workshop, 2025.
Model Description
- Base Model: meta-llama/Llama-3.2-1B
- Model Size: 1.4B parameters
- Extended Vocabulary: 183,286 tokens (128,256 for text + 55,030 for MIDI music)
- Architecture: LlamaForCausalLM with extended embedding layer
- Precision: BFloat16
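To see how the extended vocabulary surfaces in practice, here is a minimal loading sketch with Hugging Face transformers. The model ID comes from the Quick Start below; the vocabulary-size check is just an illustration, not part of the official scripts.

```python
# A minimal loading sketch using the `transformers` library (assumed here;
# the companion repo's scripts are the authoritative reference).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "slseanwu/MIDI-LLM_Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# The extended embedding layer covers both text and MIDI tokens.
print(model.get_input_embeddings().num_embeddings)  # expected: 183286
```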
Quick Start
Clone our GitHub code repo (coming soon), run through the setup steps, and try:
```bash
git clone https://github.com/slSeanWU/MIDI-LLM
cd MIDI-LLM
python generate_transformers.py \
    --model slseanwu/MIDI-LLM_Llama-3.2-1B \
    --prompt "A cheerful rock song with bright electric guitars" \
    --n_outputs 4
```
The repo and inference scripts provide a more complete usage guide.
Model Details
Extended Vocabulary
The model extends Llama 3.2's vocabulary (128,256 tokens) with 55,030 MIDI tokens representing:
- Onset times (when notes occur)
- Durations (how long each note is held)
- Instrument-pitch pairs (which note to play and on which instrument)
These tokens follow the vocabulary of Anticipatory Music Transformer (AMT) (Thickstun et al., TMLR 2024).
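As a rough illustration of how interleaved (onset, duration, instrument-pitch) triplets can map back to note events, consider the toy decoder below. The 10 ms time step and the `instrument * 128 + pitch` packing are assumptions for demonstration only, not the model's actual vocabulary layout; see the AMT paper and the companion repo for the real tokenization.

```python
# Illustrative only: a toy decoder for AMT-style (onset, duration,
# instrument-pitch) token triplets. The 10 ms time step and the
# `instrument * 128 + pitch` packing are assumptions for demonstration.
def decode_triplet(onset_tok: int, dur_tok: int, note_tok: int,
                   ms_per_step: int = 10) -> dict:
    instrument, pitch = divmod(note_tok, 128)  # 128 MIDI pitches per instrument
    return {
        "onset_ms": onset_tok * ms_per_step,   # quantized note onset time
        "duration_ms": dur_tok * ms_per_step,  # quantized note length
        "instrument": instrument,              # instrument index (assumed)
        "pitch": pitch,                        # MIDI pitch (0-127)
    }

# Example: token triplet (100, 48, 25*128 + 60) -> a middle C starting at 1.0 s.
print(decode_triplet(100, 48, 25 * 128 + 60))
```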
Training Data
- Datasets:
- Training objective: Causal language modeling
- Training sequence length: 2,048 tokens
- System prompt:
You are a world-class composer. Please compose some music according to the following description: [your input text]
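A sketch of combining this system prompt with a user description follows. Plain string concatenation is an assumption here; the repo's inference scripts are authoritative on how the prompt is actually assembled.

```python
# Sketch: prepending the documented system prompt to a text description.
# Plain concatenation is an assumption; the repo may use a chat template.
SYSTEM_PROMPT = ("You are a world-class composer. Please compose some music "
                 "according to the following description: ")

def build_prompt(description: str) -> str:
    return SYSTEM_PROMPT + description

prompt = build_prompt("A cheerful rock song with bright electric guitars")
```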
Inference Hyperparameters
Recommended settings for best results:
- temperature: 1.0
- top_p: 0.98
- max_tokens: 2046
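These settings map onto Hugging Face `model.generate` as sketched below, assuming the `model`, `tokenizer`, and `prompt` from the earlier snippets; interpreting `max_tokens` as the number of newly generated tokens is an assumption.

```python
# Sampling with the recommended settings via `model.generate` (a sketch;
# assumes `model`, `tokenizer`, and `prompt` from the snippets above).
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    do_sample=True,        # enable stochastic sampling
    temperature=1.0,       # recommended temperature
    top_p=0.98,            # recommended nucleus-sampling threshold
    max_new_tokens=2046,   # assumes `max_tokens` counts newly generated tokens
)
```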
Citation
If you find our model useful, please cite our research as:
```bibtex
@inproceedings{wu2025midillm,
  title={MIDI-LLM: Adapting large language models for text-to-MIDI music generation},
  author={Wu, Shih-Lun and Kim, Yoon and Huang, Cheng-Zhi Anna},
  booktitle={Proc. NeurIPS AI4Music Workshop},
  year={2025}
}
```
License
This model is based on Llama 3.2 and is subject to the Llama 3.2 Community License.