LMD Deduplication Supplements
This repository provides the pre-trained CAugBERT model checkpoint used in:
"On the De-duplication of the Lakh MIDI Dataset" (ISMIR 2025)
[Paper] | [GitHub Code]
Usage
You can either integrate this checkpoint into the main repository for inference, or load it directly:
# Option 1: Run inference in the main repo
poetry run python inference.py # make sure yamls/inference.yaml paths are correct
# Option 2: Load checkpoint manually
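import torch
from contrastive_musicbert.model.BERT import BERT_Lightning
device = "cuda" if torch.cuda.is_available() else "cpu"
model = BERT_Lightning(...).to(device) # see .hydra/config.yaml for arguments
# checkpoint_path: local path to the downloaded checkpoint file
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint['state_dict'])
model.eval() # switch to evaluation mode (disables dropout) before inference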
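If `load_state_dict` complains about mismatched keys, it can help to compare the parameter names the model expects with those stored in the checkpoint before loading. The following is a minimal sketch, not part of the repository: `diff_state_dict_keys` is a hypothetical helper, and the key names in the toy example are made up for illustration.

```python
from typing import Any, Dict, Iterable, List, Mapping


def diff_state_dict_keys(
    model_keys: Iterable[str],
    checkpoint_state: Mapping[str, Any],
) -> Dict[str, List[str]]:
    """Report keys the model expects but the checkpoint lacks, and vice versa.

    Hypothetical helper for illustration; it only compares key names.
    """
    model_set = set(model_keys)
    ckpt_set = set(checkpoint_state)
    return {
        "missing": sorted(model_set - ckpt_set),      # expected by the model, absent in checkpoint
        "unexpected": sorted(ckpt_set - model_set),   # present in checkpoint, unknown to the model
    }


# Toy example with made-up parameter names (not the real CAugBERT keys).
expected = ["encoder.weight", "encoder.bias", "head.weight"]
toy_checkpoint = {
    "state_dict": {"encoder.weight": 0, "encoder.bias": 0, "proj.weight": 0}
}

report = diff_state_dict_keys(expected, toy_checkpoint["state_dict"])
print(report)
```

With the real model, the same check would presumably be `diff_state_dict_keys(model.state_dict().keys(), checkpoint['state_dict'])`; both lists empty means the checkpoint matches the instantiated architecture.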
Note
If you have any questions regarding the checkpoint, please contact: Eunjin Choi ([email protected])