metadata
pipeline_tag: other
language: en
library_name: pytorch
license: apache-2.0
tags:
  - music
  - midi
  - mir
  - deduplication
  - caugbert
model-index:
  - name: LMD Deduplication - CAugBERT
    results:
      - task:
          type: representation-learning
          name: symbolic music representation learning
        dataset:
          type: midi
          name: Lakh MIDI Dataset
        metrics:
          - type: F1
            value: 0.493

LMD Deduplication Supplements

This repository provides the pre-trained CAugBERT model checkpoint used in "On the De-duplication of the Lakh MIDI Dataset" (ISMIR 2025).
[Paper] | [GitHub Code]


Usage

You can either integrate this checkpoint into the main repository for inference, or load it directly:

# Option 1: Run inference in the main repo
poetry run python inference.py  # make sure yamls/inference.yaml paths are correct
# Option 2: Load checkpoint manually
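import torch
from contrastive_musicbert.model.BERT import BERT_Lightning

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint_path = "path/to/checkpoint"  # placeholder: point this at your local copy of the checkpoint file

model = BERT_Lightning(...).to(device)  # see .hydra/config.yaml for arguments
checkpoint = torch.load(checkpoint_path, map_location="cpu")  # load weights on CPU first
model.load_state_dict(checkpoint["state_dict"])  # copies the weights into the model on `device`
model.eval()  # disable dropout for inference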
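
If `load_state_dict` raises a key-mismatch error, one likely cause is that the constructor arguments differ from those in `.hydra/config.yaml`. The following is a minimal sketch for comparing parameter names before loading. `diff_state_dict_keys` is a hypothetical helper, not part of the repository, and the parameter names in the example are toy placeholders rather than the real CAugBERT names.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names, each sorted.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names the checkpoint holds but the model does not define
    """
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return sorted(model_set - ckpt_set), sorted(ckpt_set - model_set)


# Toy names standing in for model.state_dict().keys() and
# checkpoint["state_dict"].keys(); assumed for illustration only.
model_keys = ["encoder.layer0.weight", "encoder.layer0.bias", "head.weight"]
checkpoint_keys = ["encoder.layer0.weight", "encoder.layer0.bias", "proj.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real model, pass `model.state_dict().keys()` and `checkpoint["state_dict"].keys()` instead of the toy lists. Two empty lists mean the checkpoint and the model agree on parameter names.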

Note

If you have any questions regarding the checkpoint, please contact: Eunjin Choi ([email protected])