LeJEPA v1 POC — Tiny-ImageNet

A 100-epoch reproduction of the LeJEPA self-supervised representation learning recipe from Balestriero & LeCun, "LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics" (arXiv:2511.08544), on the Tiny-ImageNet dataset (200 classes, 100K training images).

Status: proof of concept (POC). 100-epoch reproduction (the paper uses 800). Linear-probe val_acc is 23.77% (47× chance on 200 classes). Random-encoder baseline, computed 2026-05-16: a randomly initialized, frozen ViT-S with a linear probe trained for a matched 100 epochs reaches val_acc 8.50% (17× chance). LeJEPA training reaches 2.80× the random-encoder accuracy, adding 15.27pp absolute.

Final metrics (epoch 99)

| Metric | Value |
|---|---|
| SIGReg loss | 1.584 (down from 11.826 at epoch 0) |
| Invariance loss | 0.126 (down from 0.447) |
| Probe CE loss | 3.610 (down from 5.195) |
| Linear-probe val_acc | 0.2377 |

No representation collapse: the SIGReg loss converges and val_acc rises monotonically.

Baseline comparison

| Setup | val_acc | × chance | Δ over chance |
|---|---|---|---|
| Chance (uniform) | 0.005 | 1.0× | |
| Random ViT-S + frozen + LP | 0.0850 | 17.0× | +0.080 |
| LeJEPA-trained ViT-S + frozen + LP | 0.2377 | 47.5× | +0.233 |
| Δ (JEPA over random) | +0.1527 | 2.80× (relative to random) | |

Honest framing: JEPA training adds 15.27pp absolute over a matched-architecture random encoder. Of the 23.27pp gain over chance, about 34% (8.00pp) is already reached by a random high-dimensional projection with a linear probe, which is non-trivial on Tiny-ImageNet (a known random-features effect, Rahimi & Recht 2007); the remaining 66% (15.27pp) comes from JEPA training.
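
The baseline arithmetic can be checked with a few lines of Python. This is only a sketch over the accuracies reported in the table; the function and key names are illustrative and not part of the training code. Shares are measured relative to accuracy above chance.

```python
def baseline_breakdown(chance, random_acc, trained_acc):
    """Decompose a trained encoder's probe accuracy against two baselines."""
    gain_total = trained_acc - chance        # trained encoder over chance
    gain_random = random_acc - chance        # random encoder over chance
    gain_training = trained_acc - random_acc # what training itself adds
    return {
        "random_x_chance": random_acc / chance,
        "trained_x_chance": trained_acc / chance,
        "trained_over_random": trained_acc / random_acc,
        "delta_pp": 100.0 * gain_training,
        "share_random": gain_random / gain_total,
        "share_training": gain_training / gain_total,
    }


# Values from the baseline table: 200 classes, so uniform chance is 1/200.
result = baseline_breakdown(chance=1.0 / 200, random_acc=0.0850, trained_acc=0.2377)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Running it reproduces the 17.0×, 47.5×, and 2.80× ratios and the 15.27pp difference, and prints the two shares of the gain over chance.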

Architecture (matches paper MINIMAL.md)

  • Encoder: timm vit_small_patch8_224, img_size=128, num_classes=512
  • Projection: MLP 512 → 2048 → 2048 → 16 (proj_dim)
  • 4 views per image (V=4)
  • Loss: λ·SIGReg(proj) + (1-λ)·invariance_loss, λ=0.02
  • Linear probe trained jointly on detached (.detach()) encoder features, for monitoring only
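
The loss weighting above can be sketched in plain Python. This is a minimal illustration, not the training code: the invariance term is assumed here to be the mean squared deviation of each view's projection from the per-sample mean over views, and the SIGReg value is taken as an already-computed number. `invariance_loss` and `lejepa_objective` are hypothetical names.

```python
def invariance_loss(views):
    """Mean squared deviation of each view from the per-sample view mean.

    views[v][b][d] holds V views of B samples with D projection dims.
    ASSUMPTION: this form of the invariance term is illustrative.
    """
    num_views, batch, dim = len(views), len(views[0]), len(views[0][0])
    total = 0.0
    for b in range(batch):
        for d in range(dim):
            center = sum(views[v][b][d] for v in range(num_views)) / num_views
            total += sum((views[v][b][d] - center) ** 2 for v in range(num_views))
    return total / (num_views * batch * dim)


def lejepa_objective(sigreg_value, inv_value, lam=0.02):
    """Weighted sum lam * SIGReg + (1 - lam) * invariance, with lam = 0.02."""
    return lam * sigreg_value + (1.0 - lam) * inv_value


# Two views of one sample with a single projection dimension.
toy_views = [[[0.0]], [[2.0]]]
inv = invariance_loss(toy_views)
print(inv, lejepa_objective(sigreg_value=10.0, inv_value=inv))
```

With λ=0.02 the invariance term carries 98% of the weight, so the SIGReg term acts as a light regularizer on the projection distribution rather than the dominant signal.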

Hyperparameters

  • AdamW lr=2e-3, wd=5e-2
  • Batch 256, bf16 mixed precision
  • 100 epochs, cosine LR with linear warmup
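
A cosine schedule with linear warmup can be sketched as follows. The peak learning rate of 2e-3 comes from the list above; the warmup length, the floor `min_lr`, and the per-step granularity are assumptions, since they are not stated here, and `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr=2e-3, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay towards min_lr."""
    if step < warmup_steps:
        # Ramp linearly; the last warmup step reaches base_lr exactly.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


# Hypothetical run: 100 steps in total, of which the first 10 are warmup.
for step in (0, 9, 55, 100):
    print(step, lr_at_step(step, total_steps=100, warmup_steps=10))
```

The same function can be wrapped in a per-step scheduler callback in whichever training framework is used; only the step counts need to match the actual number of optimizer updates.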

Files

| File | Description |
|---|---|
| latest.pt | Final encoder + projection + probe state dict (309 MB) |
| training_history.json | Per-epoch metrics for all 100 epochs |
| training_curves.png | Loss + val_acc curves |
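
The per-epoch metrics can be inspected with the standard library. The sketch below assumes that training_history.json is a list with one dict per epoch and that each dict has a "val_acc" key; both are guesses about the file layout, so adjust them to the actual structure. `load_history` and `best_epoch` are hypothetical helpers.

```python
import json


def load_history(path="training_history.json"):
    """Read the per-epoch metrics file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def best_epoch(history, key="val_acc"):
    """Return (epoch_index, value) for the epoch with the highest `key`.

    ASSUMPTION: history is a list of per-epoch dicts containing `key`.
    """
    values = [record[key] for record in history]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]


# Usage, once the file has been downloaded next to this script:
# history = load_history("training_history.json")
# print(best_epoch(history))
```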

Reproducibility

Limitations

  • POC at 100 epochs (the paper uses 800), so linear-probe accuracy should be read as a short-schedule result, not as the method's ceiling.
  • Not SOTA. DINOv2 or MAE at the same scale would likely beat this.
  • v2 cross-modal extension (Flickr30K vision + text) pending.

Citation

@misc{openinterp-lejepa-v1-tinyimagenet-2026,
  author = {Vicentino, Caio},
  title  = {LeJEPA v1 POC: Tiny-ImageNet 100-epoch reproduction with random-encoder baseline},
  year   = {2026},
  url    = {https://huggingface.co/caiovicentino1/lejepa-v1-tinyimagenet}
}