LeJEPA v1 POC — Tiny-ImageNet
100-epoch reproduction of the LeJEPA self-supervised representation learning recipe from Balestriero & LeCun, "LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics" (arXiv:2511.08544), on the Tiny-ImageNet dataset (200 classes, 100K images).
Status: POC. 100-epoch reproduction (paper uses 800). Linear-probe val_acc 23.77% (47.5× chance on 200 classes). Random-encoder baseline, computed 2026-05-16: a randomly initialized ViT-S, frozen, with a linear probe trained for a matched 100 epochs, reaches val_acc 8.50% (17× chance). The JEPA-trained encoder reaches 2.80× the random-encoder accuracy, a gain of 15.27pp absolute.
Final metrics (epoch 99)
| Metric | Value |
|---|---|
| SIGReg loss | 1.584 (down from 11.826 at epoch 0) |
| Invariance loss | 0.126 (down from 0.447) |
| Probe CE loss | 3.610 (down from 5.195) |
| Linear-probe val_acc | 0.2377 |
No representation collapse — SIGReg loss converges, val_acc rises monotonically.
Baseline comparison
| Setup | val_acc | × chance | Δ over chance |
|---|---|---|---|
| Chance (uniform) | 0.005 | 1.0× | — |
| Random ViT-S + frozen + LP | 0.0850 | 17.0× | +0.080 |
| LeJEPA-trained ViT-S + frozen + LP | 0.2377 | 47.5× | +0.233 |
| Δ (JEPA over random) | +0.1527 | 2.80× | — |
Honest framing: JEPA training adds 15.27pp absolute over a matched-architecture random encoder. About 34% of the val_acc gain over chance (0.080 of 0.233) is already reached by a random high-dimensional projection plus a linear probe, which is non-trivial on Tiny-ImageNet (a known random-features effect, Rahimi & Recht 2007); the remaining 66% (0.153 of 0.233) comes from JEPA training.
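The figures in the baseline comparison can be re-derived from the three accuracies alone. The sketch below is only a worked check of that arithmetic; the function and variable names are illustrative and not taken from the notebooks. It reports the random encoder's share both as a fraction of the gain over chance and as a fraction of the raw accuracy, since the two differ slightly.

```python
# Values copied from the baseline comparison table.
NUM_CLASSES = 200
CHANCE = 1 / NUM_CLASSES   # uniform guessing on 200 classes
RANDOM_ACC = 0.0850        # random ViT-S + frozen + linear probe
LEJEPA_ACC = 0.2377        # LeJEPA-trained ViT-S + frozen + linear probe


def times_chance(acc: float) -> float:
    """Accuracy expressed as a multiple of chance."""
    return acc / CHANCE


def gain_over_chance(acc: float) -> float:
    """Absolute accuracy gained over chance."""
    return acc - CHANCE


# Improvement of the trained encoder over the random one.
delta_pp = (LEJEPA_ACC - RANDOM_ACC) * 100   # percentage points
ratio = LEJEPA_ACC / RANDOM_ACC

# Two ways to attribute the result to the random encoder:
# (a) its share of the gain over chance, (b) its share of the raw accuracy.
random_share_of_gain = gain_over_chance(RANDOM_ACC) / gain_over_chance(LEJEPA_ACC)
random_share_of_raw = RANDOM_ACC / LEJEPA_ACC

print(f"random encoder: {times_chance(RANDOM_ACC):.1f}x chance")
print(f"LeJEPA encoder: {times_chance(LEJEPA_ACC):.1f}x chance")
print(f"delta: {delta_pp:.2f}pp, ratio: {ratio:.2f}x")
print(f"random share of gain over chance: {random_share_of_gain:.1%}")
print(f"random share of raw accuracy:     {random_share_of_raw:.1%}")
```

With the table's values, the random encoder accounts for roughly 34% of the gain over chance and roughly 36% of the raw accuracy.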
Architecture (matches paper MINIMAL.md)
- Encoder: timm `vit_small_patch8_224`, `img_size=128`, `num_classes=512`
- Projection: MLP 512 → 2048 → 2048 → 16 (proj_dim)
- 4 views per image (V=4)
- Loss: `λ·SIGReg(proj) + (1-λ)·invariance_loss`, λ=0.02
- Linear probe trained jointly on `.detach()`-ed features, for monitoring only
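The loss weighting above can be illustrated in plain Python. This is a minimal sketch, not the notebook's code: it assumes the invariance term is the mean squared deviation of each view's projection from the per-image mean across views, and it treats the SIGReg term as an already-computed number. The names `invariance_loss` and `lejepa_loss` are hypothetical.

```python
from typing import List

LAMBDA = 0.02  # weight on the SIGReg term, as listed above


def invariance_loss(proj: List[List[List[float]]]) -> float:
    """Mean squared deviation of each view from the per-image view mean.

    proj[v][n][d]: view v, image n, projection dimension d.
    ASSUMPTION: this form of the invariance term is illustrative.
    """
    num_views = len(proj)
    num_images = len(proj[0])
    dim = len(proj[0][0])
    total = 0.0
    for n in range(num_images):
        for d in range(dim):
            mean = sum(proj[v][n][d] for v in range(num_views)) / num_views
            total += sum((proj[v][n][d] - mean) ** 2 for v in range(num_views))
    return total / (num_views * num_images * dim)


def lejepa_loss(sigreg: float, invariance: float, lam: float = LAMBDA) -> float:
    """Weighted sum: lam * SIGReg + (1 - lam) * invariance."""
    return lam * sigreg + (1.0 - lam) * invariance


# Identical views give zero invariance loss.
same_views = [[[1.0, 2.0]], [[1.0, 2.0]]]
print(invariance_loss(same_views))

# Combined loss from the epoch-99 values in the metrics table,
# ASSUMING those logged values are the unweighted terms.
print(lejepa_loss(sigreg=1.584, invariance=0.126))
```

Under that assumption, with λ=0.02 the invariance term carries 98% of the weight, so the combined epoch-99 value is about 0.155.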
Hyperparameters
- AdamW lr=2e-3, wd=5e-2
- Batch 256, bf16 mixed precision
- 100 epochs, cosine LR with linear warmup
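A cosine schedule with linear warmup, as listed above, can be sketched as follows. The warmup length, the minimum learning rate, per-step (rather than per-epoch) scheduling, and keeping the last partial batch are all assumptions made for illustration; the card does not state them. `lr_at` and `WARMUP_STEPS` are hypothetical names.

```python
import math

BASE_LR = 2e-3
EPOCHS = 100
BATCH_SIZE = 256
NUM_TRAIN_IMAGES = 100_000

# ASSUMPTION: the last partial batch is kept, so steps are rounded up.
STEPS_PER_EPOCH = math.ceil(NUM_TRAIN_IMAGES / BATCH_SIZE)
TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH

# ASSUMPTION: warmup length is hypothetical (here, one epoch).
WARMUP_STEPS = STEPS_PER_EPOCH


def lr_at(step: int,
          total_steps: int = TOTAL_STEPS,
          warmup_steps: int = WARMUP_STEPS,
          base_lr: float = BASE_LR,
          min_lr: float = 0.0) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS - 1, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.6f}")
```

In a PyTorch training loop the same shape is usually obtained by wrapping the optimizer in a scheduler rather than computing the rate by hand.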
Files
| File | Description |
|---|---|
| `latest.pt` | Final encoder + projection + probe state dict (309 MB) |
| `training_history.json` | Per-epoch metrics for all 100 epochs |
| `training_curves.png` | Loss + val_acc curves |
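The per-epoch metrics make it possible to verify the claim that val_acc rises monotonically. The sketch below assumes `training_history.json` is a JSON list with one object per epoch and a `val_acc` key; that schema is a guess, not something documented here, so adjust the key and structure to match the actual file.

```python
import json
from typing import List


def load_val_acc(path: str, key: str = "val_acc") -> List[float]:
    """Read one value per epoch from the history file.

    ASSUMPTION: the file is a list of per-epoch dicts containing `key`.
    """
    with open(path) as f:
        history = json.load(f)
    return [float(epoch[key]) for epoch in history]


def is_non_decreasing(values: List[float]) -> bool:
    """True if each value is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


def first_drop(values: List[float]) -> int:
    """Index of the first epoch whose value is below the previous one, or -1."""
    for i in range(1, len(values)):
        if values[i] < values[i - 1]:
            return i
    return -1
```

A typical use would be `is_non_decreasing(load_val_acc("training_history.json"))`, with `first_drop` pointing at the offending epoch if the check fails.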
Reproducibility
- Training notebook: `nb_lejepa_v1_tinyimagenet.ipynb`
- Random-encoder baseline: `nb_lejepa_v1_random_encoder_baseline.ipynb`
Limitations
- POC at 100 epochs (paper uses 800). Linear-probe accuracy should be read in light of this shorter schedule.
- Not SOTA. DINOv2/MAE at the same scale would likely beat this.
- v2 cross-modal extension (Flickr30K vision + text) pending.
Citation
@misc{openinterp-lejepa-v1-tinyimagenet-2026,
author = {Vicentino, Caio},
title = {LeJEPA v1 POC: Tiny-ImageNet 100-epoch reproduction with random-encoder baseline},
year = {2026},
url = {https://huggingface.co/caiovicentino1/lejepa-v1-tinyimagenet}
}