title: PDE Leaderboard (The Well)
emoji: 📈
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.47.1
app_file: app.py
pinned: false
The Well – PDE Baselines: Reproducible Evaluation Guide
A clean, GPU-friendly recipe to evaluate PDE surrogate models from The Well on the acoustic_scattering_maze task and generate a submit.json ready for a team leaderboard or Hugging Face Space.
TL;DR
# 0) create venv
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
# 1) install PyTorch (adjust cu121 to your CUDA)
pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
# 2) get The Well (full repo with benchmark extras)
git clone https://github.com/PolymathicAI/the_well.git
cd the_well
pip install -e ".[benchmark]"
cd ..
# 3) place the evaluation script in this folder (eval_wdm_full.py)
python eval_wdm_full.py
# 4) see results
cat submit.json
What you get
- A fully reproducible eval script (eval_wdm_full.py) that:
  - streams the dataset from HF (or reads it locally if you prefer),
  - assembles the 14-channel input feature tensor expected by the FNO baseline,
  - applies Z-Score normalization (same convention as the benchmark),
  - runs inference and produces a VRMSE@1 summary in submit.json (the metric is defined just below).
- An optional variant (eval_wdm_full_psamples.py) that also writes per-sample errors to per_sample_vrmse.jsonl.
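For reference, the VRMSE@1 reported by both scripts is the per-sample RMSE normalized by the variance of the target; as a standalone function it is simply:
import numpy as np

def vrmse(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    # RMSE normalized by the variance of the target (same formula as in the eval scripts)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2) / (y_true.var() + 1e-12)))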
Requirements
- Linux (recommended) with a CUDA GPU.
- Python 3.10–3.12 recommended (3.13 can work, but some libraries lag behind).
- CUDA/cuDNN compatible with your installed PyTorch wheels (e.g., cu121).
- Git; optionally Git LFS if you plan to clone large checkpoints.
Environment Setup
# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# upgrade pip
pip install --upgrade pip
# install torch + CUDA wheels (change cu121 if needed)
pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
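Optional quick check that the installed wheels actually see your GPU (same heredoc style as the sanity check further below):
python - <<'PY'
import torch
print("torch:", torch.__version__,
      "| built for CUDA:", torch.version.cuda,
      "| GPU available:", torch.cuda.is_available())
PY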
Install The Well (from source, with benchmark extras)
The PyPI package is trimmed; use the GitHub repo to get benchmark utilities.
git clone https://github.com/PolymathicAI/the_well.git
cd the_well
pip install -e ".[benchmark]"
cd ..
Quick sanity check:
python - <<'PY'
from the_well.benchmark.train import WellDataModule
from the_well.benchmark.models import FNO
print("OK: imports work")
PY
Scripts
Add these two files to your project folder (next to the the_well repo directory):
1) eval_wdm_full.py (clean, full test split)
# eval_wdm_full.py
import json, warnings, random
import numpy as np
import torch
from torch.utils.data import DataLoader
from the_well.benchmark.models import FNO
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization
# Reproducibility
random.seed(0); np.random.seed(0); torch.manual_seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
# Quiet some noisy warnings
warnings.filterwarnings("ignore", category=UserWarning, module="tltorch")
warnings.filterwarnings("ignore", category=UserWarning, module="neuralop")
device = "cuda" if torch.cuda.is_available() else "cpu"
def to_bchw_take_t0_any(t: torch.Tensor) -> torch.Tensor:
    if t.ndim == 5:    # [B,T,H,W,C]
        t = t[:, 0, ...].permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 4:  # [B,H,W,C]
        t = t.permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 3:  # [B,H,W]
        t = t.unsqueeze(1).contiguous()
    elif t.ndim == 2:  # [B,C] -> [B,C,1,1]
        t = t[:, :, None, None].contiguous()
    else:
        raise ValueError(f"Unsupported rank: {t.ndim}, shape={tuple(t.shape)}")
    return t.float()

def expand_scalar_plane(x: torch.Tensor, H: int, W: int):
    if x is None:
        return None
    if x.ndim == 2:
        x = x[:, 0]
    elif x.ndim != 1:
        raise ValueError("Expected (B,) or (B,T) for time scalars")
    x = x[:, None, None, None]
    return x.expand(-1, 1, H, W).float()

def assemble_x_14ch(b: dict) -> torch.Tensor:
    parts = []
    x_if = to_bchw_take_t0_any(b["input_fields"])  # ~3 channels
    B, _, H, W = x_if.shape
    parts.append(x_if)
    if "space_grid" in b and torch.is_tensor(b["space_grid"]):
        parts.append(to_bchw_take_t0_any(b["space_grid"]))        # +2
    if "constant_fields" in b and torch.is_tensor(b["constant_fields"]):
        parts.append(to_bchw_take_t0_any(b["constant_fields"]))   # +2
    if "boundary_conditions" in b and torch.is_tensor(b["boundary_conditions"]):
        bc = b["boundary_conditions"].view(B, -1)[:, :4]
        bc = bc[:, :, None, None].expand(B, 4, H, W).float()      # +4
        parts.append(bc)
    it = b.get("input_time_grid", None)
    ot = b.get("output_time_grid", None)
    if torch.is_tensor(it): parts.append(expand_scalar_plane(it, H, W))  # +1
    if torch.is_tensor(ot): parts.append(expand_scalar_plane(ot, H, W))  # +1
    x = torch.cat([p for p in parts if p is not None], dim=1)
    IN_CH = 14
    c = x.shape[1]
    if c < IN_CH:
        pad = torch.zeros(B, IN_CH - c, H, W, dtype=x.dtype)
        x = torch.cat([x, pad], dim=1)
    elif c > IN_CH:
        x = x[:, :IN_CH]
    return x

def zscore_per_channel(x: torch.Tensor, eps=1e-6) -> torch.Tensor:
    mu = x.mean(dim=(2, 3), keepdim=True)
    sd = x.std(dim=(2, 3), keepdim=True).clamp_min(eps)
    return (x - mu) / sd

def try_apply_dm_normalizer(dm, x, y):
    for name in ("normalization", "normalizer", "norm"):
        norm = getattr(dm, name, None)
        if norm is None:
            continue
        if hasattr(norm, "normalize_input") and hasattr(norm, "normalize_output"):
            return norm.normalize_input(x), norm.normalize_output(y)
        try:
            return norm(x, which="input"), norm(y, which="output")
        except Exception:
            pass
        try:
            return norm.forward(x, is_input=True), norm.forward(y, is_input=False)
        except Exception:
            pass
    return None

def main(out_path="submit.json"):
    dm = WellDataModule(
        well_base_path="hf://datasets/polymathic-ai/",
        well_dataset_name="acoustic_scattering_maze",
        batch_size=1,
        data_workers=0,
        use_normalization=True,
        normalization_type=ZScoreNormalization,
        n_steps_input=1, n_steps_output=1,
        boundary_return_type="padding",
    )
    loader = dm.test_dataloader() if hasattr(dm, "test_dataloader") \
        else DataLoader(getattr(dm, "test"), batch_size=1, shuffle=False)
    model = FNO.from_pretrained("polymathic-ai/FNO-acoustic_scattering_maze").to(device).eval()
    scores = []
    with torch.no_grad():
        for b in loader:
            x = assemble_x_14ch(b).to(device)
            y = to_bchw_take_t0_any(b["output_fields"]).to(device)
            applied = try_apply_dm_normalizer(dm, x, y)
            if applied is not None:
                x, y = applied
            else:
                x = zscore_per_channel(x)
                y = zscore_per_channel(y)
            yp = model(x)
            c = min(yp.shape[1], y.shape[1])
            yp_np, y_np = yp[:, :c].detach().cpu().numpy(), y[:, :c].detach().cpu().numpy()
            vrmse = float((((yp_np - y_np) ** 2).mean() / (y_np.var() + 1e-12)) ** 0.5)
            scores.append(vrmse)
    result = {
        "task": "acoustic_scattering_maze",
        "dataset_repo": "polymathic-ai/acoustic_scattering_maze",
        "model_family": "FNO",
        "model_repo": "polymathic-ai/FNO-acoustic_scattering_maze",
        "seed": 0,
        "split": "test",
        "horizons": [1],
        "metrics": {
            "VRMSE@1": {
                "mean": float(np.mean(scores)),
                "std": float(np.std(scores)),
                "n": int(len(scores)),
            }
        },
    }
    with open(out_path, "w") as f:
        json.dump(result, f, indent=2)
    print("[ok] wrote", out_path)
    print(json.dumps(result, indent=2))

if __name__ == "__main__":
    main(out_path="submit.json")
2) eval_wdm_full_psamples.py (also logs per-sample VRMSE)
Use this one if you want per_sample_vrmse.jsonl to analyze outliers.
# eval_wdm_full_psamples.py
import json, warnings, random
import numpy as np
import torch
from torch.utils.data import DataLoader
from the_well.benchmark.models import FNO
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization
random.seed(0); np.random.seed(0); torch.manual_seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
warnings.filterwarnings("ignore", category=UserWarning, module="tltorch")
warnings.filterwarnings("ignore", category=UserWarning, module="neuralop")
device = "cuda" if torch.cuda.is_available() else "cpu"
def to_bchw_take_t0_any(t):
    if t.ndim == 5:   t = t[:, 0, ...].permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 4: t = t.permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 3: t = t.unsqueeze(1).contiguous()
    elif t.ndim == 2: t = t[:, :, None, None].contiguous()
    else: raise ValueError(f"Unsupported rank: {t.ndim}")
    return t.float()

def expand_scalar_plane(x, H, W):
    if x is None: return None
    if x.ndim == 2: x = x[:, 0]
    elif x.ndim != 1: raise ValueError("Expected (B,) or (B,T) for time scalars")
    x = x[:, None, None, None]
    return x.expand(-1, 1, H, W).float()
def assemble_x_14ch(b):
    parts = []
    x_if = to_bchw_take_t0_any(b["input_fields"])
    B, _, H, W = x_if.shape
    parts.append(x_if)
    if "space_grid" in b and torch.is_tensor(b["space_grid"]):
        parts.append(to_bchw_take_t0_any(b["space_grid"]))
    if "constant_fields" in b and torch.is_tensor(b["constant_fields"]):
        parts.append(to_bchw_take_t0_any(b["constant_fields"]))
    if "boundary_conditions" in b and torch.is_tensor(b["boundary_conditions"]):
        bc = b["boundary_conditions"].view(B, -1)[:, :4]
        bc = bc[:, :, None, None].expand(B, 4, H, W).float()
        parts.append(bc)
    it = b.get("input_time_grid", None); ot = b.get("output_time_grid", None)
    if torch.is_tensor(it): parts.append(expand_scalar_plane(it, H, W))
    if torch.is_tensor(ot): parts.append(expand_scalar_plane(ot, H, W))
    x = torch.cat([p for p in parts if p is not None], dim=1)
    if x.shape[1] < 14:
        pad = torch.zeros(B, 14 - x.shape[1], H, W, dtype=x.dtype); x = torch.cat([x, pad], dim=1)
    elif x.shape[1] > 14:
        x = x[:, :14]
    return x

def zscore_per_channel(x, eps=1e-6):
    mu = x.mean(dim=(2, 3), keepdim=True)
    sd = x.std(dim=(2, 3), keepdim=True).clamp_min(eps)
    return (x - mu) / sd

def try_apply_dm_normalizer(dm, x, y):
    for name in ("normalization", "normalizer", "norm"):
        norm = getattr(dm, name, None)
        if norm is None: continue
        if hasattr(norm, "normalize_input") and hasattr(norm, "normalize_output"):
            return norm.normalize_input(x), norm.normalize_output(y)
        try: return norm(x, which="input"), norm(y, which="output")
        except Exception: pass
        try: return norm.forward(x, is_input=True), norm.forward(y, is_input=False)
        except Exception: pass
    return None

def main(out_path="submit.json", per_sample_path="per_sample_vrmse.jsonl"):
    dm = WellDataModule(
        well_base_path="hf://datasets/polymathic-ai/",
        well_dataset_name="acoustic_scattering_maze",
        batch_size=1,
        data_workers=0,
        use_normalization=True,
        normalization_type=ZScoreNormalization,
        n_steps_input=1, n_steps_output=1,
        boundary_return_type="padding",
    )
    loader = dm.test_dataloader() if hasattr(dm, "test_dataloader") else DataLoader(getattr(dm, "test"), batch_size=1, shuffle=False)
    model = FNO.from_pretrained("polymathic-ai/FNO-acoustic_scattering_maze").to(device).eval()
    scores = []
    with torch.no_grad(), open(per_sample_path, "w") as fout:
        for b in loader:
            x = assemble_x_14ch(b).to(device)
            y = to_bchw_take_t0_any(b["output_fields"]).to(device)
            applied = try_apply_dm_normalizer(dm, x, y)
            if applied is not None: x, y = applied
            else: x = zscore_per_channel(x); y = zscore_per_channel(y)
            yp = model(x)
            c = min(yp.shape[1], y.shape[1])
            yp_np, y_np = yp[:, :c].detach().cpu().numpy(), y[:, :c].detach().cpu().numpy()
            vrmse = float((((yp_np - y_np) ** 2).mean() / (y_np.var() + 1e-12)) ** 0.5)
            scores.append(vrmse)
            fout.write(f"{vrmse}\n")
    result = {
        "task": "acoustic_scattering_maze",
        "dataset_repo": "polymathic-ai/acoustic_scattering_maze",
        "model_family": "FNO",
        "model_repo": "polymathic-ai/FNO-acoustic_scattering_maze",
        "seed": 0,
        "split": "test",
        "horizons": [1],
        "metrics": {
            "VRMSE@1": {"mean": float(np.mean(scores)), "std": float(np.std(scores)), "n": int(len(scores))}
        },
    }
    with open(out_path, "w") as f:
        json.dump(result, f, indent=2)
    print("[ok] wrote", out_path, "and", per_sample_path)

if __name__ == "__main__":
    main()
Run
python eval_wdm_full.py
# or
python eval_wdm_full_psamples.py
Artifacts:
- submit.json: summary for the leaderboard
- per_sample_vrmse.jsonl (optional): one number per line
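A minimal sketch for spotting outliers in the per-sample file (assumes the one-number-per-line format produced above; the 95th percentile is an arbitrary choice):
# inspect_vrmse.py - quick look at per-sample errors (illustrative helper, not part of the benchmark)
import numpy as np

vals = np.loadtxt("per_sample_vrmse.jsonl")  # one VRMSE value per line
print(f"n={vals.size}  mean={vals.mean():.4f}  p95={np.percentile(vals, 95):.4f}  max={vals.max():.4f}")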
Faster runs (optional)
Use local data instead of HF streaming
# Download only the needed split
the-well-download --base-path /data/the_well --dataset acoustic_scattering_maze --split test
Then in the script, set:
dm = WellDataModule(
    well_base_path="/data/the_well",   # <- local path now
    well_dataset_name="acoustic_scattering_maze",
    ...
)
Increase throughput
- Try batch_size=2..4 and data_workers=2..4 if VRAM allows (see the sketch below).
- Keep seeds and determinism if you need strict reproducibility.
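A sketch of the corresponding WellDataModule call; only batch_size and data_workers change relative to the scripts, and the values are illustrative:
dm = WellDataModule(
    well_base_path="hf://datasets/polymathic-ai/",
    well_dataset_name="acoustic_scattering_maze",
    batch_size=4,       # illustrative; raise only if VRAM allows
    data_workers=4,     # illustrative; parallel data loading
    use_normalization=True,
    normalization_type=ZScoreNormalization,
    n_steps_input=1, n_steps_output=1,
    boundary_return_type="padding",
)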
Docker (optional, for reproducibility)
Dockerfile:
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y git python3 python3-pip && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install --upgrade pip
# Install torch (adjust cu121 if needed)
RUN pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
# Install The Well (benchmark extras)
RUN git clone https://github.com/PolymathicAI/the_well.git && \
    cd the_well && pip install -e ".[benchmark]"
WORKDIR /workspace
COPY eval_wdm_full.py /workspace/eval_wdm_full.py
CMD ["python", "eval_wdm_full.py"]
Run:
docker build -t well-eval .
docker run --gpus all -v $PWD:/workspace well-eval
Hugging Face Space (optional)
- Create a Docker Space and paste the Dockerfile above.
- Add your eval script(s).
- Optionally provide a minimal UI (Gradio/Streamlit) that triggers the script and displays the JSON.
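A minimal Gradio sketch for app.py (illustrative only: it shells out to the eval script and displays the resulting submit.json; adjust names to your Space):
# app.py - minimal sketch: run the eval and show the JSON summary
import json
import subprocess

import gradio as gr

def run_eval():
    # Run the evaluation script as a subprocess, then return the summary it wrote.
    subprocess.run(["python", "eval_wdm_full.py"], check=True)
    with open("submit.json") as f:
        return json.load(f)

demo = gr.Interface(fn=run_eval, inputs=None, outputs="json",
                    title="The Well - PDE Baselines (VRMSE@1)")

if __name__ == "__main__":
    demo.launch()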
Troubleshooting
ModuleNotFoundError: the_well.benchmark.data
You installed from PyPI. Install from GitHub with pip install -e ".[benchmark]".

Huge VRMSE (e.g., 1e5+)
Input features were not assembled/normalized as in training. Use the provided 14-channel assembly and ZScoreNormalization (via the DataModule or the per-channel fallback).

No setup() or prepare_data() on WellDataModule
Different versions expose different hooks. Use dm.test_dataloader() directly (as in the script).

GPU / VRAM errors
Lower batch_size to 1, reduce workers, and make sure your Torch and CUDA versions match.

Deprecation warnings from timm, tltorch, neuralop
Cosmetic for evaluation; silenced in the script.
Contributing
- Add new tasks: duplicate eval_wdm_full.py and change well_dataset_name (see the sketch below).
- Add new models: swap FNO.from_pretrained(...) with other checkpoints.
- Keep scripts stateless and deterministic for fair comparisons.
- Open a PR with your changes, include a sample submit.json, and note the CUDA/PyTorch versions used.
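A minimal sketch of those two swaps (the checkpoint repo id below is the FNO example from this guide; repo names for other tasks/models are assumptions to check on the Hub):
# Sketch: point the existing eval at another task/checkpoint.
# Only these two names change; the rest of eval_wdm_full.py stays as-is.
DATASET = "acoustic_scattering_maze"                        # another Well dataset name goes here
MODEL_REPO = "polymathic-ai/FNO-acoustic_scattering_maze"   # another pretrained checkpoint goes here

dm = WellDataModule(
    well_base_path="hf://datasets/polymathic-ai/",
    well_dataset_name=DATASET,
    batch_size=1, data_workers=0,
    use_normalization=True, normalization_type=ZScoreNormalization,
    n_steps_input=1, n_steps_output=1,
    boundary_return_type="padding",
)
model = FNO.from_pretrained(MODEL_REPO).to(device).eval()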
License
- Respect The Well’s license and any dataset/model licenses when sharing results.
- Include your project’s license here (e.g., MIT/Apache-2.0).