---
title: PDE Leaderboard (The Well)
emoji: 📈
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.47.1
app_file: app.py
pinned: false
---
# The Well – PDE Baselines: Reproducible Evaluation Guide

A clean, GPU-friendly recipe to evaluate **PDE surrogate models** from **The Well** on the **`acoustic_scattering_maze`** task and generate a `submit.json` ready for a team leaderboard or Hugging Face Space.

---

## TL;DR

```bash
# 0) create venv
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip

# 1) install PyTorch (adjust cu121 to your CUDA)
pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

# 2) get The Well (full repo with benchmark extras)
git clone https://github.com/PolymathicAI/the_well.git
cd the_well
pip install -e ".[benchmark]"
cd ..

# 3) place the evaluation script in this folder (eval_wdm_full.py)
python eval_wdm_full.py

# 4) see results
cat submit.json
```

---

## What you get

- A **fully reproducible** eval script (`eval_wdm_full.py`) that:
  - streams the dataset from HF (or reads it locally if you prefer),
  - assembles the **14-channel** input feature tensor expected by the FNO baseline,
  - applies **Z-score normalization** (same convention as the benchmark),
  - runs inference and produces a **VRMSE@1** summary in `submit.json` (the metric is sketched right after this list).
- An optional variant (`eval_wdm_full_psamples.py`) that also writes per-sample errors to `per_sample_vrmse.jsonl`.
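For reference, **VRMSE** here is the RMSE normalized by the variance of the target field, so a score of 1.0 means the prediction is no better than predicting the target's mean. A minimal sketch of the per-sample computation, mirroring the formula used in the scripts below:

```python
import numpy as np

def vrmse(y_pred: np.ndarray, y_true: np.ndarray, eps: float = 1e-12) -> float:
    """RMSE scaled by the target's standard deviation; ~1.0 = mean predictor."""
    mse = ((y_pred - y_true) ** 2).mean()
    return float((mse / (y_true.var() + eps)) ** 0.5)
```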
---

## Requirements

- Linux (recommended) with a CUDA GPU.
- **Python 3.10–3.12** recommended (3.13 can work, but some libraries lag behind).
- **CUDA/cuDNN** compatible with your installed PyTorch wheels (e.g., `cu121`); see the quick check below.
- Git; optionally Git LFS if you plan to clone large checkpoints.
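Before running anything heavy, it is worth confirming that the installed wheels actually see your GPU (plain PyTorch, no assumptions beyond the install below):

```python
# check_env.py - verify the Torch/CUDA pairing
import torch

print("torch:", torch.__version__)              # should match the wheel you installed
print("built for CUDA:", torch.version.cuda)    # e.g. "12.1" for cu121 wheels
print("GPU available:", torch.cuda.is_available())
```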
---

## Environment Setup

```bash
# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# upgrade pip
pip install --upgrade pip

# install torch + CUDA wheels (change cu121 if needed)
pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
```

---

## Install The Well (from source, with benchmark extras)

> The PyPI package is trimmed; **use the GitHub repo** to get the benchmark utilities.

```bash
git clone https://github.com/PolymathicAI/the_well.git
cd the_well
pip install -e ".[benchmark]"
cd ..
```

Quick sanity check:

```bash
python - <<'PY'
from the_well.benchmark.train import WellDataModule
from the_well.benchmark.models import FNO
print("OK: imports work")
PY
```

---

## Scripts

Add these two files to your project folder (next to the `the_well` repo directory):

### 1) `eval_wdm_full.py` (clean, full test split)
```python
# eval_wdm_full.py
import json, warnings, random

import numpy as np
import torch
from torch.utils.data import DataLoader

from the_well.benchmark.models import FNO
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization

# Reproducibility
random.seed(0); np.random.seed(0); torch.manual_seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Quiet some noisy warnings
warnings.filterwarnings("ignore", category=UserWarning, module="tltorch")
warnings.filterwarnings("ignore", category=UserWarning, module="neuralop")

device = "cuda" if torch.cuda.is_available() else "cpu"

def to_bchw_take_t0_any(t: torch.Tensor) -> torch.Tensor:
    if t.ndim == 5:    # [B,T,H,W,C]
        t = t[:, 0, ...].permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 4:  # [B,H,W,C]
        t = t.permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 3:  # [B,H,W]
        t = t.unsqueeze(1).contiguous()
    elif t.ndim == 2:  # [B,C] -> [B,C,1,1]
        t = t[:, :, None, None].contiguous()
    else:
        raise ValueError(f"Unsupported rank: {t.ndim}, shape={tuple(t.shape)}")
    return t.float()

def expand_scalar_plane(x: torch.Tensor, H: int, W: int):
    if x is None: return None
    if x.ndim == 2: x = x[:, 0]
    elif x.ndim != 1: raise ValueError("Expected (B,) or (B,T) for time scalars")
    x = x[:, None, None, None]
    return x.expand(-1, 1, H, W).float()

def assemble_x_14ch(b: dict) -> torch.Tensor:
    parts = []
    x_if = to_bchw_take_t0_any(b["input_fields"])  # ~3 channels
    B, _, H, W = x_if.shape
    parts.append(x_if)
    if "space_grid" in b and torch.is_tensor(b["space_grid"]):
        parts.append(to_bchw_take_t0_any(b["space_grid"]))  # +2
    if "constant_fields" in b and torch.is_tensor(b["constant_fields"]):
        parts.append(to_bchw_take_t0_any(b["constant_fields"]))  # +2
    if "boundary_conditions" in b and torch.is_tensor(b["boundary_conditions"]):
        bc = b["boundary_conditions"].view(B, -1)[:, :4]
        bc = bc[:, :, None, None].expand(B, 4, H, W).float()  # +4
        parts.append(bc)
    it = b.get("input_time_grid", None)
    ot = b.get("output_time_grid", None)
    if torch.is_tensor(it): parts.append(expand_scalar_plane(it, H, W))  # +1
    if torch.is_tensor(ot): parts.append(expand_scalar_plane(ot, H, W))  # +1
    x = torch.cat([p for p in parts if p is not None], dim=1)
    IN_CH = 14
    c = x.shape[1]
    if c < IN_CH:
        pad = torch.zeros(B, IN_CH - c, H, W, dtype=x.dtype)
        x = torch.cat([x, pad], dim=1)
    elif c > IN_CH:
        x = x[:, :IN_CH]
    return x

def zscore_per_channel(x: torch.Tensor, eps=1e-6) -> torch.Tensor:
    mu = x.mean(dim=(2,3), keepdim=True)
    sd = x.std(dim=(2,3), keepdim=True).clamp_min(eps)
    return (x - mu) / sd

def try_apply_dm_normalizer(dm, x, y):
    for name in ("normalization", "normalizer", "norm"):
        norm = getattr(dm, name, None)
        if norm is None:
            continue
        if hasattr(norm, "normalize_input") and hasattr(norm, "normalize_output"):
            return norm.normalize_input(x), norm.normalize_output(y)
        try:
            return norm(x, which="input"), norm(y, which="output")
        except Exception:
            pass
        try:
            return norm.forward(x, is_input=True), norm.forward(y, is_input=False)
        except Exception:
            pass
    return None

def main(out_path="submit.json"):
    dm = WellDataModule(
        well_base_path="hf://datasets/polymathic-ai/",
        well_dataset_name="acoustic_scattering_maze",
        batch_size=1,
        data_workers=0,
        use_normalization=True,
        normalization_type=ZScoreNormalization,
        n_steps_input=1, n_steps_output=1,
        boundary_return_type="padding",
    )
    loader = dm.test_dataloader() if hasattr(dm, "test_dataloader") \
        else DataLoader(getattr(dm, "test"), batch_size=1, shuffle=False)
    model = FNO.from_pretrained("polymathic-ai/FNO-acoustic_scattering_maze").to(device).eval()
    scores = []
    with torch.no_grad():
        for b in loader:
            x = assemble_x_14ch(b).to(device)
            y = to_bchw_take_t0_any(b["output_fields"]).to(device)
            applied = try_apply_dm_normalizer(dm, x, y)
            if applied is not None:
                x, y = applied
            else:
                x = zscore_per_channel(x)
                y = zscore_per_channel(y)
            yp = model(x)
            c = min(yp.shape[1], y.shape[1])
            yp_np, y_np = yp[:, :c].detach().cpu().numpy(), y[:, :c].detach().cpu().numpy()
            vrmse = float((((yp_np - y_np)**2).mean() / (y_np.var() + 1e-12))**0.5)
            scores.append(vrmse)
    result = {
        "task": "acoustic_scattering_maze",
        "dataset_repo": "polymathic-ai/acoustic_scattering_maze",
        "model_family": "FNO",
        "model_repo": "polymathic-ai/FNO-acoustic_scattering_maze",
        "seed": 0,
        "split": "test",
        "horizons": [1],
        "metrics": {
            "VRMSE@1": {
                "mean": float(np.mean(scores)),
                "std": float(np.std(scores)),
                "n": int(len(scores)),
            }
        },
    }
    with open(out_path, "w") as f:
        json.dump(result, f, indent=2)
    print("[ok] wrote", out_path)
    print(json.dumps(result, indent=2))

if __name__ == "__main__":
    main(out_path="submit.json")
```
### 2) `eval_wdm_full_psamples.py` (also logs per-sample VRMSE)

Use this variant if you also want `per_sample_vrmse.jsonl` for analyzing outliers.
```python
# eval_wdm_full_psamples.py
import json, warnings, random

import numpy as np
import torch
from torch.utils.data import DataLoader

from the_well.benchmark.models import FNO
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization

random.seed(0); np.random.seed(0); torch.manual_seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
warnings.filterwarnings("ignore", category=UserWarning, module="tltorch")
warnings.filterwarnings("ignore", category=UserWarning, module="neuralop")

device = "cuda" if torch.cuda.is_available() else "cpu"

def to_bchw_take_t0_any(t):
    if t.ndim == 5: t = t[:, 0, ...].permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 4: t = t.permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 3: t = t.unsqueeze(1).contiguous()
    elif t.ndim == 2: t = t[:, :, None, None].contiguous()
    else: raise ValueError(f"Unsupported rank: {t.ndim}")
    return t.float()

def expand_scalar_plane(x, H, W):
    if x is None: return None
    if x.ndim == 2: x = x[:, 0]
    elif x.ndim != 1: raise ValueError("Expected (B,) or (B,T) for time scalars")
    x = x[:, None, None, None]
    return x.expand(-1, 1, H, W).float()

def assemble_x_14ch(b):
    parts = []
    x_if = to_bchw_take_t0_any(b["input_fields"])
    B, _, H, W = x_if.shape
    parts.append(x_if)
    if "space_grid" in b and torch.is_tensor(b["space_grid"]):
        parts.append(to_bchw_take_t0_any(b["space_grid"]))
    if "constant_fields" in b and torch.is_tensor(b["constant_fields"]):
        parts.append(to_bchw_take_t0_any(b["constant_fields"]))
    if "boundary_conditions" in b and torch.is_tensor(b["boundary_conditions"]):
        bc = b["boundary_conditions"].view(B, -1)[:, :4]
        bc = bc[:, :, None, None].expand(B, 4, H, W).float()
        parts.append(bc)
    it = b.get("input_time_grid", None); ot = b.get("output_time_grid", None)
    if torch.is_tensor(it): parts.append(expand_scalar_plane(it, H, W))
    if torch.is_tensor(ot): parts.append(expand_scalar_plane(ot, H, W))
    x = torch.cat([p for p in parts if p is not None], dim=1)
    if x.shape[1] < 14:
        pad = torch.zeros(B, 14 - x.shape[1], H, W, dtype=x.dtype); x = torch.cat([x, pad], dim=1)
    elif x.shape[1] > 14:
        x = x[:, :14]
    return x

def zscore_per_channel(x, eps=1e-6):
    mu = x.mean(dim=(2,3), keepdim=True)
    sd = x.std(dim=(2,3), keepdim=True).clamp_min(eps)
    return (x - mu) / sd

def try_apply_dm_normalizer(dm, x, y):
    for name in ("normalization", "normalizer", "norm"):
        norm = getattr(dm, name, None)
        if norm is None: continue
        if hasattr(norm, "normalize_input") and hasattr(norm, "normalize_output"):
            return norm.normalize_input(x), norm.normalize_output(y)
        try: return norm(x, which="input"), norm(y, which="output")
        except Exception: pass
        try: return norm.forward(x, is_input=True), norm.forward(y, is_input=False)
        except Exception: pass
    return None

def main(out_path="submit.json", per_sample_path="per_sample_vrmse.jsonl"):
    dm = WellDataModule(
        well_base_path="hf://datasets/polymathic-ai/",
        well_dataset_name="acoustic_scattering_maze",
        batch_size=1,
        data_workers=0,
        use_normalization=True,
        normalization_type=ZScoreNormalization,
        n_steps_input=1, n_steps_output=1,
        boundary_return_type="padding",
    )
    loader = dm.test_dataloader() if hasattr(dm, "test_dataloader") else DataLoader(getattr(dm, "test"), batch_size=1, shuffle=False)
    model = FNO.from_pretrained("polymathic-ai/FNO-acoustic_scattering_maze").to(device).eval()
    scores = []
    with torch.no_grad(), open(per_sample_path, "w") as fout:
        for i, b in enumerate(loader):
            x = assemble_x_14ch(b).to(device)
            y = to_bchw_take_t0_any(b["output_fields"]).to(device)
            applied = try_apply_dm_normalizer(dm, x, y)
            if applied is not None: x, y = applied
            else: x = zscore_per_channel(x); y = zscore_per_channel(y)
            yp = model(x)
            c = min(yp.shape[1], y.shape[1])
            yp_np, y_np = yp[:, :c].detach().cpu().numpy(), y[:, :c].detach().cpu().numpy()
            vrmse = float((((yp_np - y_np)**2).mean() / (y_np.var() + 1e-12))**0.5)
            scores.append(vrmse)
            fout.write(json.dumps({"index": i, "vrmse": vrmse}) + "\n")  # one JSON record per sample
    result = {
        "task": "acoustic_scattering_maze",
        "dataset_repo": "polymathic-ai/acoustic_scattering_maze",
        "model_family": "FNO",
        "model_repo": "polymathic-ai/FNO-acoustic_scattering_maze",
        "seed": 0,
        "split": "test",
        "horizons": [1],
        "metrics": {
            "VRMSE@1": {"mean": float(np.mean(scores)), "std": float(np.std(scores)), "n": int(len(scores))}
        },
    }
    with open(out_path, "w") as f:
        json.dump(result, f, indent=2)
    print("[ok] wrote", out_path, "and", per_sample_path)

if __name__ == "__main__":
    main()
```

---

## Run

```bash
python eval_wdm_full.py
# or
python eval_wdm_full_psamples.py
```

Artifacts:

- `submit.json` — summary for the leaderboard
- (optional) `per_sample_vrmse.jsonl` — one JSON record per line (sample index and VRMSE)
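To hunt for outliers, the per-sample file can be inspected with a few lines (this assumes the JSON-record format written by `eval_wdm_full_psamples.py` above):

```python
# worst_samples.py - list the ten highest per-sample VRMSE scores
import json

with open("per_sample_vrmse.jsonl") as f:
    records = [json.loads(line) for line in f]

for r in sorted(records, key=lambda r: r["vrmse"], reverse=True)[:10]:
    print(f"sample {r['index']}: VRMSE = {r['vrmse']:.4f}")
```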
---

## Faster runs (optional)

### Use local data instead of HF streaming

```bash
# Download only the needed split
the-well-download --base-path /data/the_well --dataset acoustic_scattering_maze --split test
```

Then, in the script, set:

```python
dm = WellDataModule(
    well_base_path="/data/the_well",  # <- local path now
    well_dataset_name="acoustic_scattering_maze",
    ...
)
```
### Increase throughput

- Try `batch_size=2..4` and `data_workers=2..4` if VRAM allows (see the sketch below).
- Keep the seeds and determinism flags if you need strict reproducibility.
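As a sketch, the same constructor as above with larger values; the numbers are illustrative, and the safe ceiling depends entirely on your GPU:

```python
# Illustrative throughput settings - raise batch_size/data_workers only if VRAM allows.
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization

dm = WellDataModule(
    well_base_path="hf://datasets/polymathic-ai/",
    well_dataset_name="acoustic_scattering_maze",
    batch_size=4,       # was 1; larger batches amortize per-step overhead
    data_workers=4,     # parallel data-loading workers
    use_normalization=True,
    normalization_type=ZScoreNormalization,
    n_steps_input=1, n_steps_output=1,
    boundary_return_type="padding",
)
```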
---

## Docker (optional, for reproducibility)

**Dockerfile**:

```dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y git python3 python3-pip && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install --upgrade pip

# Install torch (adjust cu121 if needed)
RUN pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

# Install The Well (benchmark extras)
RUN git clone https://github.com/PolymathicAI/the_well.git && \
    cd the_well && pip install -e ".[benchmark]"

WORKDIR /workspace
COPY eval_wdm_full.py /workspace/eval_wdm_full.py
CMD ["python3", "eval_wdm_full.py"]
```

Run:

```bash
docker build -t well-eval .
docker run --gpus all -v $PWD:/workspace well-eval
```
---

## Hugging Face Space (optional)

- Create a **Docker** Space and paste the Dockerfile above.
- Add your eval script(s).
- Optionally provide a minimal UI (Gradio/Streamlit) that triggers the script and displays the JSON; a sketch follows.
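For the Gradio route, a minimal `app.py` sketch (it assumes `eval_wdm_full.py` is in the image as built above; the component names reflect current Gradio, so check them against your pinned `sdk_version`):

```python
# app.py - one-button UI: run the eval script and show submit.json
import json
import subprocess

import gradio as gr

def run_eval() -> str:
    # Run the eval script as a subprocess, then return the resulting JSON as text.
    subprocess.run(["python3", "eval_wdm_full.py"], check=True)
    with open("submit.json") as f:
        return json.dumps(json.load(f), indent=2)

demo = gr.Interface(
    fn=run_eval,
    inputs=None,                      # no inputs: just a run button
    outputs=gr.Code(language="json"),
    title="PDE Leaderboard (The Well)",
)

if __name__ == "__main__":
    demo.launch()
```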
---

## Troubleshooting

- **`ModuleNotFoundError: the_well.benchmark.data`**
  You installed from PyPI. Install from the GitHub repo with `pip install -e ".[benchmark]"`.
- **Huge VRMSE (e.g., 1e5+)**
  The input features were not assembled/normalized the same way as in training. Use the provided 14-channel assembly and **Z-score normalization** (via the data module, or the per-channel fallback).
- **No `setup()` or `prepare_data()` on `WellDataModule`**
  Different versions expose different hooks. Use `dm.test_dataloader()` directly (as in the scripts).
- **GPU / VRAM errors**
  Lower `batch_size` to 1, reduce workers, and make sure your Torch and CUDA versions match.
- **Deprecation warnings from `timm`, `tltorch`, `neuralop`**
  Cosmetic for evaluation; silenced in the scripts.
---

## Contributing

- Add new tasks: duplicate `eval_wdm_full.py` and change `well_dataset_name` (see the sketch after this list).
- Add new models: swap `FNO.from_pretrained(...)` for another checkpoint.
- Keep scripts **stateless** and **deterministic** for fair comparisons.
- Open a PR with your changes, include a sample `submit.json`, and note the CUDA/PyTorch versions used.
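As an example of the first two bullets, a hypothetical adaptation; both IDs below are placeholders, not real Hub entries, so look up actual dataset and checkpoint names on the Hugging Face Hub first:

```python
# Hypothetical task/model swap - replace both placeholder IDs with real Hub entries.
import torch
from the_well.benchmark.models import FNO
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization

device = "cuda" if torch.cuda.is_available() else "cpu"

DATASET = "another_well_dataset"                  # placeholder dataset name
CHECKPOINT = "polymathic-ai/FNO-another_dataset"  # placeholder checkpoint ID

dm = WellDataModule(
    well_base_path="hf://datasets/polymathic-ai/",
    well_dataset_name=DATASET,
    batch_size=1,
    data_workers=0,
    use_normalization=True,
    normalization_type=ZScoreNormalization,
    n_steps_input=1, n_steps_output=1,
    boundary_return_type="padding",
)
model = FNO.from_pretrained(CHECKPOINT).to(device).eval()
```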
---

## License

- Respect The Well’s license and any dataset/model licenses when sharing results.
- Include your project’s license here (e.g., MIT/Apache-2.0).