---
title: PDE Leaderboard (The Well)
emoji: 📈
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.47.1
app_file: app.py
pinned: false
---

# The Well – PDE Baselines: Reproducible Evaluation Guide

A clean, GPU-friendly recipe to evaluate PDE surrogate models from [The Well](https://github.com/PolymathicAI/the_well) on the `acoustic_scattering_maze` task and generate a `submit.json` that is ready for a team leaderboard or a Hugging Face Space.


## TL;DR

```bash
# 0) create venv
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip

# 1) install PyTorch (adjust cu121 to your CUDA)
pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

# 2) get The Well (full repo with benchmark extras)
git clone https://github.com/PolymathicAI/the_well.git
cd the_well
pip install -e ".[benchmark]"
cd ..

# 3) place the evaluation script in this folder (eval_wdm_full.py)
python eval_wdm_full.py

# 4) see results
cat submit.json
```

## What you get

- A fully reproducible eval script (`eval_wdm_full.py`) that:
  - streams the dataset from Hugging Face (or reads it locally if you prefer),
  - assembles the 14-channel input feature tensor expected by the FNO baseline,
  - applies Z-score normalization (the same convention as the benchmark),
  - runs inference and produces a VRMSE@1 summary in `submit.json` (VRMSE is defined just below).
- An optional variant (`eval_wdm_full_psamples.py`) that also writes per-sample errors to `per_sample_vrmse.jsonl`.
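
For reference, VRMSE@1 is the variance-scaled RMSE at a one-step horizon: the RMSE of the prediction divided by the standard deviation of the target. A minimal sketch of the metric, mirroring the computation in the eval loops below:

```python
import numpy as np

def vrmse(y_pred: np.ndarray, y_true: np.ndarray, eps: float = 1e-12) -> float:
    """Variance-scaled RMSE: sqrt(MSE / Var(y_true)), as computed in the eval scripts."""
    mse = ((y_pred - y_true) ** 2).mean()
    return float((mse / (y_true.var() + eps)) ** 0.5)
```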


## Requirements

- Linux (recommended) with a CUDA GPU.
- Python 3.10–3.12 recommended (3.13 can work, but some libraries lag behind).
- CUDA/cuDNN compatible with your installed PyTorch wheels (e.g., `cu121`).
- Git; optionally Git LFS if you plan to clone large checkpoints.

## Environment Setup

```bash
# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# upgrade pip
pip install --upgrade pip

# install torch + CUDA wheels (change cu121 if needed)
pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
```
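
To confirm the wheels match your driver before going further, a quick check (same heredoc pattern as the sanity check below):

```bash
python - <<'PY'
import torch
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("built for CUDA:", torch.version.cuda)
PY
```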

## Install The Well (from source, with benchmark extras)

The PyPI package is trimmed; use the GitHub repo to get benchmark utilities.

```bash
git clone https://github.com/PolymathicAI/the_well.git
cd the_well
pip install -e ".[benchmark]"
cd ..
```

Quick sanity check:

```bash
python - <<'PY'
from the_well.benchmark.train import WellDataModule
from the_well.benchmark.models import FNO
print("OK: imports work")
PY
```

## Scripts

Add these two files to your project folder (next to the `the_well` repository directory):

### 1) `eval_wdm_full.py` (clean, full test split)

```python
# eval_wdm_full.py
import json, warnings, random
import numpy as np
import torch
from torch.utils.data import DataLoader
from the_well.benchmark.models import FNO
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization

# Reproducibility
random.seed(0); np.random.seed(0); torch.manual_seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Quiet some noisy warnings
warnings.filterwarnings("ignore", category=UserWarning, module="tltorch")
warnings.filterwarnings("ignore", category=UserWarning, module="neuralop")

device = "cuda" if torch.cuda.is_available() else "cpu"

def to_bchw_take_t0_any(t: torch.Tensor) -> torch.Tensor:
    if t.ndim == 5:  # [B,T,H,W,C]
        t = t[:, 0, ...].permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 4:  # [B,H,W,C]
        t = t.permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 3:  # [B,H,W]
        t = t.unsqueeze(1).contiguous()
    elif t.ndim == 2:  # [B,C] -> [B,C,1,1]
        t = t[:, :, None, None].contiguous()
    else:
        raise ValueError(f"Unsupported rank: {t.ndim}, shape={tuple(t.shape)}")
    return t.float()

def expand_scalar_plane(x: torch.Tensor, H: int, W: int):
    if x is None: return None
    if x.ndim == 2: x = x[:, 0]
    elif x.ndim != 1: raise ValueError("Expected (B,) or (B,T) for time scalars")
    x = x[:, None, None, None]
    return x.expand(-1, 1, H, W).float()

def assemble_x_14ch(b: dict) -> torch.Tensor:
    parts = []
    x_if = to_bchw_take_t0_any(b["input_fields"])   # ~3 channels
    B, _, H, W = x_if.shape
    parts.append(x_if)
    if "space_grid" in b and torch.is_tensor(b["space_grid"]):
        parts.append(to_bchw_take_t0_any(b["space_grid"]))          # +2
    if "constant_fields" in b and torch.is_tensor(b["constant_fields"]):
        parts.append(to_bchw_take_t0_any(b["constant_fields"]))     # +2
    if "boundary_conditions" in b and torch.is_tensor(b["boundary_conditions"]):
        bc = b["boundary_conditions"].view(B, -1)[:, :4]
        bc = bc[:, :, None, None].expand(B, 4, H, W).float()        # +4
        parts.append(bc)
    it = b.get("input_time_grid", None)
    ot = b.get("output_time_grid", None)
    if torch.is_tensor(it): parts.append(expand_scalar_plane(it, H, W))  # +1
    if torch.is_tensor(ot): parts.append(expand_scalar_plane(ot, H, W))  # +1
    x = torch.cat([p for p in parts if p is not None], dim=1)
    IN_CH = 14
    c = x.shape[1]
    if c < IN_CH:
        pad = torch.zeros(B, IN_CH - c, H, W, dtype=x.dtype, device=x.device)
        x = torch.cat([x, pad], dim=1)
    elif c > IN_CH:
        x = x[:, :IN_CH]
    return x

def zscore_per_channel(x: torch.Tensor, eps=1e-6) -> torch.Tensor:
    mu = x.mean(dim=(2,3), keepdim=True)
    sd = x.std(dim=(2,3), keepdim=True).clamp_min(eps)
    return (x - mu) / sd

def try_apply_dm_normalizer(dm, x, y):
    for name in ("normalization", "normalizer", "norm"):
        norm = getattr(dm, name, None)
        if norm is None: 
            continue
        if hasattr(norm, "normalize_input") and hasattr(norm, "normalize_output"):
            return norm.normalize_input(x), norm.normalize_output(y)
        try:
            return norm(x, which="input"), norm(y, which="output")
        except Exception:
            pass
        try:
            return norm.forward(x, is_input=True), norm.forward(y, is_input=False)
        except Exception:
            pass
    return None

def main(out_path="submit.json"):
    dm = WellDataModule(
        well_base_path="hf://datasets/polymathic-ai/",
        well_dataset_name="acoustic_scattering_maze",
        batch_size=1,
        data_workers=0,
        use_normalization=True,
        normalization_type=ZScoreNormalization,
        n_steps_input=1, n_steps_output=1,
        boundary_return_type="padding",
    )
    loader = dm.test_dataloader() if hasattr(dm, "test_dataloader") \
             else DataLoader(getattr(dm, "test"), batch_size=1, shuffle=False)

    model = FNO.from_pretrained("polymathic-ai/FNO-acoustic_scattering_maze").to(device).eval()

    scores = []
    with torch.no_grad():
        for b in loader:
            x = assemble_x_14ch(b).to(device)
            y = to_bchw_take_t0_any(b["output_fields"]).to(device)

            applied = try_apply_dm_normalizer(dm, x, y)
            if applied is not None:
                x, y = applied
            else:
                x = zscore_per_channel(x)
                y = zscore_per_channel(y)

            yp = model(x)
            c = min(yp.shape[1], y.shape[1])
            yp_np, y_np = yp[:, :c].detach().cpu().numpy(), y[:, :c].detach().cpu().numpy()
            vrmse = float((((yp_np - y_np)**2).mean() / (y_np.var() + 1e-12))**0.5)
            scores.append(vrmse)

    result = {
        "task": "acoustic_scattering_maze",
        "dataset_repo": "polymathic-ai/acoustic_scattering_maze",
        "model_family": "FNO",
        "model_repo": "polymathic-ai/FNO-acoustic_scattering_maze",
        "seed": 0,
        "split": "test",
        "horizons": [1],
        "metrics": {
            "VRMSE@1": {
                "mean": float(np.mean(scores)),
                "std": float(np.std(scores)),
                "n": int(len(scores)),
            }
        },
    }
    with open(out_path, "w") as f:
        json.dump(result, f, indent=2)
    print("[ok] wrote", out_path)
    print(json.dumps(result, indent=2))

if __name__ == "__main__":
    main(out_path="submit.json")
```
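
If your version of The Well returns batch keys or shapes different from the ones `assemble_x_14ch` assumes, inspect one batch first. A small hypothetical helper (same `WellDataModule` arguments as above; assumes batches are plain dicts, as the scripts do):

```python
# inspect_batch.py: print the keys and tensor shapes of one test batch.
import torch
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization

dm = WellDataModule(
    well_base_path="hf://datasets/polymathic-ai/",
    well_dataset_name="acoustic_scattering_maze",
    batch_size=1, data_workers=0,
    use_normalization=True, normalization_type=ZScoreNormalization,
    n_steps_input=1, n_steps_output=1,
    boundary_return_type="padding",
)
batch = next(iter(dm.test_dataloader()))
for key, value in batch.items():
    print(key, tuple(value.shape) if torch.is_tensor(value) else type(value).__name__)
```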

### 2) `eval_wdm_full_psamples.py` (also logs per-sample VRMSE)

Use this one if you want `per_sample_vrmse.jsonl` to analyze outliers.

```python
# eval_wdm_full_psamples.py
import json, warnings, random
import numpy as np
import torch
from torch.utils.data import DataLoader
from the_well.benchmark.models import FNO
from the_well.benchmark.train import WellDataModule
from the_well.data.normalization import ZScoreNormalization

random.seed(0); np.random.seed(0); torch.manual_seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
warnings.filterwarnings("ignore", category=UserWarning, module="tltorch")
warnings.filterwarnings("ignore", category=UserWarning, module="neuralop")
device = "cuda" if torch.cuda.is_available() else "cpu"

def to_bchw_take_t0_any(t):
    if t.ndim == 5: t = t[:, 0, ...].permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 4: t = t.permute(0, 3, 1, 2).contiguous()
    elif t.ndim == 3: t = t.unsqueeze(1).contiguous()
    elif t.ndim == 2: t = t[:, :, None, None].contiguous()
    else: raise ValueError(f"rank no soportado: {t.ndim}")
    return t.float()

def expand_scalar_plane(x, H, W):
    if x is None: return None
    if x.ndim == 2: x = x[:, 0]
    elif x.ndim != 1: raise ValueError("Expected (B,) or (B,T) for time scalars")
    x = x[:, None, None, None]
    return x.expand(-1, 1, H, W).float()

def assemble_x_14ch(b):
    parts = []
    x_if = to_bchw_take_t0_any(b["input_fields"])
    B, _, H, W = x_if.shape
    parts.append(x_if)
    if "space_grid" in b and torch.is_tensor(b["space_grid"]):
        parts.append(to_bchw_take_t0_any(b["space_grid"]))
    if "constant_fields" in b and torch.is_tensor(b["constant_fields"]):
        parts.append(to_bchw_take_t0_any(b["constant_fields"]))
    if "boundary_conditions" in b and torch.is_tensor(b["boundary_conditions"]):
        bc = b["boundary_conditions"].view(B, -1)[:, :4]
        bc = bc[:, :, None, None].expand(B, 4, H, W).float()
        parts.append(bc)
    it = b.get("input_time_grid", None); ot = b.get("output_time_grid", None)
    if torch.is_tensor(it): parts.append(expand_scalar_plane(it, H, W))
    if torch.is_tensor(ot): parts.append(expand_scalar_plane(ot, H, W))
    x = torch.cat([p for p in parts if p is not None], dim=1)
    if x.shape[1] < 14:
        pad = torch.zeros(B, 14 - x.shape[1], H, W, dtype=x.dtype, device=x.device); x = torch.cat([x, pad], dim=1)
    elif x.shape[1] > 14:
        x = x[:, :14]
    return x

def zscore_per_channel(x, eps=1e-6):
    mu = x.mean(dim=(2,3), keepdim=True)
    sd = x.std(dim=(2,3), keepdim=True).clamp_min(eps)
    return (x - mu) / sd

def try_apply_dm_normalizer(dm, x, y):
    for name in ("normalization", "normalizer", "norm"):
        norm = getattr(dm, name, None)
        if norm is None: continue
        if hasattr(norm, "normalize_input") and hasattr(norm, "normalize_output"):
            return norm.normalize_input(x), norm.normalize_output(y)
        try: return norm(x, which="input"), norm(y, which="output")
        except Exception: pass
        try: return norm.forward(x, is_input=True), norm.forward(y, is_input=False)
        except Exception: pass
    return None

def main(out_path="submit.json", per_sample_path="per_sample_vrmse.jsonl"):
    dm = WellDataModule(
        well_base_path="hf://datasets/polymathic-ai/",
        well_dataset_name="acoustic_scattering_maze",
        batch_size=1,
        data_workers=0,
        use_normalization=True,
        normalization_type=ZScoreNormalization,
        n_steps_input=1, n_steps_output=1,
        boundary_return_type="padding",
    )
    loader = dm.test_dataloader() if hasattr(dm, "test_dataloader") else DataLoader(getattr(dm, "test"), batch_size=1, shuffle=False)
    model = FNO.from_pretrained("polymathic-ai/FNO-acoustic_scattering_maze").to(device).eval()

    scores = []
    with torch.no_grad(), open(per_sample_path, "w") as fout:
        for b in loader:
            x = assemble_x_14ch(b).to(device)
            y = to_bchw_take_t0_any(b["output_fields"]).to(device)

            applied = try_apply_dm_normalizer(dm, x, y)
            if applied is not None: x, y = applied
            else: x = zscore_per_channel(x); y = zscore_per_channel(y)

            yp = model(x)
            c = min(yp.shape[1], y.shape[1])
            yp_np, y_np = yp[:, :c].detach().cpu().numpy(), y[:, :c].detach().cpu().numpy()
            vrmse = float((((yp_np - y_np)**2).mean() / (y_np.var() + 1e-12))**0.5)
            scores.append(vrmse)
            fout.write(f"{vrmse}\n")

    result = {
        "task": "acoustic_scattering_maze",
        "dataset_repo": "polymathic-ai/acoustic_scattering_maze",
        "model_family": "FNO",
        "model_repo": "polymathic-ai/FNO-acoustic_scattering_maze",
        "seed": 0,
        "split": "test",
        "horizons": [1],
        "metrics": {
            "VRMSE@1": {"mean": float(np.mean(scores)), "std": float(np.std(scores)), "n": int(len(scores))}
        },
    }
    with open(out_path, "w") as f:
        json.dump(result, f, indent=2)
    print("[ok] wrote", out_path, "and", per_sample_path)

if __name__ == "__main__":
    main()
```

## Run

```bash
python eval_wdm_full.py
# or
python eval_wdm_full_psamples.py
```

Artifacts:

- `submit.json` — summary for the leaderboard
- (optional) `per_sample_vrmse.jsonl` — one VRMSE value per line; see the snippet below for a quick outlier scan
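
For a quick outlier scan of the per-sample file, something like this works (a sketch; assumes the one-value-per-line format written above):

```python
# worst_samples.py: rank per-sample VRMSE values and print the largest outliers.
with open("per_sample_vrmse.jsonl") as f:
    scores = [float(line) for line in f if line.strip()]

worst = sorted(enumerate(scores), key=lambda kv: kv[1], reverse=True)[:5]
print(f"n={len(scores)}  mean={sum(scores) / len(scores):.4f}")
for idx, v in worst:
    print(f"sample {idx}: VRMSE={v:.4f}")
```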

## Faster runs (optional)

### Use local data instead of HF streaming

```bash
# Download only the needed split
the-well-download --base-path /data/the_well --dataset acoustic_scattering_maze --split test
```

Then in the script, set:

```python
dm = WellDataModule(
  well_base_path="/data/the_well",   # <— local path now
  well_dataset_name="acoustic_scattering_maze",
  ...
)
```

### Increase throughput

- Try `batch_size=2..4` and `data_workers=2..4` if VRAM allows (see the note just below on batched per-sample metrics).
- Keep seeds and determinism settings if you need strict reproducibility.
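
One caveat when raising `batch_size` in the per-sample variant: the loop computes a single VRMSE per *batch*, so with `batch_size > 1` the logged values become batch means. A sketch of a true per-sample computation (hypothetical replacement for the metric lines inside the loop; `yp` and `y` are `[B, C, H, W]` tensors):

```python
# Reduce over channel/spatial dims only, keeping one VRMSE value per sample.
err = ((yp - y) ** 2).mean(dim=(1, 2, 3))                       # per-sample MSE
var = y.flatten(1).var(dim=1, unbiased=False).clamp_min(1e-12)  # per-sample target variance (biased, matching numpy)
per_sample_vrmse = (err / var).sqrt().tolist()                  # one value per sample
scores.extend(per_sample_vrmse)
```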

## Docker (optional, for reproducibility)

Dockerfile:

```dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y git python3 python3-pip && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install --upgrade pip

# Install torch (adjust cu121 if needed)
RUN pip install --extra-index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

# Install The Well (benchmark extras)
RUN git clone https://github.com/PolymathicAI/the_well.git && \
    cd the_well && pip install -e ".[benchmark]"

WORKDIR /workspace
COPY eval_wdm_full.py /workspace/eval_wdm_full.py

CMD ["python", "eval_wdm_full.py"]
```

Run:

```bash
docker build -t well-eval .
docker run --gpus all -v "$PWD":/workspace well-eval
```

## Hugging Face Space (optional)

- Create a Docker Space and paste the Dockerfile above.
- Add your eval script(s).
- Optionally provide a minimal UI (Gradio/Streamlit) that triggers the script and displays the JSON; a sketch follows.
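
A minimal sketch of such a UI with Gradio (hypothetical `app.py`; assumes `eval_wdm_full.py` sits alongside it and writes `submit.json` as above):

```python
# app.py: run the eval script on demand and display the resulting JSON.
import json
import subprocess

import gradio as gr

def run_eval() -> dict:
    # Run the evaluation in a subprocess so failures surface as readable errors.
    proc = subprocess.run(["python", "eval_wdm_full.py"], capture_output=True, text=True)
    if proc.returncode != 0:
        return {"error": proc.stderr[-2000:]}
    with open("submit.json") as f:
        return json.load(f)

demo = gr.Interface(fn=run_eval, inputs=None, outputs=gr.JSON(),
                    title="PDE Leaderboard (The Well)")

if __name__ == "__main__":
    demo.launch()
```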

## Troubleshooting

- **`ModuleNotFoundError: the_well.benchmark.data`**
  You installed from PyPI. Install from GitHub with `pip install -e ".[benchmark]"`.

- **Huge VRMSE (e.g., 1e5+)**
  The input features were not assembled/normalized the way the model saw them during training. Use the provided 14-channel assembly and Z-score normalization (either via the DataModule or the per-channel fallback).

- **No `setup()` or `prepare_data()` on `WellDataModule`**
  Different versions expose different hooks. Use `dm.test_dataloader()` directly (as in the scripts).

- **GPU / VRAM errors**
  Lower `batch_size` to 1, reduce workers, and make sure your Torch/CUDA versions match.

- **Deprecation warnings from `timm`, `tltorch`, `neuralop`**
  Cosmetic for evaluation; silenced in the scripts.


## Contributing

- Add new tasks: duplicate `eval_wdm_full.py` and change `well_dataset_name`.
- Add new models: swap `FNO.from_pretrained(...)` for other checkpoints.
- Keep scripts stateless and deterministic for fair comparisons.
- Open a PR with your changes, include a sample `submit.json`, and note the CUDA/PyTorch versions used.

## License

- Respect The Well’s license and any dataset/model licenses when sharing results.
- Include your project’s license here (e.g., MIT/Apache-2.0).