---
dataset_name: SAM-TP Traversability Dataset
pretty_name: SAM-TP Traversability Dataset (Flattened)
task_categories:
- image-segmentation
- semantic-segmentation
tags:
- robotics
- navigation
- traversability
- outdoor
- sam2
- bev
license: cc-by-4.0
annotations_creators:
- machine-assisted
- humans
language:
- en
size_categories:
- n<50K
---

# SAM-TP Traversability Dataset

This repository contains pixel-wise **traversability masks** paired with egocentric RGB images, prepared in a **flat, filename-aligned** layout that is convenient for training SAM-2 / SAM-TP-style segmentation models.

> **Folder layout**

```
.
├─ images/         # RGB frames (.jpg/.png). Filenames are globally unique.
├─ annotations/    # Binary masks (.png/.jpg). Filenames match images 1-to-1.
└─ manifest.csv    # Provenance rows and any missing-pair notes.
```

Each file in `annotations/` is the mask for the file in `images/` with the same stem (same base filename, different folder, possibly a different extension).

---

## File naming

Filenames are made globally unique by concatenating the original subfolder path and the local stem with `__` separators, e.g.

```
ride_68496_8ef98b_20240716023032_517__1.jpg   # image
ride_68496_8ef98b_20240716023032_517__1.png   # corresponding mask
```

---

## Mask format

- Single-channel binary masks; foreground = **traversable**, background = **non-traversable**.
- Stored as `.png` or `.jpg` depending on source. If your pipeline requires PNG, convert on the fly in your dataloader.
- Values are typically `{0, 255}`. You can binarize via `mask = (mask > 127).astype(np.uint8)`.

---

## How to use

### A) Load with `datasets` (ImageFolder-style)

```python
from pathlib import Path

from datasets import load_dataset
from PIL import Image

REPO = "jamiewjm/sam-tp"  # Hugging Face dataset repo id

# With "imagefolder", the data_files keys name the splits, so the key
# must match the split you request below.
ds_imgs = load_dataset(
    "imagefolder",
    data_files={"train": f"hf://datasets/{REPO}/images/**"},
    split="train",
)
ds_msks = load_dataset(
    "imagefolder",
    data_files={"train": f"hf://datasets/{REPO}/annotations/**"},
    split="train",
)

# Build a mask index keyed by filename stem: images and masks share a
# stem but may differ in extension (.jpg image vs .png mask).
mask_index = {Path(r["image"]["path"]).stem: r["image"]["path"] for r in ds_msks}

row = ds_imgs[0]
img_path = Path(row["image"]["path"])
msk_path = Path(mask_index[img_path.stem])

img = Image.open(img_path).convert("RGB")
msk = Image.open(msk_path).convert("L")
```

### B) Minimal PyTorch dataset

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class TraversabilityDataset(Dataset):
    def __init__(self, root):
        root = Path(root)
        self.img_dir = root / "images"
        self.msk_dir = root / "annotations"
        # Index masks by stem so a .jpg image can pair with a .png mask.
        self.masks = {p.stem: p for p in self.msk_dir.iterdir() if p.is_file()}
        self.items = sorted(
            p for p in self.img_dir.iterdir()
            if p.is_file() and p.stem in self.masks
        )

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        ip = self.items[idx]
        mp = self.masks[ip.stem]
        return Image.open(ip).convert("RGB"), Image.open(mp).convert("L")
```

### C) Pre-processing notes for SAM-2/SAM-TP training

- Resize/pad to your training resolution (commonly **1024×1024**), applying the same geometric transforms to image and mask so they stay aligned.
- Normalize images per your backbone's recipe.
- If your trainer expects COCO-RLE masks, convert PNG → RLE in the dataloader stage.

---

## Provenance & splits

- The dataset was flattened from mirrored directory trees (images and annotations) with 1-to-1 filename alignment.
- If you create explicit `train`/`val`/`test` splits, please add a `split` column to a copy of `manifest.csv` and contribute it back.

---

## License

Data: **CC-BY-4.0** (Attribution). See `LICENSE` for details.

---

## Citation

If you use this dataset in academic or industrial research, please cite the accompanying paper/report describing the data collection and labeling protocol.
```
@misc{sam_tp_dataset,
  title        = {SAM-TP Traversability Dataset},
  howpublished = {Hugging Face Datasets},
  year         = {2025},
  note         = {URL: https://huggingface.co/datasets/jamiewjm/sam-tp}
}
```
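
---

## Appendix: verifying image/mask pairing

The 1-to-1 filename alignment described above can be sanity-checked after downloading the repository. A minimal sketch (the `check_pairs` helper is illustrative, not part of the dataset tooling):

```python
from pathlib import Path


def check_pairs(root):
    """Compare filename stems in images/ and annotations/.

    Returns (missing_masks, orphan_masks): image stems that have no mask,
    and mask stems that have no image. Stems are compared rather than full
    filenames because image and mask extensions may differ (.jpg vs .png).
    """
    root = Path(root)
    img_stems = {p.stem for p in (root / "images").iterdir() if p.is_file()}
    msk_stems = {p.stem for p in (root / "annotations").iterdir() if p.is_file()}
    return sorted(img_stems - msk_stems), sorted(msk_stems - img_stems)
```

Any stems reported by this check should correspond to the missing-pair notes in `manifest.csv`.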