---
dataset_name: SAM-TP Traversability Dataset
pretty_name: SAM-TP Traversability Dataset (Flattened)
task_categories:
- image-segmentation
- semantic-segmentation
tags:
- robotics
- navigation
- traversability
- outdoor
- sam2
- bev
license: cc-by-4.0
annotations_creators:
- machine-assisted
- humans
language:
- en
size_categories:
- n<50K
---

# SAM-TP Traversability Dataset

This repository contains pixel-wise **traversability masks** paired with egocentric RGB images, prepared in a **flat, filename-aligned** layout that is convenient for training SAM-2 / SAM-TP-style segmentation models.

> **Folder layout**

```
.
├─ images/         # RGB frames (.jpg/.png). Filenames are globally unique.
├─ annotations/    # Binary masks (.png/.jpg). Filenames match images 1-to-1.
└─ manifest.csv    # Provenance rows and any missing-pair notes.
```

Each file in `annotations/` is the mask for the file in `images/` with the same stem (same base filename, different folder, possibly a different extension).

---

## File naming

Filenames are made globally unique by concatenating the original subfolder path and the local stem with `__` separators, e.g.

```
ride_68496_8ef98b_20240716023032_517__1.jpg   # image
ride_68496_8ef98b_20240716023032_517__1.png   # corresponding mask
```

---

## Mask format

- Single-channel binary masks; foreground = **traversable**, background = **non-traversable**.
- Stored as `.png` or `.jpg` depending on source. If your pipeline requires PNG, convert on the fly in your dataloader.
- Values are typically `{0, 255}`. You can binarize via `mask = (mask > 127).astype(np.uint8)`.

---

## How to use

### A) Load with `datasets` (ImageFolder-style)

```python
from pathlib import Path

from datasets import load_dataset
from PIL import Image

REPO = "jamiewjm/sam-tp"  # Hugging Face dataset repo id

# With "imagefolder", the data_files keys name the splits, so the key
# must match the split you request below.
ds_imgs = load_dataset(
    "imagefolder",
    data_files={"train": f"hf://datasets/{REPO}/images/**"},
    split="train",
)
ds_msks = load_dataset(
    "imagefolder",
    data_files={"train": f"hf://datasets/{REPO}/annotations/**"},
    split="train",
)

# Build a mask index keyed by filename stem: images and masks share a
# stem but may differ in extension (.jpg image vs .png mask).
mask_index = {Path(r["image"]["path"]).stem: r["image"]["path"] for r in ds_msks}

row = ds_imgs[0]
img_path = Path(row["image"]["path"])
msk_path = Path(mask_index[img_path.stem])

img = Image.open(img_path).convert("RGB")
msk = Image.open(msk_path).convert("L")
```

### B) Minimal PyTorch dataset

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class TraversabilityDataset(Dataset):
    def __init__(self, root):
        root = Path(root)
        self.img_dir = root / "images"
        self.msk_dir = root / "annotations"
        # Index masks by stem so a .jpg image can pair with a .png mask.
        self.masks = {p.stem: p for p in self.msk_dir.iterdir() if p.is_file()}
        self.items = sorted(
            p for p in self.img_dir.iterdir()
            if p.is_file() and p.stem in self.masks
        )

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        ip = self.items[idx]
        mp = self.masks[ip.stem]
        return Image.open(ip).convert("RGB"), Image.open(mp).convert("L")
```

### C) Pre-processing notes for SAM-2/SAM-TP training

- Resize/pad to your training resolution (commonly **1024×1024**), applying the same geometric transforms to image and mask so they stay aligned.
- Normalize images per your backbone's recipe.
- If your trainer expects COCO-RLE masks, convert PNG → RLE in the dataloader stage.

---

## Provenance & splits

- The dataset was flattened from mirrored directory trees (images and annotations) with 1-to-1 filename alignment.
- If you create explicit `train`/`val`/`test` splits, please add a `split` column to a copy of `manifest.csv` and contribute it back.

---

## License

Data: **CC-BY-4.0** (Attribution). See `LICENSE` for details.

---

## Citation

If you use this dataset in academic or industrial research, please cite the accompanying paper/report describing the data collection and labeling protocol.
```
@misc{sam_tp_dataset,
  title        = {SAM-TP Traversability Dataset},
  howpublished = {Hugging Face Datasets},
  year         = {2025},
  note         = {URL: https://huggingface.co/datasets/jamiewjm/sam-tp}
}
```
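
---

## Appendix: verifying image/mask pairing

The 1-to-1 filename alignment described above can be sanity-checked after downloading the repository. A minimal sketch (the `check_pairs` helper is illustrative, not part of the dataset tooling):

```python
from pathlib import Path


def check_pairs(root):
    """Compare filename stems in images/ and annotations/.

    Returns (missing_masks, orphan_masks): image stems that have no mask,
    and mask stems that have no image. Stems are compared rather than full
    filenames because image and mask extensions may differ (.jpg vs .png).
    """
    root = Path(root)
    img_stems = {p.stem for p in (root / "images").iterdir() if p.is_file()}
    msk_stems = {p.stem for p in (root / "annotations").iterdir() if p.is_file()}
    return sorted(img_stems - msk_stems), sorted(msk_stems - img_stems)
```

Any stems reported by this check should correspond to the missing-pair notes in `manifest.csv`.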