Upload 2 files


Files changed (2)

LICENSE +2 -0
README.md +160 -0

LICENSE ADDED

	@@ -0,0 +1,2 @@


+Creative Commons Attribution 4.0 International (CC BY 4.0)
+https://creativecommons.org/licenses/by/4.0/

README.md ADDED

	@@ -0,0 +1,160 @@

+---
+dataset_name: SAM-TP Traversability Dataset
+pretty_name: SAM-TP Traversability Dataset (Flattened)
+task_categories:
+- image-segmentation
+task_ids:
+- semantic-segmentation
+tags:
+- robotics
+- navigation
+- traversability
+- outdoor
+- sam2
+- bev
+license: cc-by-4.0
+annotations_creators:
+- machine-assisted
+- humans
+language:
+- en
+size_categories:
+- n<50K
+---
+# SAM‑TP Traversability Dataset
+This repository contains pixel‑wise **traversability masks** paired with egocentric RGB images, prepared in a **flat, filename‑aligned** layout that is convenient for training SAM‑2 / SAM‑TP‑style segmentation models.
+> **Folder layout**
+```
+.
+├─ images/          # RGB frames (.jpg/.png). Filenames are globally unique.
+├─ annotations/     # Binary masks (.png/.jpg). Filenames match images 1‑to‑1.
+└─ manifest.csv     # Provenance rows and any missing‑pair notes.
+```
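+Each `annotations/<FILENAME>` is the mask for `images/<FILENAME>`: the pair shares the same filename stem in a different folder. The extension can differ (for example, a `.jpg` image with a `.png` mask), so match on the stem when pairing files.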
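The pairing rule can be checked before training. This is a minimal sketch, assuming the dataset has been downloaded to a local folder with the layout above. `find_unpaired` is a hypothetical helper, and it compares filename stems so that a `.jpg` image can pair with a `.png` mask:

```python
from pathlib import Path


def find_unpaired(root):
    """Return (images without a mask, masks without an image) as sorted stem lists."""
    root = Path(root)
    img_stems = {p.stem for p in (root / "images").iterdir() if p.is_file()}
    msk_stems = {p.stem for p in (root / "annotations").iterdir() if p.is_file()}
    return sorted(img_stems - msk_stems), sorted(msk_stems - img_stems)
```

Two empty lists mean every image has a mask and vice versa; anything else is worth comparing against the missing-pair notes in `manifest.csv`.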
+---
+## File naming
+Filenames are made globally unique by concatenating the original subfolder path and the local stem with `__` separators, e.g.
+```
+ride_68496_8ef98b_20240716023032_517__1.jpg
+ride_68496_8ef98b_20240716023032_517__1.png  # corresponding mask
+```
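The naming scheme can be sketched as follows. This is an illustration, not the script used to build the dataset: `flatten_name` and `split_name` are hypothetical helpers, and the usage example assumes the original frame lived at `ride_68496_8ef98b_20240716023032_517/1.jpg`.

```python
from pathlib import PurePosixPath

SEP = "__"


def flatten_name(rel_path):
    """Join every component of a relative path with '__' to get a flat filename."""
    return SEP.join(PurePosixPath(rel_path).parts)


def split_name(flat_name):
    """Recover (original subfolder prefix, local filename) from a flat filename."""
    prefix, _, local = flat_name.rpartition(SEP)
    return prefix, local
```

Under that assumption, `flatten_name("ride_68496_8ef98b_20240716023032_517/1.jpg")` gives the image filename shown above, and `split_name` reverses it. The reverse step is ambiguous if an original file or folder name already contained `__`.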
+---
+## Mask format
+- Single‑channel binary masks; foreground = **traversable**, background = **non‑traversable**.
+- Stored as `.png` or `.jpg` depending on source. If your pipeline requires PNG, convert on the fly in your dataloader.
+- Values are typically `{0, 255}`, but masks stored as `.jpg` can contain intermediate values from lossy compression, so binarize before use, e.g. `mask = (mask > 127).astype(np.uint8)`.
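The thresholding step can be wrapped in a small helper. This is a minimal sketch, assuming the mask is already loaded as a NumPy array (for example via `np.asarray(Image.open(path).convert("L"))`); `binarize_mask` is a hypothetical name:

```python
import numpy as np


def binarize_mask(mask, threshold=127):
    """Map a grayscale mask to {0, 1}: 1 = traversable, 0 = non-traversable."""
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Collapse an accidental multi-channel mask to its first channel.
        mask = mask[..., 0]
    return (mask > threshold).astype(np.uint8)
```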
+---
+## How to use
+### A) Load with `datasets` (ImageFolder‑style)
+```python
+from pathlib import Path
+
+from datasets import Image as HFImage, load_dataset
+from PIL import Image
+
+REPO = "jamiewjm/sam-tp"
+
+def load_paths(subdir):
+    # data_files keys are split names, so use "train" to match split="train".
+    ds = load_dataset(
+        "imagefolder",
+        data_files={"train": f"hf://datasets/{REPO}/{subdir}/**"},
+        split="train",
+    )
+    # decode=False yields {"path", "bytes"} dicts instead of decoded PIL images.
+    return ds.cast_column("image", HFImage(decode=False))
+
+ds_imgs = load_paths("images")
+ds_msks = load_paths("annotations")
+
+# Build a mask index keyed by filename stem (image and mask extensions may differ).
+# Assumes "path" points to a locally cached file that keeps its original name;
+# verify this on your installed `datasets` version.
+mask_index = {Path(r["image"]["path"]).stem: r["image"]["path"] for r in ds_msks}
+
+row = ds_imgs[0]
+img_path = Path(row["image"]["path"])
+msk_path = Path(mask_index[img_path.stem])
+img = Image.open(img_path).convert("RGB")
+msk = Image.open(msk_path).convert("L")
+```
+### B) Minimal PyTorch dataset
+```python
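+from pathlib import Path
+
+from PIL import Image
+from torch.utils.data import Dataset
+
+
+class TraversabilityDataset(Dataset):
+    """Yields (RGB image, single-channel mask) pairs matched by filename stem."""
+
+    def __init__(self, root):
+        root = Path(root)
+        self.img_dir = root / "images"
+        self.msk_dir = root / "annotations"
+        # Index masks by stem so a .jpg image can pair with a .png mask.
+        masks = {p.stem: p for p in self.msk_dir.iterdir() if p.is_file()}
+        images = sorted(p for p in self.img_dir.iterdir() if p.is_file())
+        # Keep only images that have a mask; manifest.csv may note missing pairs.
+        self.items = [(p, masks[p.stem]) for p in images if p.stem in masks]
+
+    def __len__(self):
+        return len(self.items)
+
+    def __getitem__(self, idx):
+        ip, mp = self.items[idx]
+        # Returns PIL images; convert to tensors before using the default collate.
+        return Image.open(ip).convert("RGB"), Image.open(mp).convert("L")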
+```
+### C) Pre‑processing notes for SAM‑2/SAM‑TP training
+- Resize/pad to your training resolution (commonly **1024×1024**) with masks aligned.
+- Normalize images per your backbone’s recipe.
+- If your trainer expects COCO‑RLE masks, convert PNG → RLE in the dataloader stage.
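The first and last notes can be sketched with NumPy alone. Everything here is illustrative: `letterbox` and `mask_to_rle` are hypothetical helpers, and the nearest-neighbour resize is chosen only to keep the example dependency-free (pipelines usually use bilinear interpolation for the RGB frame and nearest-neighbour for the mask). The RLE is the uncompressed COCO layout, meaning column-major run lengths that start with a run of zeros, not the compressed string form.

```python
import numpy as np


def letterbox(image, mask, size=1024):
    """Resize image and mask with one shared scale, then zero-pad to size x size."""
    h, w = mask.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbour source indices, shared by image and mask to keep them aligned.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    image_out = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    mask_out = np.zeros((size, size), dtype=mask.dtype)
    # Content is anchored top-left; the padded area stays 0 (non-traversable).
    image_out[:new_h, :new_w] = image[rows[:, None], cols]
    mask_out[:new_h, :new_w] = mask[rows[:, None], cols]
    return image_out, mask_out


def mask_to_rle(mask):
    """Encode a non-empty binary mask as uncompressed COCO-style RLE."""
    h, w = mask.shape
    flat = (np.asarray(mask) > 0).astype(np.uint8).flatten(order="F")
    # Positions where the value changes mark run boundaries.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    bounds = np.concatenate(([0], change, [flat.size]))
    counts = np.diff(bounds).tolist()
    if flat[0] == 1:
        # COCO counts always start with the number of leading zeros.
        counts = [0] + counts
    return {"size": [h, w], "counts": counts}
```

Because both arrays are indexed with the same `rows` and `cols`, every mask pixel still describes the image pixel at the same position after resizing.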
+---
+## Provenance & splits
+- The dataset was flattened from mirrored directory trees (images and annotations) with 1‑to‑1 filename alignment.
+- If you create explicit `train/val/test` splits, please add a `split` column to a copy of `manifest.csv` and contribute it back.
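Adding a split column can be sketched as below. This is only an illustration: the column layout of `manifest.csv` is not described here, so the `filename` column name is an assumption, as are the `assign_split` and `add_split_column` helper names and the 80/10/10 ratios. Splitting on the prefix before the last `__` keeps frames from the same source folder in the same split, which is one way to limit leakage between near-duplicate frames.

```python
import csv
import hashlib


def assign_split(filename, val_pct=10, test_pct=10):
    """Deterministically map a flat filename to train/val/test by its source prefix."""
    group = filename.rpartition("__")[0] or filename
    bucket = int(hashlib.sha256(group.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


def add_split_column(src_csv, dst_csv, name_col="filename"):
    """Write a copy of the manifest with an extra `split` column."""
    with open(src_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        fields = list(reader.fieldnames or [])
    if "split" not in fields:
        fields.append("split")
    with open(dst_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in rows:
            row["split"] = assign_split(row[name_col])
            writer.writerow(row)
```

Hashing the prefix rather than drawing random numbers makes the assignment reproducible across machines without storing a seed.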
+---
+## License
+Data: **CC‑BY‑4.0** (Attribution). See `LICENSE` for details.
+---
+## Citation
+If you use this dataset in academic or industrial research, please cite the accompanying paper/report describing the data collection and labeling protocol:
+> **GeNIE: A Generalizable Navigation System for In-the-Wild Environments**
+> Available at: [https://arxiv.org/abs/2506.17960](https://arxiv.org/abs/2506.17960)
+> Contains the SAM-TP traversability dataset and evaluation methodology.
+```
+@article{wang2025genie,
+  title   = {GeNIE: A Generalizable Navigation System for In-the-Wild Environments},
+  author  = {Wang, Jiaming and others},
+  journal = {arXiv preprint arXiv:2506.17960},
+  year    = {2025},
+  url     = {https://arxiv.org/abs/2506.17960}
+}
+```
+```
+@misc{sam_tp_dataset,
+  title        = {SAM‑TP Traversability Dataset},
+  howpublished = {Hugging Face Datasets},
+  year         = {2025},
+  url          = {https://huggingface.co/datasets/jamiewjm/sam-tp}
+}
+```