Commit bc11380 by jamiewjm (verified) · 1 parent: 9f848c5

Upload 2 files

Files changed (2):
  1. LICENSE +2 -0
  2. README.md +160 -0
LICENSE ADDED

Creative Commons Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/

README.md ADDED

---
dataset_name: SAM-TP Traversability Dataset
pretty_name: SAM-TP Traversability Dataset (Flattened)
task_categories:
  - image-segmentation
task_ids:
  - semantic-segmentation
tags:
  - robotics
  - navigation
  - traversability
  - outdoor
  - sam2
  - bev
license: cc-by-4.0
annotations_creators:
  - machine-assisted
  - humans
language:
  - en
size_categories:
  - n<50K
---

# SAM-TP Traversability Dataset

This repository contains pixel-wise **traversability masks** paired with egocentric RGB images, prepared in a **flat, filename-aligned** layout that is convenient for training SAM-2 / SAM-TP-style segmentation models.

> **Folder layout**
```
.
├─ images/         # RGB frames (.jpg/.png). Filenames are globally unique.
├─ annotations/    # Binary masks (.png/.jpg). Filename stems match images 1-to-1.
└─ manifest.csv    # Provenance rows and any missing-pair notes.
```
Each mask in `annotations/` belongs to the image in `images/` with the same filename stem (the name without its extension). Extensions can differ between the two folders, e.g. a `.jpg` image paired with a `.png` mask.
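
Because an image and its mask can carry different extensions (the naming example pairs `..._517__1.jpg` with `..._517__1.png`), matching on the stem is the safer way to check alignment. The sketch below is a hypothetical helper, `pair_by_stem`, not part of any library; it assumes stems are unique within each folder, as the globally unique filenames suggest.

```python
from pathlib import PurePath


def pair_by_stem(image_names, mask_names):
    """Match image and mask filenames by stem, ignoring extensions.

    Returns (pairs, images_without_mask, masks_without_image), each sorted.
    """
    masks = {PurePath(m).stem: m for m in mask_names}
    images = {PurePath(i).stem: i for i in image_names}
    # dict key views support set operations directly.
    pairs = [(images[s], masks[s]) for s in sorted(images.keys() & masks.keys())]
    missing_masks = sorted(images[s] for s in images.keys() - masks.keys())
    orphan_masks = sorted(masks[s] for s in masks.keys() - images.keys())
    return pairs, missing_masks, orphan_masks
```

From a local copy of the dataset root, `pair_by_stem(os.listdir("images"), os.listdir("annotations"))` lists the matched pairs; any entries in the second or third list are worth cross-checking against the missing-pair notes in `manifest.csv`.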

---

## File naming

Filenames are made globally unique by concatenating the original subfolder path and the local stem with `__` separators, e.g.
```
ride_68496_8ef98b_20240716023032_517__1.jpg
ride_68496_8ef98b_20240716023032_517__1.png   # corresponding mask
```
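
To recover provenance from a flattened name, the concatenation can be undone by splitting on the last `__`. This is a sketch under two assumptions not stated above: the final `__` always precedes the local stem, and no original path component contains `__` itself. `split_flat_name` is a hypothetical helper.

```python
from pathlib import PurePath


def split_flat_name(filename):
    """Split a flattened filename into (subfolder_parts, local_stem, extension).

    Assumes the last "__" separates the original subfolder path from the
    local stem and that no original path component contains "__".
    """
    p = PurePath(filename)
    prefix, sep, local_stem = p.stem.rpartition("__")
    if not sep:
        # No separator found: treat the whole stem as the local stem (assumption).
        return [], p.stem, p.suffix
    return prefix.split("__"), local_stem, p.suffix
```

For the example above, `split_flat_name("ride_68496_8ef98b_20240716023032_517__1.jpg")` yields the subfolder `ride_68496_8ef98b_20240716023032_517`, the local stem `1`, and the extension `.jpg`.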

---

## Mask format

- Single-channel binary masks; foreground = **traversable**, background = **non-traversable**.
- Stored as `.png` or `.jpg` depending on the source. If your pipeline requires PNG, convert on the fly in your dataloader.
- Values are typically `{0, 255}`, but JPEG compression is lossy and can leave intermediate values near region boundaries, so threshold rather than test for equality: `mask = (np.asarray(mask) > 127).astype(np.uint8)`.
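
The thresholding rule above can be wrapped in a small helper that accepts either a NumPy array or a PIL image opened with `.convert("L")`. `binarize_mask` is a hypothetical name, and the default threshold of 127 is the one used in the README; the output uses 1 for traversable and 0 for non-traversable.

```python
import numpy as np


def binarize_mask(mask, threshold=127):
    """Return a uint8 array with 1 = traversable, 0 = non-traversable.

    `mask` may be a NumPy array or anything np.asarray accepts, such as a
    single-channel PIL image. Values strictly above `threshold` count as
    foreground, which also absorbs JPEG artifacts around mask edges.
    """
    arr = np.asarray(mask)
    return (arr > threshold).astype(np.uint8)
```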

---

## How to use

### A) Download a local copy from the Hub

```python
from pathlib import Path

from huggingface_hub import snapshot_download
from PIL import Image

REPO = "jamiewjm/sam-tp"

# Download the dataset repo (or reuse the local Hugging Face cache).
root = Path(snapshot_download(repo_id=REPO, repo_type="dataset"))

# Index masks by filename stem: an image and its mask share the stem,
# but their extensions may differ (e.g. .jpg image, .png mask).
mask_index = {p.stem: p for p in (root / "annotations").iterdir() if p.is_file()}
img_paths = sorted(p for p in (root / "images").iterdir() if p.is_file())

img_path = img_paths[0]
msk_path = mask_index[img_path.stem]

img = Image.open(img_path).convert("RGB")
msk = Image.open(msk_path).convert("L")
```

### B) Minimal PyTorch dataset

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class TraversabilityDataset(Dataset):
    """Returns (RGB image, single-channel mask) pairs matched by filename stem."""

    def __init__(self, root):
        root = Path(root)
        self.img_dir = root / "images"
        self.msk_dir = root / "annotations"
        # Masks may use a different extension than their image (e.g. .jpg vs .png).
        self.masks = {p.stem: p for p in self.msk_dir.iterdir() if p.is_file()}
        # Skip images without a mask (see the missing-pair notes in manifest.csv).
        self.items = sorted(p for p in self.img_dir.iterdir() if p.is_file() and p.stem in self.masks)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        ip = self.items[idx]
        mp = self.masks[ip.stem]
        return Image.open(ip).convert("RGB"), Image.open(mp).convert("L")
```

### C) Pre-processing notes for SAM-2/SAM-TP training

- Resize/pad images to your training resolution (commonly **1024×1024**) and apply the identical transform to the masks so they stay aligned; use nearest-neighbor interpolation for masks to keep them binary.
- Normalize images per your backbone's recipe.
- If your trainer expects COCO-RLE masks, convert the binary masks to RLE in the dataloader stage.

---

## Provenance & splits

- The dataset was flattened from mirrored directory trees (images and annotations) with 1-to-1 alignment by filename stem.
- If you create explicit `train/val/test` splits, please add a `split` column to a copy of `manifest.csv` and contribute it back.
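
One way to produce such a `split` value is to assign it per original subfolder rather than per frame, so that near-duplicate neighboring frames from one recording cannot leak between train and test. The sketch assumes the part of a flattened filename before the last `__` identifies a recording (e.g. a `ride_...` folder); `assign_split` and its default fractions are hypothetical.

```python
import hashlib


def assign_split(filename, val_frac=0.1, test_frac=0.1):
    """Deterministically map a flattened filename to "train", "val" or "test".

    The decision depends only on the original subfolder prefix (everything
    before the last "__"), so all frames from one subfolder share a split.
    """
    stem = filename.rsplit(".", 1)[0]
    group = stem.rsplit("__", 1)[0]
    # Hash the group into [0, 1) reproducibly (unlike Python's salted hash()).
    digest = hashlib.sha256(group.encode("utf-8")).hexdigest()
    u = int(digest[:8], 16) / 0x100000000
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"
```

The column of `manifest.csv` that holds the filename is not documented here, so adapt the lookup to its actual header when writing the `split` column.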

---

## License

Data: **CC BY 4.0** (Creative Commons Attribution 4.0 International). See `LICENSE` for details.

---

## Citation

If you use this dataset in academic or industrial research, please cite the accompanying paper/report, which describes the data collection and labeling protocol:

> **GeNIE: A Generalizable Navigation System for In-the-Wild Environments**
>
> Available at: [https://arxiv.org/abs/2506.17960](https://arxiv.org/abs/2506.17960)
>
> Describes the SAM-TP traversability dataset and its evaluation methodology.

```bibtex
@article{wang2025genie,
  title   = {GeNIE: A Generalizable Navigation System for In-the-Wild Environments},
  author  = {Wang, Jiaming and others},
  journal = {arXiv preprint arXiv:2506.17960},
  year    = {2025},
  url     = {https://arxiv.org/abs/2506.17960}
}
```

```bibtex
@misc{sam_tp_dataset,
  title        = {SAM-TP Traversability Dataset},
  howpublished = {Hugging Face Datasets},
  year         = {2025},
  note         = {URL: https://huggingface.co/datasets/jamiewjm/sam-tp}
}
```