Update README.md
---
tags:
- RDT
- rdt
- tokenizer
- action
- discrete
- vector-quantization
license: apache-2.0
pipeline_tag: robotics
---

# RVQ-AT: Residual VQ Action Tokenizer for RDT 2

**RVQ-AT** is a fast, compact **Residual Vector-Quantization** (RVQ) tokenizer for robot action streams.
It converts continuous control trajectories into short sequences of **discrete action tokens** that plug directly into autoregressive VLA models.

Unlike single-codebook VQ, RVQ-AT stacks multiple small codebooks and quantizes **residuals** level-by-level. This yields:

* **Higher fidelity at the same bitrate** (lower recon MSE / higher SNR)
* **Shorter token sequences** for the same time horizon
* **Stable training** via commitment loss, EMA codebook updates, and dead-code revival

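To make the residual scheme concrete, here is a minimal, self-contained sketch of multi-level residual quantization with fixed random codebooks. It is an illustration only, not the released model's implementation; the level count, codebook size, and function names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 levels, 256 codes per level, 20-D action vectors.
num_levels, codebook_size, dim = 4, 256, 20
codebooks = rng.normal(size=(num_levels, codebook_size, dim)).astype("float32")

def rvq_encode(x, codebooks):
    """Quantize each row of x level by level; return [N, num_levels] code indices."""
    residual = x.copy()
    codes = []
    for cb in codebooks:
        # Nearest codeword for the current residual (squared Euclidean distance).
        d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(axis=1)
        codes.append(idx)
        residual = residual - cb[idx]   # the next level only sees what is left over
    return np.stack(codes, axis=1)

def rvq_decode(codes, codebooks):
    """Sum the selected codewords across levels to reconstruct x."""
    return sum(cb[codes[:, level]] for level, cb in enumerate(codebooks))

x = rng.uniform(-1, 1, size=(8, dim)).astype("float32")
codes = rvq_encode(x, codebooks)      # shape (8, 4): num_levels tokens per action step
x_hat = rvq_decode(codes, codebooks)  # reconstruction improves as levels are added
```

Each level only has to encode the error left by the previous ones, which is why a few small stacked codebooks can reach the fidelity of one much larger codebook at the same total bitrate.
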
Here, we provide:

1. **RVQ-AT (Universal)**: a general-purpose tokenizer trained on diverse manipulation & navigation logs.
2. **Simple APIs to fit your own tokenizer** on custom action datasets.

---

## Using the Universal RVQ-AT Tokenizer

We recommend chunking actions into ~**0.8 s** windows at 30 fps (24 steps) and normalizing each action dimension to **[-1, 1]** with this [normalizer](http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt) before tokenization. Batched encode and decode are supported.

```python
import numpy as np
from transformers import AutoProcessor

# Load from the Hub (replace with your repo id once published)
proc = AutoProcessor.from_pretrained(
    "your-org/residual-vq-action-tokenizer",  # e.g., "your-org/rvq-at-universal"
    trust_remote_code=True
)

# Dummy batch: [batch, T, action_dim], concretely [batch_size, 24, 20]
action_data = np.random.uniform(-1, 1, size=(256, 24, 20)).astype("float32")

# Encode → tokens (List[List[int]] or np.ndarray[int])
tokens = proc(action_data)  # or proc.encode(action_data)

# Decode back to continuous actions.
# The processor caches (T, action_dim) on the first forward,
# or specify them explicitly:
recon = proc.decode(tokens, time_horizon=24, action_dim=20)
```

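Continuing from the snippet above, a quick round-trip check confirms shapes and gives a feel for the compression. With random inputs the numbers mainly verify wiring; expect a much lower reconstruction error on real, normalized action chunks.

```python
tokens_arr = np.asarray(tokens)  # assumed rectangular: [batch, tokens_per_chunk]
recon_arr = np.asarray(recon)    # expected layout: [256, 24, 20], same as action_data

print("tokens per chunk:    ", tokens_arr.shape[-1])
print("raw values per chunk:", action_data.shape[1] * action_data.shape[2])  # 24 * 20 = 480
print("reconstruction MSE:  ", float(np.mean((recon_arr - action_data) ** 2)))
```
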
**Notes**

* If your pipeline uses variable-length chunks, pass `time_horizon` per sample to `decode(...)` (see the sketch below).
* Special tokens (`pad`, `eos`, optional `chunk_sep`) are reserved and shouldn’t be used as code indices.

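For the variable-length case, a minimal pattern is to decode each chunk with its own horizon. This is only a sketch: it assumes `decode` accepts a single-element batch as in the snippet above, and the `lengths` bookkeeping with its example values is hypothetical, kept entirely on the caller's side.

```python
# Hypothetical per-sample horizons tracked by the caller (values are examples).
lengths = [24, 16, 32]

recons = [
    proc.decode([chunk_tokens], time_horizon=T, action_dim=20)
    for chunk_tokens, T in zip(tokens, lengths)
]
```
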
---

## Recommended Preprocessing

* **Chunking:** 0.5–1.0 s windows work well for 10–50 Hz logs.
* **Normalization:** per-dimension robust scaling to `[-1, 1]` (e.g., 1–99% quantiles; see the sketch after this list). Save the stats in `preprocessor_config.json`.
* **Padding:** for variable `T`, pad to a small multiple of the stride; RVQ-AT masks padding internally.
* **Action spaces:** mixed spaces are supported (e.g., 7-DoF joints + gripper + base). Concatenate them into a flat vector per timestep.

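As a concrete, illustrative version of the robust scaling above, the sketch below maps the 1st/99th percentile of each action dimension to [-1, 1] and clips outliers. The function names and the stats layout are our own; they simply mirror the kind of values you would save alongside `preprocessor_config.json`.

```python
import numpy as np

def fit_robust_normalizer(actions, low_pct=1.0, high_pct=99.0):
    """actions: [N, T, A]. Returns per-dimension low/high quantile stats."""
    flat = actions.reshape(-1, actions.shape[-1])
    lo = np.percentile(flat, low_pct, axis=0)
    hi = np.percentile(flat, high_pct, axis=0)
    return {"lo": lo.tolist(), "hi": hi.tolist()}

def normalize(actions, stats):
    """Scale each dimension so [lo, hi] maps to [-1, 1], clipping outliers."""
    lo, hi = np.asarray(stats["lo"]), np.asarray(stats["hi"])
    scaled = 2.0 * (actions - lo) / np.maximum(hi - lo, 1e-8) - 1.0
    return np.clip(scaled, -1.0, 1.0)

def denormalize(actions_norm, stats):
    """Inverse mapping back to the original action units."""
    lo, hi = np.asarray(stats["lo"]), np.asarray(stats["hi"])
    return (actions_norm + 1.0) / 2.0 * (hi - lo) + lo
```
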
---

<!-- ## Performance (Universal Model)

*(Representative, measured on internal eval — replace with your numbers when available.)*

* **Compression:** 4 levels × 1 token/step → 4 tokens/step (often reduced further with temporal stride).
* **Reconstruction:** MSE ↓ 25–40% vs. single-codebook VQ at equal bitrate.
* **Latency:** <1 ms per 50×14 chunk on A100/PCIe; CPU-only real-time at 50 Hz feasible.
* **Downstream VLA:** +1–3% SR on long-horizon tasks vs. raw-action modeling.

---
-->

## Safety & Intended Use

RVQ-AT is a representation learning component. **Do not** deploy decoded actions directly to hardware without:

* Proper sim-to-real validation,
* Safety bounds/clamping and rate limiters,
* Task-level monitors and e-stop fallbacks.

---

## FAQ

**Q: How do I get back a `[T, A]` matrix at decode?**
A: RVQ-AT caches `(time_horizon, action_dim)` on the first `__call__`/`encode`. You can also pass them explicitly to `decode(...)`.

**Q: Can I store shorter token sequences?**
A: Yes. Enable `temporal_stride > 1` to quantize a downsampled latent; the decoder upsamples.

**Q: How do I integrate with `transformers` trainers?**
A: Treat the RVQ-AT output as a discrete vocabulary and feed the tokens to your VLA LM. Keep special token ids consistent across datasets, for example by fixing a single offset as sketched below.

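One common pattern for that last point is to reserve the leading ids of the action-token range for the special tokens and shift raw RVQ code indices by a fixed offset, so the mapping is identical across datasets. The sketch below is not part of the released API; the vocabulary start id and the special-token layout are invented for illustration.

```python
# Hypothetical id layout for folding action tokens into an LM vocabulary.
ACTION_VOCAB_START = 32000          # first id after the text vocabulary (example)
SPECIAL = {"pad": 0, "eos": 1, "chunk_sep": 2}
NUM_SPECIAL = len(SPECIAL)

def to_lm_ids(action_codes):
    """Map raw RVQ code indices to LM token ids, leaving room for special tokens."""
    return [ACTION_VOCAB_START + NUM_SPECIAL + c for c in action_codes]

def to_action_codes(lm_ids):
    """Inverse mapping, dropping special and text tokens."""
    return [i - ACTION_VOCAB_START - NUM_SPECIAL
            for i in lm_ids
            if i >= ACTION_VOCAB_START + NUM_SPECIAL]
```
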
---

## Citation

If you use RVQ-AT in your work, please cite:

```bibtex

```

---

## Contact

* Maintainers: Your Name [[email protected]](mailto:[email protected])
* Issues & requests: open a GitHub issue or start a Hub discussion on the model page.

---

## License

This repository and the released models are licensed under **Apache-2.0**. You may use, modify, and distribute them, provided you **keep a copy of the original license and notices** in your distributions and **state significant changes** when you make them.