Update README.md
README.md
@@ -18,14 +18,12 @@ license: apache-2.0
 | **Positional Encoding** | 3D RoPE (4:6:6 split for T:H:W) |
 | **Normalization** | Layer Normalization |
 | **Activation Function** | GELU |
-| **Attention Implementation** | Flash Attention 2 |
 | **License** | Apache 2.0 |
 
 ### Key Features
 
 - **Codec-Style Patch Selection**: Instead of sampling sparse frames densely (all patches from few frames), OneVision Encoder samples dense frames sparsely (important patches from many frames).
 - **3D Rotary Position Embedding**: Uses a 4:6:6 split for temporal, height, and width dimensions to capture spatiotemporal relationships.
-- **Global Contrastive Learning**: Trained with a 2M concept bank for better-separated semantic clusters.
 - **Native Resolution Support**: Supports native resolution input without tiling or cropping.
 - **Flash Attention 2**: Efficient attention implementation for improved performance and memory efficiency.
 
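The README describes the positional encoding as a 3D RoPE with a 4:6:6 split for T:H:W. The Python sketch below shows one common way such a split can work. It assumes the ratio partitions each attention head's dimensions into contiguous temporal, height and width chunks, each rotated with standard 1D RoPE frequencies (base 10000). The chunk layout, the base and the function names `split_rope_dims`, `rope_angles` and `apply_rope_3d` are illustrative assumptions, not the encoder's actual code.

```python
import math
from typing import List, Sequence, Tuple


def split_rope_dims(head_dim: int, ratio: Tuple[int, int, int] = (4, 6, 6)) -> Tuple[int, int, int]:
    """Split one head's rotary dimensions across the (T, H, W) axes.

    Assumption: the 4:6:6 ratio partitions head_dim into contiguous chunks,
    and every chunk must be even because RoPE rotates dimension pairs.
    """
    total = sum(ratio)
    if head_dim % (2 * total) != 0:
        raise ValueError(f"head_dim must be a multiple of {2 * total}")
    unit = head_dim // total
    return tuple(r * unit for r in ratio)


def rope_angles(pos: Sequence[int], dims: Sequence[int], base: float = 10000.0) -> List[float]:
    """One rotation angle per dimension pair; each axis uses standard 1D RoPE frequencies."""
    angles = []
    for p, d in zip(pos, dims):
        for i in range(d // 2):
            angles.append(p * base ** (-2 * i / d))
    return angles


def apply_rope_3d(x: Sequence[float], pos: Tuple[int, int, int],
                  ratio: Tuple[int, int, int] = (4, 6, 6)) -> List[float]:
    """Rotate consecutive pairs (x[2k], x[2k+1]) of a query/key vector by its (t, h, w) angles."""
    dims = split_rope_dims(len(x), ratio)
    out = list(x)
    for k, theta in enumerate(rope_angles(pos, dims)):
        a, b = x[2 * k], x[2 * k + 1]
        c, s = math.cos(theta), math.sin(theta)
        out[2 * k] = a * c - b * s
        out[2 * k + 1] = a * s + b * c
    return out


if __name__ == "__main__":
    print(split_rope_dims(64))  # (16, 24, 24)
```

Because every angle is linear in its coordinate, the query-key dot product after rotation depends only on the offset between two patches' (t, h, w) positions, which is what lets RoPE encode relative spatiotemporal distance.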
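The codec-style patch selection bullet contrasts taking all patches from a few frames with taking important patches from many frames. The following toy sketch illustrates the second strategy under loudly stated assumptions: each patch is reduced to one hypothetical scalar, and importance is approximated by the absolute change from the same patch in the previous frame, a codec-like residual chosen here for illustration only. The README does not say how OneVision Encoder actually scores patches, and `select_patches` is a hypothetical name.

```python
from typing import List, Tuple

# Hypothetical input layout: frames[t][h][w] is a single scalar summary of the
# patch at grid cell (h, w) in frame t (e.g. its mean intensity).
Frame = List[List[float]]


def select_patches(frames: List[Frame], budget: int) -> List[Tuple[int, int, int]]:
    """Keep the `budget` patches that change most, drawn from all frames.

    Assumed importance score (codec-like residual): |patch - same patch in the
    previous frame|. Frame 0 is compared against zeros.
    """
    scored = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else None
        for h, row in enumerate(frame):
            for w, value in enumerate(row):
                ref = prev[h][w] if prev is not None else 0.0
                scored.append((abs(value - ref), t, h, w))
    # Highest score first; ties broken by earliest (t, h, w) for determinism.
    scored.sort(key=lambda s: (-s[0], s[1], s[2], s[3]))
    # Return positions in (t, h, w) order so each kept patch retains its 3D index.
    return sorted((t, h, w) for _, t, h, w in scored[:budget])


if __name__ == "__main__":
    frames = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[0.0, 5.0], [0.0, 0.0]],
        [[0.0, 5.0], [2.0, 0.0]],
    ]
    print(select_patches(frames, 2))  # [(1, 0, 1), (2, 1, 0)]
```

With a fixed `budget`, the kept patches can come from every frame in the clip rather than being confined to a few densely sampled frames.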