xiangan commited on
Commit
88fef02
·
verified ·
1 Parent(s): ec90b90

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +0 -2
README.md CHANGED
@@ -18,14 +18,12 @@ license: apache-2.0
18
  | **Positional Encoding** | 3D RoPE (4:6:6 split for T:H:W) |
19
  | **Normalization** | Layer Normalization |
20
  | **Activation Function** | GELU |
21
- | **Attention Implementation** | Flash Attention 2 |
22
  | **License** | Apache 2.0 |
23
 
24
  ### Key Features
25
 
26
  - **Codec-Style Patch Selection**: Instead of sampling sparse frames densely (all patches from few frames), OneVision Encoder samples dense frames sparsely (important patches from many frames).
27
  - **3D Rotary Position Embedding**: Uses a 4:6:6 split for temporal, height, and width dimensions to capture spatiotemporal relationships.
28
- - **Global Contrastive Learning**: Trained with a 2M concept bank for better-separated semantic clusters.
29
  - **Native Resolution Support**: Supports native resolution input without tiling or cropping.
30
  - **Flash Attention 2**: Efficient attention implementation for improved performance and memory efficiency.
31
 
 
18
  | **Positional Encoding** | 3D RoPE (4:6:6 split for T:H:W) |
19
  | **Normalization** | Layer Normalization |
20
  | **Activation Function** | GELU |
 
21
  | **License** | Apache 2.0 |
22
 
23
  ### Key Features
24
 
25
  - **Codec-Style Patch Selection**: Instead of sampling sparse frames densely (all patches from few frames), OneVision Encoder samples dense frames sparsely (important patches from many frames).
26
  - **3D Rotary Position Embedding**: Uses a 4:6:6 split for temporal, height, and width dimensions to capture spatiotemporal relationships.
 
27
  - **Native Resolution Support**: Supports native resolution input without tiling or cropping.
28
  - **Flash Attention 2**: Efficient attention implementation for improved performance and memory efficiency.
29