lmms-lab-encoder
/

onevision-encoder-large

onevision_encoder

Model card Files Files and versions

xiangan commited on 11 days ago

Commit

88fef02

·

verified ·

1 Parent(s): ec90b90

Update README.md

Files changed (1) hide show

README.md +0 -2

README.md CHANGED Viewed

@@ -18,14 +18,12 @@ license: apache-2.0
 | **Positional Encoding** | 3D RoPE (4:6:6 split for T:H:W) |
 | **Normalization** | Layer Normalization |
 | **Activation Function** | GELU |
-| **Attention Implementation** | Flash Attention 2 |
 | **License** | Apache 2.0 |
 ### Key Features
 - **Codec-Style Patch Selection**: Instead of sampling sparse frames densely (all patches from few frames), OneVision Encoder samples dense frames sparsely (important patches from many frames).
 - **3D Rotary Position Embedding**: Uses a 4:6:6 split for temporal, height, and width dimensions to capture spatiotemporal relationships.
-- **Global Contrastive Learning**: Trained with a 2M concept bank for better-separated semantic clusters.
 - **Native Resolution Support**: Supports native resolution input without tiling or cropping.
 - **Flash Attention 2**: Efficient attention implementation for improved performance and memory efficiency.

 | **Positional Encoding** | 3D RoPE (4:6:6 split for T:H:W) |
 | **Normalization** | Layer Normalization |
 | **Activation Function** | GELU |
 | **License** | Apache 2.0 |
 ### Key Features
 - **Codec-Style Patch Selection**: Instead of sampling sparse frames densely (all patches from few frames), OneVision Encoder samples dense frames sparsely (important patches from many frames).
 - **3D Rotary Position Embedding**: Uses a 4:6:6 split for temporal, height, and width dimensions to capture spatiotemporal relationships.
 - **Native Resolution Support**: Supports native resolution input without tiling or cropping.
 - **Flash Attention 2**: Efficient attention implementation for improved performance and memory efficiency.