guoxy25 committed on
Commit bdfe2bd · verified · 1 Parent(s): 64127f1

Update README.md

Files changed (1)
  1. README.md +11 -1
README.md CHANGED
@@ -2,4 +2,14 @@
  language:
  - en
  - zh
- ---
+ ---
+
+ <h2 align="center">Ocean-OCR</h2>
+
+ <p align="center">
+ <img src="benchmarks.png" style="width: 700px" align=center>
+ </p>
+
+
+ ## Introduction
+ Multimodal large language models (MLLMs) have shown impressive capabilities across many domains, excelling at processing and understanding information from multiple modalities. Despite this rapid progress, insufficient OCR ability still keeps MLLMs from excelling at text-related tasks. In this paper, we present Ocean-OCR, a 3B MLLM with state-of-the-art performance across a range of OCR scenarios and comparable understanding ability on general tasks. We employ a Native Resolution ViT to support variable-resolution input and train on a substantial collection of high-quality OCR datasets to improve model performance.