anyforge/anyparse

anyparse


AnyParse

Parse any file to Markdown. Currently open-sourced formats: PDF, images, Office documents, HTML, and text-based files, with more to come. This is the base version of AnyParse; we are training a "plus" version.

    _                ____
   / \   _ __  _   _|  _ \ __ _ _ __ ___  ___
  / _ \ | '_ \| | | | |_) / _` | '__/ __|/ _ \
 / ___ \| | | | |_| |  __/ (_| | |  \__ \  __/
/_/   \_\_| |_|\__, |_|   \__,_|_|  |___/\___|
               |___/

GitHub: AnyParse
Hugging Face: AnyParse
ModelScope: AnyParse
Document layout detection: anydoclayout
Table detection: anytable

1. Usage

from anyparse.parser import AnyParse
from anyparse.settings import Settings

args = Settings().model_dump()  # see Settings for the available configuration options
model = AnyParse(args)

# Supported inputs: docx, pptx, xlsx, csv, txt, md, html, jpg, png, pdf

file = "1.pdf"

# Parse with the base OCR mode
res = model.invoke(file, ocr_mode="base", stream=False)
# or with the plus OCR mode
res = model.invoke(file, ocr_mode="plus", stream=False)
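To parse several files at once, a small wrapper can skip inputs whose extension is not in the supported list above and call invoke on the rest. This is a minimal sketch, not part of AnyParse: SUPPORTED_EXTENSIONS, is_supported, and parse_all are hypothetical names, and the only AnyParse call it relies on is invoke(file, ocr_mode=..., stream=False) as shown in the usage example. The return value of invoke is stored as-is, without assuming its type.

```python
from pathlib import Path
from typing import Any, Dict, Iterable

# Hypothetical constant: the extensions listed in the usage comment above.
SUPPORTED_EXTENSIONS = {
    ".docx", ".pptx", ".xlsx", ".csv", ".txt",
    ".md", ".html", ".jpg", ".png", ".pdf",
}


def is_supported(path: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def parse_all(model: Any, files: Iterable[str], ocr_mode: str = "base") -> Dict[str, Any]:
    """Call model.invoke on every supported file and map each path to its result.

    `model` is anything with an invoke(file, ocr_mode=..., stream=...) method,
    e.g. an AnyParse instance. Unsupported files (such as audio or video,
    which are still on the to-do list) are skipped.
    """
    results: Dict[str, Any] = {}
    for f in files:
        if not is_supported(f):
            continue  # skip extensions outside the documented list
        results[f] = model.invoke(f, ocr_mode=ocr_mode, stream=False)
    return results
```

For example, with model built as in the usage snippet, parse_all(model, ["1.pdf", "slides.pptx", "talk.mp3"]) would parse the PDF and the PPTX and skip the MP3. Filtering up front keeps a batch from depending on how AnyParse itself handles unsupported inputs, which the card does not specify.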

2. To-Do List

Audio file parsing
Video file parsing
Training anyocr-vlm on 10M documents

For business cooperation, or to get a more powerful version, contact us:

email: [email protected]

Buy me a coffee

WeChat

Thanks

RapidOCR
Nanonets-OCR-s

