AnyParse

AnyParse parses any file to Markdown. The open-source release currently supports PDF, image, Office, HTML, and text-based files, with more formats planned. This is the base version of AnyParse; the plus version is still being trained.


    _                ____
   / \   _ __  _   _|  _ \ __ _ _ __ ___  ___
  / _ \ | '_ \| | | | |_) / _` | '__/ __|/ _ \
 / ___ \| | | | |_| |  __/ (_| | |  \__ \  __/
/_/   \_\_| |_|\__, |_|   \__,_|_|  |___/\___|
               |___/

1. Usage

from anyparse.parser import AnyParse
from anyparse.settings import Settings

args = Settings().model_dump()  # see Settings for the available configuration options
model = AnyParse(args)

# Supported file types: docx, pptx, xlsx, csv, txt, md, html, jpg, png, pdf
file = '1.pdf'

# Pick one OCR mode; the second call below overwrites the first result
res = model.invoke(file, ocr_mode="base", stream=False)
res = model.invoke(file, ocr_mode="plus", stream=False)
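To parse several files in one go, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of AnyParse itself: `parse_files` and `SUPPORTED_SUFFIXES` are hypothetical names, the extension list is copied from the comment in the usage snippet, and the only assumption about the model is that it exposes `invoke(file, ocr_mode=..., stream=...)` as shown there. What `invoke` returns is not documented here, so the helper just stores whatever comes back.

```python
from pathlib import Path

# Hypothetical helper data: extensions taken from the usage comment above.
SUPPORTED_SUFFIXES = {
    ".docx", ".pptx", ".xlsx", ".csv", ".txt",
    ".md", ".html", ".jpg", ".png", ".pdf",
}


def parse_files(model, paths, ocr_mode="base"):
    """Call model.invoke on each supported file and collect the results.

    `model` is any object with an
    invoke(file, ocr_mode=..., stream=...) method (assumed from the
    usage snippet). Files with unsupported extensions are skipped.
    """
    results = {}
    for p in paths:
        path = Path(p)
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # not in the supported list, skip it
        # stream=False, matching the usage snippet
        results[str(path)] = model.invoke(str(path), ocr_mode=ocr_mode, stream=False)
    return results
```

With the `model` object from the snippet above, this could be used as `parse_files(model, ["1.pdf", "2.docx"], ocr_mode="base")`, which returns a dict mapping each parsed path to its result.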

2. TodoList

  • audio file parse
  • video file parse
  • anyocr-vlm, which we are training on 10M documents

Business cooperation or access to a more powerful version

Buy me a coffee

  • WeChat

Thanks

  • RapidOCR
  • Nanonets-OCR-s