---
title: Grobid OCR
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

# Grobid PDF Document Processor

This Space uses Grobid to extract structured information, such as title, authors, abstract, and body text, from PDF documents, particularly academic papers.

## Features

- **Header Extraction**: Fast extraction of title, authors, and abstract
- **Full Text Processing**: Processes the whole document, including body sections such as the introduction
- **Academic Focus**: Optimized for scholarly documents and research papers
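
Grobid returns its results as TEI XML. As a rough illustration of what header extraction yields, the sketch below pulls the title, authors, and abstract out of a TEI header using only the Python standard library. The function name `parse_header` and the exact element paths are assumptions made for this example; this is not necessarily how this Space parses its results, and real Grobid output may contain more structure than is handled here.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by Grobid's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def parse_header(tei_xml: str) -> dict:
    """Extract title, authors, and abstract from a TEI document (hypothetical helper)."""
    root = ET.fromstring(tei_xml)

    # The main title is expected under <titleStmt><title>.
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)

    # Each author is expected as <analytic><author><persName> with name-part children.
    authors = []
    for pers in root.iterfind(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        parts = [el.text.strip() for el in pers if el.text and el.text.strip()]
        if parts:
            authors.append(" ".join(parts))

    # Join all text inside <abstract> and collapse runs of whitespace.
    abstract_el = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    abstract = ""
    if abstract_el is not None:
        abstract = " ".join("".join(abstract_el.itertext()).split())

    return {"title": title.strip(), "authors": authors, "abstract": abstract}
```

Calling `parse_header` on a TEI string returns a plain dictionary, which is straightforward to display in a web interface or serialize to JSON.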

## Usage

1. Upload a PDF document
2. Choose extraction type:
   - **Header Only**: Quick extraction of metadata
   - **Full Text**: Complete processing including introduction
3. Click "Process PDF" to get structured results
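
For readers who want to script the same steps, here is a minimal sketch of how the two extraction types could map onto Grobid's REST endpoints. It assumes a Grobid service reachable at `http://localhost:8070` and the third-party `requests` package; the names `endpoint_for` and `process_pdf`, and the mapping from the interface labels to endpoints, are illustrative assumptions rather than this Space's actual code.

```python
# Assumption: a Grobid service is listening on its default port.
GROBID_URL = "http://localhost:8070"

# Assumed mapping from the extraction types listed above to Grobid REST endpoints.
ENDPOINTS = {
    "Header Only": "/api/processHeaderDocument",
    "Full Text": "/api/processFulltextDocument",
}


def endpoint_for(extraction_type: str, base_url: str = GROBID_URL) -> str:
    """Build the full service URL for an extraction type (hypothetical helper)."""
    if extraction_type not in ENDPOINTS:
        raise ValueError(f"unknown extraction type: {extraction_type!r}")
    return base_url.rstrip("/") + ENDPOINTS[extraction_type]


def process_pdf(pdf_path: str, extraction_type: str = "Header Only",
                base_url: str = GROBID_URL, timeout: float = 120.0) -> str:
    """Upload a PDF to Grobid and return the TEI XML response as text."""
    import requests  # third-party; imported here so endpoint_for works without it

    with open(pdf_path, "rb") as fh:
        response = requests.post(
            endpoint_for(extraction_type, base_url),
            files={"input": fh},                  # Grobid expects the PDF in the "input" field
            headers={"Accept": "application/xml"},  # request TEI XML explicitly
            timeout=timeout,
        )
    response.raise_for_status()
    return response.text
```

With a service running, `process_pdf("paper.pdf", "Full Text")` would return the TEI XML for the whole document; `paper.pdf` is a placeholder path.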

## Technology

- [Grobid](https://github.com/kermitt2/grobid): Machine learning library for extracting and parsing scholarly PDFs into structured TEI XML
- [Gradio](https://gradio.app/): Web interface framework
- Docker: Containerized deployment

Perfect for researchers who need to quickly extract key information from academic papers!