# OCR Data Extraction Project

## Data Structure

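The dataset lives in a `Data/` directory with one subdirectory per split (`train` and `val`). Each split holds its images alongside an `annotations.jsonl` file: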

```
Data/
├── train/
│   ├── annotations.jsonl
│   ├── image_1.jpg
│   ├── image_2.jpg
│   ├── ...
│   └── image_n.jpg
└── val/
    ├── annotations.jsonl
    ├── image_1.jpg
    ├── image_2.jpg
    ├── ...
    └── image_n.jpg
```
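
As a quick sanity check of this layout, the following minimal sketch reports whether a split directory contains an `annotations.jsonl` file and how many `.jpg` images sit next to it. The function name `summarize_split` is hypothetical and not part of this repository, and the sketch assumes all images use the `.jpg` extension shown in the tree.

```python
from pathlib import Path


def summarize_split(split_dir):
    """Return (has_annotations, image_count) for one split directory.

    Hypothetical helper: assumes images are .jpg files stored directly
    inside the split directory, as in the layout shown above.
    """
    split_dir = Path(split_dir)
    has_annotations = (split_dir / "annotations.jsonl").is_file()
    image_count = sum(1 for p in split_dir.glob("*.jpg") if p.is_file())
    return has_annotations, image_count


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains Data/.
    for split in ("train", "val"):
        path = Path("Data") / split
        if path.is_dir():
            print(split, summarize_split(path))
```

A split that is missing its annotation file, or that reports zero images, most likely was not copied or extracted correctly.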

## Annotation Format

Each line in the `annotations.jsonl` file contains one JSON object with the following structure. The object is pretty-printed here for readability; in the file it occupies a single line:

```json
{
  "image": "image_1.jpg",
  "Question": "explain what is this image about",
  "answer": "the image represents........."
}
```
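
The format above can be read with Python's standard `json` module, one line at a time. The sketch below is a minimal illustration and not code from this repository: the function name `load_annotations` and the added `image_path` field are assumptions, and the required keys are taken literally from the example, including the capitalized `Question`.

```python
import json
from pathlib import Path

# Key names copied from the example record above (note the capital "Q").
REQUIRED_KEYS = {"image", "Question", "answer"}


def load_annotations(split_dir):
    """Read annotations.jsonl from a split directory into a list of dicts.

    Hypothetical helper: each record also gets an "image_path" entry that
    points at the image file inside the same split directory.
    """
    split_dir = Path(split_dir)
    records = []
    with open(split_dir / "annotations.jsonl", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = REQUIRED_KEYS - set(record)
            if missing:
                raise ValueError(
                    f"line {line_number}: missing keys {sorted(missing)}"
                )
            record["image_path"] = split_dir / record["image"]
            records.append(record)
    return records
```

For example, `load_annotations("Data/train")` would return one dictionary per annotated image in the training split, assuming the directory layout described earlier.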