# OCR Data Extraction Project

## Data Structure

The repository is organized as follows:

```
Data/
├── train/
│   ├── annotations.jsonl
│   ├── image_1.jpg
│   ├── image_2.jpg
│   ├── ...
│   └── image_n.jpg
└── val/
    ├── annotations.jsonl
    ├── image_1.jpg
    ├── image_2.jpg
    ├── ...
    └── image_n.jpg
```

## Annotation Format

Each line in an `annotations.jsonl` file is a single JSON object with the following structure (shown pretty-printed here for readability):

```json
{
  "image": "image_1.jpg",
  "Question": "explain what is this image about",
  "answer": "the image represents........."
}
```
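A split can be read by parsing its `annotations.jsonl` one line at a time. The sketch below is a minimal example that assumes Python 3 and the layout and key names shown above (`image`, `Question`, `answer`). It also assumes that the `image` value is a file name relative to the split directory. `Sample` and `load_split` are hypothetical helpers, not part of this repository.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Union


@dataclass
class Sample:
    """One annotated example: an image path plus its question/answer pair."""
    image_path: Path
    question: str
    answer: str


def load_split(split_dir: Union[str, Path]) -> Iterator[Sample]:
    """Yield samples from <split_dir>/annotations.jsonl (hypothetical helper).

    Assumes each non-blank line is one JSON object with the keys
    "image", "Question" and "answer", and that "image" is a file name
    relative to split_dir.
    """
    split_dir = Path(split_dir)
    ann_path = split_dir / "annotations.jsonl"
    with open(ann_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # tolerate blank lines, e.g. a trailing newline
                continue
            record = json.loads(line)
            try:
                sample = Sample(
                    image_path=split_dir / record["image"],
                    question=record["Question"],
                    answer=record["answer"],
                )
            except KeyError as err:
                # Report which line is malformed instead of a bare KeyError
                raise ValueError(f"{ann_path} line {line_no}: missing key {err}") from err
            yield sample
```

For example, `for s in load_split("Data/train"): print(s.image_path, s.question)` iterates over the training split lazily. This avoids loading every annotation into memory at once.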