position of detected elements and bounding box
#24
by
ashesvats
- opened
I’m building a PDF‑extraction app that needs to detect images inside PDFs and extract their bounding‑box coordinates. I have tried adding a prompt such as:
Detect and recognize text in the image, and output the text coordinates in a formatted manner.
The response contains the bounding‑box coordinates as arrays embedded in HTML tags.
However, when I extract the numbers they don’t seem to match the actual positions of the detected images in the PDF.
Any guidance on the correct interpretation of the bbox values or a sample snippet for extraction would be greatly appreciated.
Thank you!
ashesvats
changed discussion title from
Bounding Box Co-ordinates
to position of detected elements and bounding box
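# The bbox values appear to be normalized to a 0-1000 range, so rescale them
# to pixels using the size of the page image that was passed to the model.
# (x01, y01, x02, y02) are the raw values; imwidth and imheight are in pixels.
x1 = int(x01 / 1000 * imwidth)
y1 = int(y01 / 1000 * imheight)
x2 = int(x02 / 1000 * imwidth)
y2 = int(y02 / 1000 * imheight)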
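
For anyone who wants an end-to-end version of the rescaling above, here is a minimal sketch rather than a definitive implementation. It assumes each box shows up in the response as four integers in square brackets on a 0-1000 scale; the exact tag layout is not shown in this thread, so the regex, the sample string, and the helper names (`extract_boxes`, `to_pixels`, `pixels_to_points`) are all hypothetical. The last helper only applies if the page was rendered to an image at a known DPI, and it relies on the default PDF unit of 1/72 inch. It does not flip the y axis, which some PDF libraries need because their origin is at the bottom-left of the page.

```python
import re

# Assumption: each box appears in the response as four integers in square
# brackets, e.g. [250, 250, 500, 750], on a 0-1000 normalized scale.
BBOX_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def extract_boxes(response_text):
    """Return every [x1, y1, x2, y2] group found in the text as a tuple of ints."""
    return [
        tuple(int(v) for v in match.groups())
        for match in BBOX_PATTERN.finditer(response_text)
    ]


def to_pixels(box, imwidth, imheight, scale=1000):
    """Rescale a normalized box to pixels of the image that was sent to the model."""
    x01, y01, x02, y02 = box
    return (
        int(x01 / scale * imwidth),
        int(y01 / scale * imheight),
        int(x02 / scale * imwidth),
        int(y02 / scale * imheight),
    )


def pixels_to_points(box, dpi):
    """Convert pixel coordinates to PDF points (1 point = 1/72 inch)."""
    return tuple(v * 72 / dpi for v in box)


# Hypothetical response fragment, for illustration only.
sample = '<div data-bbox="[250, 250, 500, 750]">Figure</div>'
boxes = extract_boxes(sample)
pixel_boxes = [to_pixels(b, imwidth=1600, imheight=2200) for b in boxes]
point_boxes = [pixels_to_points(b, dpi=200) for b in pixel_boxes]
print(pixel_boxes)
print(point_boxes)
```

One thing worth checking if the boxes still look off: `imwidth` and `imheight` should be the dimensions of the exact image that was sent to the model, not the PDF page size in points.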