position of detected elements and bounding box

#24
by ashesvats - opened

I’m building a PDF‑extraction app that needs to detect images inside PDFs and extract their bounding‑box coordinates. I have tried adding a prompt such as:
Detect and recognize text in the image, and output the text coordinates in a formatted manner.

The response contains bounding box coordinates in HTML tags in array format.

However, when I extract the numbers they don’t seem to match the actual positions of the detected images in the PDF.

Any guidance on the correct interpretation of the bbox values or a sample snippet for extraction would be greatly appreciated.

Thank you!

ashesvats changed discussion title from Bounding Box Co-ordinates to position of detected elements and bounding box
    # The raw bbox values are on a 0-1000 scale; rescale them to the pixel size of the input image.
    x1 = int(x01 / 1000 * imwidth)
    y1 = int(y01 / 1000 * imheight)
    x2 = int(x02 / 1000 * imwidth)
    y2 = int(y02 / 1000 * imheight)
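
For anyone who wants the scaling above as a complete snippet, here is a minimal sketch. It rests on assumptions, not on the model's documented output format: each box is taken to appear in the response as four comma-separated integers, the `<bbox>` tag in the usage example is a made-up placeholder, and the helper names `extract_boxes`, `to_pixels` and `pixels_to_pdf_points` are hypothetical. The last helper covers the case where the page was rendered to an image at a known DPI and the boxes are needed in PDF points (1 point = 1/72 inch).

```python
import re


def extract_boxes(response):
    """Return every group of four comma-separated integers found in the response."""
    # Assumption: a box looks like "120, 85, 640, 410" somewhere inside the
    # tags; adjust the pattern to whatever the model actually returns.
    pattern = r"(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)"
    return [tuple(int(v) for v in match) for match in re.findall(pattern, response)]


def to_pixels(box, imwidth, imheight, scale=1000):
    """Rescale a box from the 0-1000 range to pixel coordinates of the image."""
    x01, y01, x02, y02 = box
    x1 = int(x01 / scale * imwidth)
    y1 = int(y01 / scale * imheight)
    x2 = int(x02 / scale * imwidth)
    y2 = int(y02 / scale * imheight)
    return x1, y1, x2, y2


def pixels_to_pdf_points(box_px, dpi):
    """Convert pixel coordinates of a page rendered at `dpi` to PDF points."""
    return tuple(v * 72 / dpi for v in box_px)
```

Usage would look roughly like this, where `imwidth` and `imheight` must be the size of the image that was actually sent to the model:

```python
boxes = extract_boxes("<bbox>[[125, 250, 500, 750]]</bbox>")  # placeholder response
for box in boxes:
    px = to_pixels(box, imwidth=2000, imheight=1000)
    pt = pixels_to_pdf_points(px, dpi=144)
```

If the boxes still look vertically mirrored after this, check the origin convention of your PDF library: some place the origin at the bottom-left of the page, in which case the y values have to be flipped against the page height.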
