YOLOWorld

This SDK enables efficient open-vocabulary object detection using YOLO-World v2 Large, optimized for Axera's NPU-based SoC platforms, including the AX650 Series, AX630C Series, and AX8850 Series, as well as Axera's dedicated AI accelerator.

Reference links:

For those who are interested in model conversion, you can try to export axmodel through

Supported Platforms

- AX650
  - M4N-Dock (爱芯派Pro)
  - M.2 Accelerator card
- AX630C

Performance

| Model | Input Shape | Latency (ms) | CMM Usage (MB) |
|---|---|---|---|
| yolo_u16_ax650.axmodel | 1 x 640 x 640 x 3 | 9.522 | 21 |
| clip_b1_u16_ax650.axmodel | 1 x 77 | 2.997 | 137 |
| yolo_u16_ax630c.axmodel | 1 x 640 x 640 x 3 | 43.450 | 31 |
| clip_b1_u16_ax630c.axmodel | 1 x 77 | 10.703 | 134 |
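
To put the table in more familiar terms, the latencies can be converted into an upper bound on frames per second, and the CMM figures can be summed per platform. The sketch below uses only the numbers from the table. The helper names are illustrative, and it assumes that both models are resident in CMM at the same time and that pre- and post-processing cost is ignored.

```python
# Latency (ms) and CMM usage (MB), copied from the performance table.
PERF = {
    "yolo_u16_ax650.axmodel": (9.522, 21),
    "clip_b1_u16_ax650.axmodel": (2.997, 137),
    "yolo_u16_ax630c.axmodel": (43.450, 31),
    "clip_b1_u16_ax630c.axmodel": (10.703, 134),
}


def max_fps(latency_ms: float) -> float:
    """Upper bound on inferences per second for a given NPU latency."""
    return 1000.0 / latency_ms


def platform_summary(suffix: str) -> dict:
    """Summarize one platform ("ax650" or "ax630c") from the table rows."""
    yolo_ms, yolo_mb = PERF[f"yolo_u16_{suffix}.axmodel"]
    clip_ms, clip_mb = PERF[f"clip_b1_u16_{suffix}.axmodel"]
    return {
        "detector_fps": round(max_fps(yolo_ms), 1),
        "text_encoder_ms": clip_ms,
        # Assumption: detector and text encoder are loaded together.
        "total_cmm_mb": yolo_mb + clip_mb,
    }


if __name__ == "__main__":
    for suffix in ("ax650", "ax630c"):
        print(suffix, platform_summary(suffix))
```

On these figures the detector alone tops out at roughly 105 inferences per second on AX650 and about 23 on AX630C, with 158 MB and 165 MB of CMM respectively when both models are loaded.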

How to use

Download all files from this repository to the device

root@ax650 ~/root/YOLO-World-V2 # tree -L 2
.
|-- README.md
|-- config.json
|-- football.jpg
|-- install
|   |-- bin
|   `-- lib
|-- models
|   |-- clip_b1_u16_ax630c.axmodel
|   |-- clip_b1_u16_ax650.axmodel
|   |-- yolo_u16_ax630c.axmodel
|   `-- yolo_u16_ax650.axmodel
|-- pyyoloworld
|   |-- __pycache__
|   |-- example.py
|   |-- gardio_example.jpg
|   |-- gradio_example.py
|   |-- host.jpg
|   |-- libyoloworld.so
|   |-- pyaxdev.py
|   |-- pyyoloworld.py
|   |-- requirements.txt
|   `-- result_host.jpg
|-- result.png
`-- vocab.txt

6 directories, 18 files

Python environment requirements

pip install -r pyyoloworld/requirements.txt

Inference on an AX650 host, such as M4N-Dock (爱芯派Pro)

root@ax650 ~/root/YOLO-World-V2 # cp install/lib/host_650/libyoloworld.so ./pyyoloworld/
root@ax650 ~/root/YOLO-World-V2 # cd pyyoloworld/
root@ax650 ~/root/YOLO-World-V2/pyyoloworld # python3 gradio_example.py --yoloworld ../models/yolo_u16_ax650.axmodel --tenc ../models/clip_b1_u16_ax650.axmodel --vocab ../vocab.txt --dev_type host
Trying to load: /root/root/YOLO-World-V2/pyyoloworld/aarch64/libyoloworld.so

❌ Failed to load: /root/root/YOLO-World-V2/pyyoloworld/aarch64/libyoloworld.so
   /root/root/YOLO-World-V2/pyyoloworld/aarch64/libyoloworld.so: cannot open shared object file: No such file or directory
🔍 File not found. Please verify that libclip.so exists and the path is correct.

Trying to load: /root/root/YOLO-World-V2/pyyoloworld/libyoloworld.so
open libaxcl_rt.so failed
unsupport axcl
✅ Successfully loaded: /root/root/YOLO-World-V2/pyyoloworld/libyoloworld.so
sh: line 1: axcl-smi: command not found

input size: 2
    name:   images [unknown] [unknown]
        1 x 640 x 640 x 3   size: 1228800

    name: txt_feats [unknown] [unknown]
        1 x 4 x 512   size: 8192


output size: 3
    name:  stride8
        1 x 80 x 80 x 68   size: 1740800

    name: stride16
        1 x 40 x 40 x 68   size: 435200

    name: stride32
        1 x 20 x 20 x 68   size: 108800

[I][                       yw_create][ 408]: num_classes: 4, num_features: 512, input w: 640, h: 640
is_output_nhwc: 1

input size: 1
    name: text_token [unknown] [unknown]
        1 x 77   size: 308


output size: 1
    name:     2202
        1 x 1 x 512   size: 2048

[I][               load_text_encoder][  44]: text feature len 512
[I][                  load_tokenizer][  60]: text token len 77
* Running on local URL:  http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
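
The log above first reports a failed load from the `aarch64/` subdirectory and then a successful load from the `pyyoloworld/` directory itself, so the first failure appears to be harmless as long as the second attempt succeeds. The following is a minimal sketch of that kind of ordered fallback using `ctypes`. It is not the code in `pyyoloworld.py`. The helper names and the search order are assumptions inferred from the log.

```python
import ctypes
import os


def candidate_paths(base_dir, lib_name="libyoloworld.so", arch_dir="aarch64"):
    """Search order inferred from the log: arch subdirectory first, then base_dir."""
    return [
        os.path.join(base_dir, arch_dir, lib_name),
        os.path.join(base_dir, lib_name),
    ]


def load_first_available(candidates):
    """Return (path, handle) for the first library that loads, else (None, None)."""
    for path in candidates:
        print(f"Trying to load: {path}")
        if not os.path.isfile(path):
            print(f"File not found: {path}")
            continue
        try:
            return path, ctypes.CDLL(path)
        except OSError as exc:
            # Typically a missing dependency or a wrong-architecture binary.
            print(f"Failed to load: {path} ({exc})")
    return None, None


if __name__ == "__main__":
    here = os.path.dirname(os.path.abspath(__file__))
    path, lib = load_first_available(candidate_paths(here))
    print("Loaded:" if lib else "No library found:", path)
```

If neither candidate loads, check that the `libyoloworld.so` matching your device type was copied from `install/lib/` into `pyyoloworld/`, as in the `cp` command above.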

Input: person, dog, car, horse, and the test image
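
The log reports `num_classes: 4` and a `txt_feats` input of shape 1 x 4 x 512, which suggests that this model build takes four text prompts per run. A small sketch of validating a comma-separated prompt string against that count is shown below. `parse_prompts` is a hypothetical helper, not part of the SDK, and how the SDK itself handles fewer or more prompts is not stated here.

```python
def parse_prompts(text, expected=4):
    """Split a comma-separated prompt string and check the class count.

    `expected=4` is taken from `num_classes: 4` in the log; it is an
    assumption that other counts are not accepted by this model build.
    """
    prompts = [p.strip() for p in text.split(",") if p.strip()]
    if len(prompts) != expected:
        raise ValueError(f"expected {expected} prompts, got {len(prompts)}: {prompts}")
    return prompts


if __name__ == "__main__":
    print(parse_prompts("person, dog, car, horse"))
```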

Result:

Inference with M.2 Accelerator card

What is M.2 Accelerator card? This demo is shown running on a Raspberry Pi 5.

(py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libstdc++.so.6
(py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ cp install/lib/axcl_aarch64/libyoloworld.so pyyoloworld/
(py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ cd pyyoloworld/
(py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg/pyyoloworld $ python gradio_example.py --yoloworld ../models/yolo_u16_ax650.axmodel --tenc ../models/clip_b1_u16_ax650.axmodel --vocab ../vocab.txt --dev_type axcl
Trying to load: /home/axera/samples/yoloworldv2-new.hg/pyyoloworld/aarch64/libyoloworld.so
✅ Successfully loaded: /home/axera/samples/yoloworldv2-new.hg/pyyoloworld/libyoloworld.so
[I][                             run][  31]: AXCLWorker start with devid 0

input size: 2
    name:   images [unknown] [unknown]
        1 x 640 x 640 x 3   size: 1228800

    name: txt_feats [unknown] [unknown]
        1 x 4 x 512   size: 8192


output size: 3
    name:  stride8
        1 x 80 x 80 x 68   size: 1740800

    name: stride16
        1 x 40 x 40 x 68   size: 435200

    name: stride32
        1 x 20 x 20 x 68   size: 108800

[I][                       yw_create][ 408]: num_classes: 4, num_features: 512, input w: 640, h: 640
is_output_nhwc: 1

input size: 1
    name: text_token [unknown] [unknown]
        1 x 77   size: 308


output size: 1
    name:     2202
        1 x 1 x 512   size: 2048

[I][               load_text_encoder][  44]: text feature len 512
[I][                  load_tokenizer][  60]: text token len 77
* Running on local URL:  http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
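
The tensor shapes and byte sizes printed in the log above can be cross-checked with a little arithmetic. The sketch below assumes 1 byte per element for `images` and 4 bytes per element for every other tensor. These widths are inferred from the reported sizes, not stated by the SDK. It also shows that the 68 output channels are consistent with a YOLOv8-style head (4 box sides x 16 distribution bins, plus 4 classes), which is an assumption about the head layout.

```python
from math import prod

# name: (shape, assumed bytes per element, size reported in the log)
TENSORS = {
    "images": ((1, 640, 640, 3), 1, 1228800),
    "txt_feats": ((1, 4, 512), 4, 8192),
    "stride8": ((1, 80, 80, 68), 4, 1740800),
    "stride16": ((1, 40, 40, 68), 4, 435200),
    "stride32": ((1, 20, 20, 68), 4, 108800),
    "text_token": ((1, 77), 4, 308),
    "2202": ((1, 1, 512), 4, 2048),  # text encoder output
}


def nbytes(shape, bytes_per_elem):
    """Buffer size in bytes for a dense tensor."""
    return prod(shape) * bytes_per_elem


def check_all():
    """Return True if every computed size matches the size in the log."""
    return all(nbytes(s, b) == size for s, b, size in TENSORS.values())


NUM_CLASSES = 4   # from "num_classes: 4" in the log
REG_MAX = 16      # assumption: YOLOv8-style distribution bins per box side
HEAD_CHANNELS = 4 * REG_MAX + NUM_CLASSES

# Grid side for each output stride at a 640 x 640 input.
GRID_SIDES = {stride: 640 // stride for stride in (8, 16, 32)}

if __name__ == "__main__":
    print("sizes match:", check_all())
    print("head channels:", HEAD_CHANNELS)
    print("grid sides:", GRID_SIDES)
```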

If your Raspberry Pi 5's IP address is 192.168.1.100, open http://192.168.1.100:7860 in a web browser to use the web app.
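
If the page does not load, it can help to confirm from another machine that the port is reachable before looking at the demo itself. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the port 7860 is taken from the log above.

```python
import socket


def gradio_url(host, port=7860):
    """Build the URL to open in a browser."""
    return f"http://{host}:{port}"


def is_reachable(host, port=7860, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = "192.168.1.100"  # replace with your device's IP address
    print(gradio_url(host), "reachable:", is_reachable(host))
```

A `False` result usually points to a network or firewall issue, or to the demo not running, rather than to a problem with the models.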

Input: man, shoes, ball, person, and the test image

Result:
