skatzR committed
Commit 4c2ef07 · verified · 1 Parent(s): 280226b

Update README.md

Files changed (1):
  1. README.md  +1 -6
README.md CHANGED
@@ -10,6 +10,7 @@ tags:
  - quantization
  - sentence-embeddings
  - semantic-search
+
  ---

  # 🧩 DeepVK-USER-BGE-M3 — Quantized ONNX (INT8)
@@ -30,7 +31,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  | **Supported HW** | CPU (optimized for Intel AVX512-VNNI, fallback to AVX2) |
  | **License** | Apache-2.0 |

- ---

  ## 🚀 Features

@@ -38,7 +38,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  - 📦 **Lightweight** — reduced model size, lower memory footprint.
  - 🔄 **Drop-in replacement** — embeddings compatible with the FP32 version.
  - 🌍 **Multilingual** — supports Russian 🇷🇺 and English 🇬🇧.
- ---

  ## 🧠 Intended Use

@@ -51,7 +50,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  **❌ Not ideal for:**
  - Absolute maximum accuracy scenarios (INT8 introduces minor loss)
  - GPU-optimized pipelines (prefer FP16/FP32 models instead)
- ---

  ## ⚖️ Pros & Cons of Quantized ONNX

@@ -64,7 +62,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  - Slight accuracy drop compared to static quantization.
  - AVX512 optimizations only on modern Intel CPUs.
  - No GPU acceleration in this export.
- ---

  ## 📊 Benchmark

@@ -77,7 +74,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  | Inference speed | ~2× faster |
  | Model size (MB) | 347.5 |

- ---

  ## 📂 Files

@@ -87,7 +83,6 @@ tokenizer.json, vocab.txt, special_tokens_map.json — tokenizer

  config.json — model config

- ---

  ## 🧩 Examples

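The README context visible in the hunks above describes CPU-only INT8 inference with ONNX Runtime and embeddings that are drop-in compatible with the FP32 model, but the Examples section itself is not shown in this diff. Below is a minimal sketch of how such an export is typically consumed. The local repo path, the ONNX file name model_quantized.onnx, the choice of the first session output, and CLS pooling with L2 normalization are assumptions for illustration, not details confirmed by this commit.

```python
# Minimal usage sketch for a quantized ONNX sentence-embedding export.
# Assumptions (hypothetical, not confirmed by this commit): local repo dir,
# ONNX file name "model_quantized.onnx", first output = last_hidden_state,
# CLS pooling + L2 normalization matching the FP32 embeddings.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

repo_dir = "./DeepVK-USER-BGE-M3-ONNX-INT8"  # hypothetical local clone of the model repo

# Tokenizer files (tokenizer.json, vocab.txt, special_tokens_map.json) are listed under "Files".
tokenizer = AutoTokenizer.from_pretrained(repo_dir)

# CPU-only session; ONNX Runtime selects AVX512-VNNI or AVX2 kernels automatically.
session = ort.InferenceSession(
    f"{repo_dir}/model_quantized.onnx",  # hypothetical file name
    providers=["CPUExecutionProvider"],
)

def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
    # Feed only the inputs the exported graph actually declares, as int64.
    inputs = {
        node.name: enc[node.name].astype(np.int64)
        for node in session.get_inputs()
        if node.name in enc
    }
    hidden = session.run(None, inputs)[0]      # assumed (batch, seq_len, hidden)
    cls = hidden[:, 0]                         # assumed CLS-token pooling
    return cls / np.linalg.norm(cls, axis=1, keepdims=True)

vecs = embed(["Привет, мир!", "Hello, world!"])
print(vecs @ vecs.T)  # cosine similarities, since the vectors are L2-normalized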