skatzR committed
Commit 4c2ef07 · verified · 1 Parent(s): 280226b

Update README.md

Files changed (1):
  1. README.md  +1 -6
README.md CHANGED
@@ -10,6 +10,7 @@ tags:
  - quantization
  - sentence-embeddings
  - semantic-search
+
  ---

  # 🧩 DeepVK-USER-BGE-M3 — Quantized ONNX (INT8)
@@ -30,7 +31,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  | **Supported HW** | CPU (optimized for Intel AVX512-VNNI, fallback to AVX2) |
  | **License** | Apache-2.0 |

- ---

  ## 🚀 Features

@@ -38,7 +38,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  - 📦 **Lightweight** — reduced model size, lower memory footprint.
  - 🔄 **Drop-in replacement** — embeddings compatible with the FP32 version.
  - 🌍 **Multilingual** — supports Russian 🇷🇺 and English 🇬🇧.
- ---

  ## 🧠 Intended Use

@@ -51,7 +50,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  **❌ Not ideal for:**
  - Absolute maximum accuracy scenarios (INT8 introduces minor loss)
  - GPU-optimized pipelines (prefer FP16/FP32 models instead)
- ---

  ## ⚖️ Pros & Cons of Quantized ONNX

@@ -64,7 +62,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  - Slight accuracy drop compared to static quantization.
  - AVX512 optimizations only on modern Intel CPUs.
  - No GPU acceleration in this export.
- ---

  ## 📊 Benchmark

@@ -77,7 +74,6 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntim
  | Inference speed | ~2× faster |
  | Model size (MB) | 347.5 |

- ---

  ## 📂 Files

@@ -87,7 +83,6 @@ tokenizer.json, vocab.txt, special_tokens_map.json — tokenizer

  config.json — model config

- ---

  ## 🧩 Examples

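The README context visible in the hunks above describes CPU-only INT8 inference with ONNX Runtime and embeddings that are drop-in compatible with the FP32 model, but the Examples section itself is not shown in this diff. Below is a minimal sketch of how such an export is typically consumed. The local repo path, the ONNX file name model_quantized.onnx, the choice of the first session output, and CLS pooling with L2 normalization are assumptions for illustration, not details confirmed by this commit.

```python
# Minimal usage sketch for a quantized ONNX sentence-embedding export.
# Assumptions (hypothetical, not confirmed by this commit): local repo dir,
# ONNX file name "model_quantized.onnx", first output = last_hidden_state,
# CLS pooling + L2 normalization matching the FP32 embeddings.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

repo_dir = "./DeepVK-USER-BGE-M3-ONNX-INT8"  # hypothetical local clone of the model repo

# Tokenizer files (tokenizer.json, vocab.txt, special_tokens_map.json) are listed under "Files".
tokenizer = AutoTokenizer.from_pretrained(repo_dir)

# CPU-only session; ONNX Runtime selects AVX512-VNNI or AVX2 kernels automatically.
session = ort.InferenceSession(
    f"{repo_dir}/model_quantized.onnx",  # hypothetical file name
    providers=["CPUExecutionProvider"],
)

def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
    # Feed only the inputs the exported graph actually declares, as int64.
    inputs = {
        node.name: enc[node.name].astype(np.int64)
        for node in session.get_inputs()
        if node.name in enc
    }
    hidden = session.run(None, inputs)[0]      # assumed (batch, seq_len, hidden)
    cls = hidden[:, 0]                         # assumed CLS-token pooling
    return cls / np.linalg.norm(cls, axis=1, keepdims=True)

vecs = embed(["Привет, мир!", "Hello, world!"])
print(vecs @ vecs.T)  # cosine similarities, since the vectors are L2-normalized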