Model Card for BGE-M3 ONNX Int8
This is an ONNX export of the BAAI/BGE-M3 embedding model, quantized to int8.
Model Description
- Developed by: Mahrad Hosseini
- Model type: Embedding Model
- License: Apache-2.0
- Finetuned from model: BAAI/BGE-M3
Uses
- Running with better performance on CPU
- Running with better performance on low-end GPUs
- Running on Edge Devices with low computational power
- Running on low-latency servers
- Running on devices with limited RAM
How to Get Started with the Model
Use the code below to get started with the model.
import numpy as np
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer


def mean_pooling(last_hidden_state, attention_mask):
    # last_hidden_state: [batch_size, seq_len, hidden_size]
    # attention_mask: [batch_size, seq_len]
    # Average the token embeddings over non-padding positions only.
    input_mask_expanded = np.expand_dims(attention_mask, -1).astype(np.float32)
    return np.sum(last_hidden_state * input_mask_expanded, axis=1) / np.clip(
        input_mask_expanded.sum(axis=1), a_min=1e-9, a_max=None
    )


# Better to use the original tokenizer (very lightweight)
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = ORTModelForFeatureExtraction.from_pretrained("MahradHosseini/bge-m3-onnx-int8")

questions = ["What is your opening hour?", "Where are your offices?"]
input_q = tokenizer(
    questions,
    padding=True,
    truncation=True,
    return_tensors="np",
)
print(f"Question input keys: {list(input_q.keys())}, shapes: {[v.shape for v in input_q.values()]}")

# Run the quantized model; with NumPy inputs the outputs are NumPy arrays too.
output_q = model(**input_q)
print(f"Question output keys: {list(output_q.keys())}, shapes: {[v.shape for v in output_q.values()]}")

question_embeddings = {
    "dense_vecs": mean_pooling(output_q["last_hidden_state"], input_q["attention_mask"]),
}
print(f"Embedded {len(question_embeddings['dense_vecs'])} questions")
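To make the effect of the attention mask in mean_pooling concrete, here is a small dependency-free sketch of the same masked average on a toy input. The helper name `masked_mean` and the toy values are illustrative only and are not part of the model or of any library; real inputs would be the NumPy arrays produced by the model.

```python
def masked_mean(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 and ignore padding."""
    hidden_size = len(token_vectors[0])
    totals = [0.0] * hidden_size
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    # Same guard as the clip in mean_pooling: never divide by zero,
    # even when every position in the row is padding.
    denom = max(count, 1e-9)
    return [total / denom for total in totals]


# One sequence with 3 positions and hidden size 2; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padded position, despite its large values, does not affect the result, which is why the attention mask is passed alongside the hidden states.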
Conversion Details
Hugging Face Optimum was used to convert the base model to ONNX and then to quantize it to int8.
0. Fresh env with safe versions
pip install "optimum[exporters]" onnx onnxruntime
1. Export WITHOUT optimisation
optimum-cli export onnx -m BAAI/bge-m3 bge-m3-onnx
2. Int8 dynamic quantisation, AVX2 preset, per-channel
optimum-cli onnxruntime quantize \
  --onnx_model bge-m3-onnx \
  --avx2 --per_channel \
  -o bge-m3-int8
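The same dynamic quantisation can also be driven from Python through Optimum's ORTQuantizer. The following is a minimal sketch, not the procedure actually used for this model: it assumes the export from step 1 wrote a file named `model.onnx` into `bge-m3-onnx`, and the function name `quantize_to_int8` and its default paths are illustrative. The output may not be identical to what the CLI produces.

```python
from pathlib import Path


def quantize_to_int8(onnx_dir="bge-m3-onnx", output_dir="bge-m3-int8"):
    """Dynamically quantize an exported ONNX model to int8 (AVX2, per-channel)."""
    # Imported lazily so this module loads even where optimum is not installed.
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # is_static=False selects dynamic quantisation, so no calibration data is needed.
    qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=True)

    # Assumption: the export directory contains a single graph named model.onnx.
    quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name="model.onnx")
    quantizer.quantize(save_dir=output_dir, quantization_config=qconfig)
    return Path(output_dir)
```

Calling `quantize_to_int8()` after the export step should write the quantized model into `bge-m3-int8`, which can then be loaded with `ORTModelForFeatureExtraction.from_pretrained`.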