---
license: cc-by-nc-4.0
base_model:
- Salesforce/SFR-Embedding-2_R
pipeline_tag: feature-extraction
---

## Model Card: APEX-Embedding-7B [NON-COMMERCIAL USE ONLY]

## [Fifth Dimension](https://www.fifthdimensionai.com/en-gb)

### Read our paper on arXiv here: [https://arxiv.org/abs/2410.18105](https://arxiv.org/abs/2410.18105)

### Model Overview

APEX-Embedding-7B is a 7-billion-parameter model optimized for **Factual Document Retrieval** in **Retrieval-Augmented Generation (RAG)** systems. During training, the model was enhanced using **Structured Entity Relationship Maps** and **Model-Aware Contrastive Sampling** to improve factual accuracy. The final model is highly effective at generating precise text embeddings, especially for industries that rely on large-scale document retrieval, such as legal, compliance, and real estate.

The model achieves **90.86%** rank@1 accuracy in our document retrieval evaluation, outperforming comparable models and ensuring reliable, accurate retrieval of relevant documents from large datasets.

### License & Disclaimer

License: **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)**

This release is for non-commercial research purposes only and has been published in support of an academic paper. The model has been fine-tuned for real-world factual RAG tasks and has not been designed or evaluated for *all* embedding tasks, such as those found in MTEB or other benchmark datasets.

This model is provided 'as is' without any warranties, and the authors are not to be held responsible for any consequences resulting from its use. Users are responsible for ensuring their compliance with the license terms and local laws, as well as the model's technical suitability for their specific use case.

---

### How to Use

The following steps show how to generate embeddings and compare the similarity between queries and documents.

#### Environment Setup

First, install the necessary libraries:

```bash
pip install torch transformers peft accelerate numpy
pip install -U bitsandbytes
```

#### Loading the Model

Here is the code to load the model (a pre-quantised version of the base model is available in the `base` directory):

```python
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
import torch

model_path = "5DAI/APEX-Embedding-7B-v0.1"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load in 4-bit NF4 with bfloat16 compute to reduce memory usage
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Note: quantized bitsandbytes models cannot be moved with .to(device);
# device_map places the weights on the GPU at load time instead.
model = AutoModel.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```

#### Generating Embeddings for Queries and Documents

When generating embeddings, use the following instruction prompt for both **queries** and **documents**:

```python
def get_embedding(text: str, model, tokenizer):
    prompt = f"Instruction: Please perform a RAG search based on the following. Text: {text}"
    inputs = tokenizer(prompt, return_tensors="pt", max_length=8192, padding=True, truncation=True).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    embedding = outputs.last_hidden_state[:, -1, :]  # Use last-token pooling
    embedding = torch.nn.functional.normalize(embedding, p=2, dim=1)  # L2-normalize
    return embedding.cpu().numpy()
```

The context window extends to 32K tokens, but best results are achieved when text is chunked by page (< 8192 tokens).
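If a document runs longer than a page, split it into chunks before embedding. Below is a minimal sketch of one way to do this, not part of the official pipeline: it assumes your parser already yields a list of page strings (the `pages` argument and the `embed_pages` helper are hypothetical), and it recursively halves any page that exceeds the token budget.

```python
# Hypothetical helper (not part of the model's API): embed a document
# page by page, splitting any page that exceeds the token budget.
def embed_pages(pages, model, tokenizer, max_tokens=8192):
    embeddings = []
    for page in pages:
        # Approximate length check; the instruction prefix adds a few tokens
        if len(tokenizer.encode(page)) > max_tokens:
            # Crude fallback: halve the oversized page and retry
            half = len(page) // 2
            embeddings += embed_pages([page[:half], page[half:]], model, tokenizer, max_tokens)
        else:
            embeddings.append(get_embedding(page, model, tokenizer)[0])
    return embeddings
```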
#### Cosine Similarity for Queries and Documents

To compare a **query** and a **document**, use the cosine similarity function:

```python
import numpy as np

def cosine_similarity(vecA: np.ndarray, vecB: np.ndarray) -> float:
    normA = np.linalg.norm(vecA)
    normB = np.linalg.norm(vecB)
    return np.dot(vecA, vecB) / (normA * normB) if normA > 0 and normB > 0 else 0.0
```

#### Example: Query vs. Document Embedding

Here's how to generate embeddings for a query and a document, and compare their similarity:

```python
query = "What are the legal requirements for property zoning in urban areas?"
document = "This document contains details about urban property zoning laws, including legal frameworks and compliance standards."

query_embedding = get_embedding(query, model, tokenizer)[0]
document_embedding = get_embedding(document, model, tokenizer)[0]

similarity = cosine_similarity(query_embedding, document_embedding)
print(f"Cosine similarity between query and document: {similarity}")
```

For ranking multiple candidate documents against a single query, see the sketch after the citation below.

---

### Citation

Please cite APEX-Embedding-7B as follows:

```
@misc{APEX-embedding-7b,
      title={APEX-Embedding-7B: Improving Embedding Accuracy for Document Retrieval Using Entity Relationship Maps and Model-Aware Contrastive Sampling},
      author={Thea Aviss},
      year={2024},
      url={https://arxiv.org/abs/2410.18105}
}
```
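---

#### Appendix: Ranking Multiple Documents

The example above scores a single query and document pair. In a retrieval setting you typically score one query against many stored document embeddings and keep the best match (this is what the rank@1 figure above measures). The sketch below is illustrative only: the `rank_documents` helper is hypothetical, not part of the model's API, and it assumes `doc_embeddings` was built with `get_embedding`.

```python
import numpy as np

def rank_documents(query_embedding: np.ndarray, doc_embeddings: list) -> list:
    # Embeddings from get_embedding are L2-normalized, so a dot
    # product is equivalent to cosine similarity.
    scores = [float(np.dot(query_embedding, d)) for d in doc_embeddings]
    # Return document indices sorted from most to least similar.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Example usage (`documents` is your own list of page-level chunks):
# doc_embeddings = [get_embedding(d, model, tokenizer)[0] for d in documents]
# best = documents[rank_documents(query_embedding, doc_embeddings)[0]]  # rank@1 result
```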