jina-reranker-v3-GGUF

GGUF quantization of jina-reranker-v3 using llama.cpp: a 0.6B-parameter multilingual listwise reranker, built on Qwen3-0.6B and quantized for efficient inference.

Requirements

  • A llama.cpp build that provides the llama-embedding binary (its path is passed to GGUFReranker below)
  • Python with the packages installed below

Installation

pip install numpy safetensors

Files

  • jina-reranker-v3-BF16.gguf - Quantized model weights (BF16, 1.1GB)
  • projector.safetensors - MLP projector weights (3MB); see the sketch after this list
  • rerank.py - Reranker implementation
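
The two weight files reflect how inference is split: the GGUF file is run through llama.cpp's llama-embedding to produce hidden-state embeddings, and the small MLP projector is then applied on the Python side. A minimal sketch of loading and inspecting the projector follows; the weight/bias keys and the single linear layer are assumptions, not the confirmed layout, so print the real tensor names first:

import numpy as np
from safetensors.numpy import load_file

# Load the projector weights shipped alongside the GGUF file.
tensors = load_file("projector.safetensors")
print(list(tensors.keys()))  # inspect the actual tensor names first

# Hypothetical single linear layer; adapt to the real keys/shapes above.
def project(hidden: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # Map a llama-embedding hidden-state vector into the scoring space.
    return hidden @ weight.T + bias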

Usage

from rerank import GGUFReranker

# Initialize reranker
reranker = GGUFReranker(
    model_path="jina-reranker-v3-BF16.gguf",
    projector_path="projector.safetensors",
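    # path to a llama.cpp llama-embedding binary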
    llama_embedding_path="/path/to/llama-embedding"
)

# Rerank documents
query = "What is the capital of France?"
documents = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is located in Paris."
]

results = reranker.rerank(query, documents)

for result in results:
    print(f"Score: {result['relevance_score']:.4f}, Doc: {result['document'][:50]}...")

API

GGUFReranker.rerank(query, documents, top_n=None, return_embeddings=False, instruction=None)

Arguments:

  • query (str): Search query
  • documents (List[str]): Documents to rerank
  • top_n (int, optional): Return only top N results
  • return_embeddings (bool): Include embeddings in output
  • instruction (str, optional): Custom ranking instruction

Returns: A list of dicts with keys index, relevance_score, and document; when return_embeddings=True, each dict also includes embedding.
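
For example, to keep only the two best hits under a custom instruction and also get the embeddings back (the argument values here are illustrative):

results = reranker.rerank(
    query,
    documents,
    top_n=2,
    instruction="Rank passages by how directly they answer the question.",
    return_embeddings=True,
)

best = results[0]
print(best["index"], best["relevance_score"])
print(len(best["embedding"]))  # present only because return_embeddings=True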

Citation

If you find jina-reranker-v3 useful in your research, please cite the original paper:

@misc{wang2025jinarerankerv3lateinteractiondocument,
      title={jina-reranker-v3: Last but Not Late Interaction for Document Reranking},
      author={Feng Wang and Yuqing Li and Han Xiao},
      year={2025},
      eprint={2509.25085},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.25085},
}

License

This GGUF quantization follows the same CC BY-NC 4.0 license as the original model. For commercial usage inquiries, please contact Jina AI.
