Author of this model: Microsoft, 2024. License: MIT.

Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1

Prompt template: ChatML (according to llama.cpp's server)? Mistral (according to tokenizer_config.json)? Alpaca (according to text-generation-webui)? All three seem to work.

Context length: ?

According to their paper on arXiv, rho-math-7b-v0.1 is a continued pretraining on Mistral-7B, while their 1B model is a continued pretraining on TinyLlama-1.1B.

My experience

Unlike the non-interpreter variant of this model, this -interpreter variant always answers with Python code, example:

Input: Width of circle is 3cm, what is its area?

with settings { presence_penalty: 0, frequency_penalty: 0, top_p: 1, min_p: 0, top_k: 0, temperature: 0.8 }

outputs (using unquantized gguf):

```python
from sympy import pi, Rational

def circle_area():
    """Width of circle is 3cm, what is its area?"""
    r = Rational(3, 2)  # Radius of the circle
    area = pi * r**2  # Area of the circle

    return area

result = circle_area()
print(result)
```
```output
27*pi/4
```
The area of the circle is $\boxed{\frac{27\pi}{4}}$ square cm.

??? It should have been 9*pi/4. Am I using this model wrong? Same result with temperature=0.0,top_k=1.

In comparison, the Q6_K of the non-interpreter variant of this model with temp=0.0 outputs:

The area of a circle is given by the formula A = ฯ€r^2, where r is the radius of the circle. Since the width of the circle is 3cm, the radius is half of that, or 1.5cm. Substituting this value into the formula, we get:

A = ฯ€(1.5)^2
A = ฯ€(2.25)
A = 6.9887654321

So, the area of the circle is approximately 6.99 cm^2.

(close to the expected 7.06 which deepseek-math-7b-rl.Q8_0.gguf outputs)

imatrix

I created imatrix with

./imatrix --mlock --verbosity 2 -m /tmp/rho-math-7b-interpreter-v0.1.f32.gguf -f ~/Downloads/groups_merged_forkOfArzeth.txt -c 32768 -o rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat

which took 1665 seconds (28 minutes) on my GTX 1660 Super and used only 1 thread on Ryzen 2600 downclocked to 3000MHz. imatrix consumed 35685 MiB of RAM (3200MHz) and 3158 MiB of VRAM.

quantize

Quantized with llama.cpp b2661 (2024-04-12), compiled with LLAMA_CUDA_FORCE_MMQ=1 (full cmd: make -j6 LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 LLAMA_FAST=1 LLAMA_OPENBLAS=1 LLAMA_BLAS_VENDOR=OpenBLAS) for a big speed up (GTX 1660 Super doesn't have tensor cores, so it's better to use MMQ than nothing).

Downloads last month
12
GGUF
Hardware compatibility
Log In to view the estimation

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for arzeth/rho-math-7b-interpreter-v0.1.imatrix-GGUF

Quantized
(6)
this model