|
|
--- |
|
|
base_model: microsoft/rho-math-7b-interpreter-v0.1 |
|
|
license: mit |
|
|
model_creator: Microsoft |
|
|
model_name: rho-math-7b-interpreter-v0.1 |
|
|
model_type: mistral |
|
|
library_name: gguf |
|
|
tags: |
|
|
- math |
|
|
quantized_by: arzeth |
|
|
--- |
|
|
|
|
|
Author of this model: Microsoft, 2024. License: MIT. |
|
|
|
|
|
Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1 |
|
|
|
|
|
Prompt template: ChatML (according to llama.cpp's `server`)? Mistral (according to `tokenizer_config.json`)? Alpaca (according to text-generation-webui)? All three seem to work. |
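For reference, this is roughly what a single user turn looks like in each of the three formats. This is my own illustration rather than anything from the original card, so double-check against whatever chat template your frontend actually applies:

```python
# Rough sketches of the three candidate prompt formats (illustrative only).
question = "Width of circle is 3cm, what is its area?"

# ChatML (what llama.cpp's `server` assumed)
chatml = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"

# Mistral-style [INST] tags (what `tokenizer_config.json` suggests)
mistral = f"[INST] {question} [/INST]"

# Alpaca (what text-generation-webui fell back to)
alpaca = f"### Instruction:\n{question}\n\n### Response:\n"
```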
|
|
|
|
|
Context length: ? |
|
|
|
|
|
According to their [paper on arXiv](https://arxiv.org/abs/2404.07965), rho-math-7b-v0.1 is a continued pretraining of Mistral-7B, while their 1B model is a continued pretraining of TinyLlama-1.1B.
|
|
|
|
|
# My experience |
|
|
|
|
|
Unlike [the non-interpreter variant](https://huggingface.co/microsoft/rho-math-7b-v0.1) of this model, this `-interpreter` variant always answers with Python code. For example:
|
|
|
|
|
Input: `Width of circle is 3cm, what is its area?` |
|
|
|
|
|
with settings `{presence_penalty: 0, frequency_penalty: 0, top_p: 1, min_p: 0, top_k: 0, temperature: 0.8}`
|
|
|
|
|
outputs (using the unquantized GGUF):
|
|
|
|
|
<pre> |
|
|
```python
from sympy import pi, Rational


def circle_area():
    """Width of circle is 3cm, what is its area?"""
    r = Rational(3, 2)  # Radius of the circle
    area = pi * r**2  # Area of the circle

    return area


result = circle_area()
print(result)
```
|
|
```output |
|
|
27*pi/4 |
|
|
``` |
|
|
The area of the circle is $\boxed{\frac{27\pi}{4}}$ square cm. |
|
|
</pre> |
|
|
|
|
|
??? It should have been `9*pi/4`. Am I using this model wrong? Same result with temperature=0.0, top_k=1.
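For reference, actually running the generated function outside the model does print `9*pi/4`, so the wrong number comes from the `output` block the model wrote itself rather than from its code:

```python
from sympy import pi, Rational


def circle_area():
    """Width of circle is 3cm, what is its area?"""
    r = Rational(3, 2)  # the radius is half of the 3 cm width
    return pi * r**2


area = circle_area()
print(area)         # 9*pi/4
print(float(area))  # 7.0685834705770345, i.e. roughly 7.07
```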
|
|
|
|
|
In comparison, the Q6_K of the non-interpreter variant of this model with temp=0.0 outputs: |
|
|
``` |
|
|
The area of a circle is given by the formula A = πr^2, where r is the radius of the circle. Since the width of the circle is 3cm, the radius is half of that, or 1.5cm. Substituting this value into the formula, we get: |
|
|
|
|
|
A = π(1.5)^2 |
|
|
A = π(2.25) |
|
|
A = 6.9887654321 |
|
|
|
|
|
So, the area of the circle is approximately 6.99 cm^2. |
|
|
``` |
|
|
(close to the expected 7.06 which `deepseek-math-7b-rl.Q8_0.gguf` outputs) |
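As an aside, here is roughly how a request with the settings listed above can be sent to llama.cpp's `server` `/completion` endpoint. This is just a sketch (the host/port and the ChatML-style prompt are my assumptions), not necessarily how I sent the requests:

```python
import requests  # assumes the `requests` package is installed

# Hypothetical request to llama.cpp's `server` /completion endpoint
# with the sampling settings from the example above.
payload = {
    # ChatML-style prompt (one of the three template guesses above)
    "prompt": "<|im_start|>user\nWidth of circle is 3cm, what is its area?<|im_end|>\n<|im_start|>assistant\n",
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "top_p": 1,
    "min_p": 0,
    "top_k": 0,
    "temperature": 0.8,
    "n_predict": 512,  # max tokens to generate
}
r = requests.post("http://localhost:8080/completion", json=payload)
print(r.json()["content"])
```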
|
|
|
|
|
# imatrix |
|
|
|
|
|
I created the imatrix with:
|
|
``` |
|
|
./imatrix --mlock --verbosity 2 -m /tmp/rho-math-7b-interpreter-v0.1.f32.gguf -f ~/Downloads/groups_merged_forkOfArzeth.txt -c 32768 -o rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat |
|
|
``` |
|
|
which took 1665 seconds (~28 minutes) on my GTX 1660 Super and used only 1 thread on a Ryzen 2600 downclocked to 3000 MHz. `imatrix` consumed 35685 MiB of RAM (3200 MHz) and 3158 MiB of VRAM.
|
|
|
|
|
# quantize |
|
|
|
|
|
Quantized with llama.cpp b2661 (2024-04-12), compiled with `LLAMA_CUDA_FORCE_MMQ=1` (full cmd: `make -j6 LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 LLAMA_FAST=1 LLAMA_OPENBLAS=1 LLAMA_BLAS_VENDOR=OpenBLAS`) for a big speed-up: the GTX 1660 Super has no tensor cores, so forcing the MMQ kernels is better than not using them at all.
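For reference, a quantization call using the imatrix from the previous section looks roughly like this (Q4_K_M shown only as an example type):

```
./quantize --imatrix rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat \
  /tmp/rho-math-7b-interpreter-v0.1.f32.gguf \
  rho-math-7b-interpreter-v0.1.Q4_K_M.gguf Q4_K_M
```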