inferencerlabs
/

openai-gpt-oss-120b-MLX-6.5bit

Text Generation


inferencerlabs committed on Aug 6

Commit

c3620d0

·

verified ·

1 Parent(s): 450a65c

Upload complete model

Files changed (1)

README.md +26 -0

README.md ADDED

	@@ -0,0 +1,26 @@

+---
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: mlx
+tags:
+- vllm
+- mlx
+base_model: openai/gpt-oss-120b
+---
+**See gpt-oss-120b 6.5bit MLX in action - [demonstration video](https://youtube.com/xcreate)**
+*The q6.5 quant typically achieves a perplexity of 1.128 in our testing, matching q8 (1.128).*
+| Quantization | Perplexity |
+|:------------:|:----------:|
+| **q2**       | 41.293     |
+| **q3**       | 1.900      |
+| **q4**       | 1.168      |
+| **q6**       | 1.128      |
+| **q8**       | 1.128      |
+## Usage Notes
+* Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
+* Memory usage: ~95 GB
+* Expect ~60 tokens/s
+* For more details, see the [demonstration video](https://youtube.com/xcreate) or visit [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
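
The ~95 GB memory figure in the usage notes can be sanity-checked with back-of-the-envelope arithmetic: weight storage is roughly parameter count times bits per weight, divided by 8. The sketch below is illustrative only. The parameter count of about 117 billion is an assumption taken from outside this README, and the helper name `quantized_weight_gb` is hypothetical. Real memory use also depends on the KV cache, activations, and runtime overhead, none of which are modeled here.

```python
def quantized_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: ~117e9 total parameters for gpt-oss-120b (not stated in this README).
ASSUMED_PARAMS = 117e9

if __name__ == "__main__":
    # Compare a few average bit widths, including the 6.5 bits used by this quant.
    for bits in (4, 6.5, 8):
        size_gb = quantized_weight_gb(ASSUMED_PARAMS, bits)
        print(f"{bits} bits/weight: ~{size_gb:.1f} GB")
```

Under that assumed parameter count, 6.5 bits per weight works out to about 95 GB, which is in line with the memory usage listed above.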