Introducing Palmyra-mini: Compact AI Models for Efficient Inference
The Palmyra-mini family from Writer includes three lightweight models designed for high performance and efficient inference. These models are ideal for developers looking to integrate AI capabilities without excessive computational overhead.
Model Variants
* palmyra-mini: A base model for general-purpose generative tasks, achieving 52.6% on Big Bench Hard (exact match).
* palmyra-mini-thinking-a: Optimized for complex logical reasoning with a Chain of Thought (CoT) approach, scoring 82.87% on GSM8K (strict match).
* palmyra-mini-thinking-b: Specialized for mathematical reasoning, achieving 92.5% on AMC23.
Technical Details
* All models are based on the Qwen architecture and are compatible with popular inference frameworks such as vLLM, SGLang, and TGI (see the sketch after this list).
* "Thinking" models utilize CoT training for enhanced reasoning capabilities.
* GGUF and MLX quantizations are available for optimized performance.
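To make the framework compatibility concrete, here is a minimal offline-inference sketch using vLLM's Python API. The prompt and sampling parameters are illustrative placeholders, not recommended settings:

```python
# Minimal sketch: offline inference with vLLM (pip install vllm).
# Assumes a GPU with enough memory for the model weights.
from vllm import LLM, SamplingParams

llm = LLM(model="Writer/palmyra-mini")
params = SamplingParams(temperature=0.7, max_tokens=256)  # illustrative values

outputs = llm.generate(["Explain quantization in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```

The same model ID also works with vLLM's OpenAI-compatible server (`vllm serve Writer/palmyra-mini`) if you prefer HTTP-based inference.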
For more information, including benchmark methodologies and detailed performance metrics, see our blog post: https://huggingface.co/blog/Writer/announcing-palmyra-mini
Model repos can be found here (a loading sketch follows the list):
* Writer/palmyra-mini
* Writer/palmyra-mini-thinking-a
* Writer/palmyra-mini-thinking-b
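If you are not using a dedicated serving framework, the models can also be loaded directly with Hugging Face Transformers. A minimal sketch, with dtype and device placement chosen as illustrative defaults rather than official settings:

```python
# Minimal sketch: loading palmyra-mini with Transformers.
# device_map="auto" requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Writer/palmyra-mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "What is chain-of-thought reasoning?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```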
Also check out a mobile implementation of palmyra-mini on iOS for a working example of how inference can be incorporated on-device: https://github.com/tsperes/palmyra-mini-mobile/