---
license: cc-by-4.0
language:
- en
base_model:
- nvidia/OpenReasoning-Nemotron-1.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- Reasoning
- quantized
- qwen
- nvidia
---

# Quantized OpenReasoning-Nemotron-1.5B Models

This repository provides quantized GGUF versions of the OpenReasoning-Nemotron-1.5B model. These 4-bit and 5-bit quantized variants retain the original model's strengths in math, code, and science reasoning while reducing memory and compute requirements, making them well suited for efficient inference on resource-constrained devices.

## Model Overview

- **Original Model**: OpenReasoning-Nemotron-1.5B
- **Quantized Versions**:
  - Q4_K_M (4-bit quantization)
  - Q5_K_M (5-bit quantization)
- **Architecture**: Decoder-only transformer
- **Base Model**: Qwen2.5-1.5B-Instruct
- **Modalities**: Text only
- **License**: GOVERNING TERMS: Use of the original model and the quantized models listed above is governed by the [Creative Commons Attribution 4.0 International License (CC-BY-4.0)](https://creativecommons.org/licenses/by/4.0/legalcode.en). ADDITIONAL INFORMATION: [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE)
- **Language**: English

## Quantization Details

### Q4_K_M Version

- Approx. 70% size reduction
- Lower memory footprint (~940 MB)
- Best suited for deployment on edge devices or low-resource GPUs
- Slight performance degradation in complex reasoning scenarios

### Q5_K_M Version

- Approx.
67% size reduction
- Higher fidelity (~1.04 GB)
- Better performance retention; recommended when output quality is a priority

## Key Features

- Expert-level reasoning capabilities across math, code, and scientific domains
- Text-only instruction-following model optimized for multi-turn scientific question answering
- Derived from Qwen2.5-1.5B-Instruct and further post-trained by NVIDIA on OpenReasoning datasets
- Supports long-context inference with generation lengths of up to 64K tokens

### Usage

This model is intended for developers and researchers working on competitive math, code, and science problems. It was trained with supervised fine-tuning only to achieve strong benchmark scores.

**llama.cpp (text-only)**

```sh
./llama-cli -hf SandLogicTechnologies/Openreasoning-Nemotron-1.5B-GGUF -p "What are pointers?"
```

## Model Data

### Dataset Overview

OpenReasoning-Nemotron-1.5B is built on the Qwen2.5-1.5B-Instruct base model and was post-trained by NVIDIA on OpenReasoning datasets:

- **LLM Component**: Trained on diverse OpenReasoning datasets covering the domains above, including science reports, reasoning datasets, and mathematics and coding datasets.

## Recommended Use Cases

These quantized models are optimized for efficient inference while maintaining coding and mathematics capabilities. Suggested use cases include:

- **Scientific question answering**: scientific research and mathematics concepts, coding lessons, etc.
- **Chatbot and assistant prototypes**: build interactive reasoning chat systems with coding capabilities.
- **Research & fine-tuning**: serve as a lightweight base for further task-specific tuning in coding.
- **Low-resource deployment**: run reasoning models on CPUs, edge devices, and lightweight GPUs.

---

## Acknowledgments

These quantized models are based on the original work by **Qwen** and the **NVIDIA** development team.
Special thanks to:

- The [Nvidia](https://huggingface.co/nvidia) team for developing and releasing the [OpenReasoning-Nemotron-1.5B](https://huggingface.co/nvidia/OpenReasoning-Nemotron-1.5B) model.
- **Georgi Gerganov** and the entire [`llama.cpp`](https://github.com/ggerganov/llama.cpp) open-source community for enabling efficient model quantization and inference via the GGUF format.

---

## Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our [Website](https://www.sandlogic.com/).
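As a closing note, the file sizes and reduction percentages quoted in the Quantization Details section can be sanity-checked with a back-of-the-envelope estimate from the parameter count and an average bits-per-weight. The sketch below is illustrative only: the ~1.54B parameter count and the per-quant bit widths are assumptions (K-quants mix several bit widths per tensor, so real GGUF files differ slightly).

```python
# Rough size estimate for a quantized GGUF file.
# Assumptions (not official figures): ~1.54B parameters;
# Q4_K_M averages ~4.9 bits/weight, Q5_K_M ~5.4 bits/weight.
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB for a given quantization."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 1.54e9  # approximate parameter count (assumption)

for name, bpw in [("FP16", 16.0), ("Q4_K_M", 4.9), ("Q5_K_M", 5.4)]:
    size = gguf_size_gb(N_PARAMS, bpw)
    reduction = 100 * (1 - bpw / 16.0)  # relative to the FP16 baseline
    print(f"{name}: {size:.2f} GB ({reduction:.0f}% smaller than FP16)")
```

Under these assumptions the estimate gives roughly 3.1 GB for FP16, ~0.94 GB for Q4_K_M (~69% reduction), and ~1.04 GB for Q5_K_M (~66% reduction), consistent with the figures listed above.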