---
license: apache-2.0
language:
- en
tags:
- conversational
- instruction-following
- chat
- gguf
- llama.cpp
- ollama
- local-llm
- Neutrino
pipeline_tag: text-generation
datasets:
- HuggingFaceFW/finepdfs
- fka/awesome-chatgpt-prompts
metrics:
- accuracy
- bertscore
- bleu
- bleurt
- brier_score
- cer
library_name: adapter-transformers
---
# 🧠 Neutrino-Instruct (7B)
![Neutrino-Instruct banner](https://ollama.com/assets/fardeen0424/neutrino/742305f0-8c9e-4ae8-acff-a7c2b133d3d8)
Neutrino-Instruct is a **7B parameter instruction-tuned LLM** developed by **Fardeen NB**.
It is designed for **conversational AI**, **multi-step reasoning**, and **instruction-following** tasks, fine-tuned to maintain coherent and contextual dialogue across multiple turns.
## ✨ Model Details
- **Model Name:** Neutrino-Instruct
- **Developer:** Fardeen NB
- **License:** Apache-2.0
- **Language(s):** English
- **Format:** GGUF (optimized for `llama.cpp` and `Ollama`)
- **Base Model:** Neutrino
- **Version:** 2.0
- **Task:** Text Generation (chat, Q&A, instruction-following)
## πŸš€ Quick Start
### Run with [llama.cpp](https://github.com/ggerganov/llama.cpp)
```bash
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
# Note: on recent llama.cpp checkouts the Makefile build is deprecated; build with
# `cmake -B build && cmake --build build --config Release` and use
# `./build/bin/llama-cli` in place of `./main` below.

# Run a single prompt
./main -m ./neutrino-instruct.gguf -p "Hello, who are you?"

# Run in interactive mode
./main -m ./neutrino-instruct.gguf -i -p "Let's chat."

# Control output length (number of tokens to generate)
./main -m ./neutrino-instruct.gguf -n 256 -p "Write a poem about stars."

# Change creativity (sampling temperature)
./main -m ./neutrino-instruct.gguf --temp 0.7 -p "Explain quantum computing simply."

# Enable GPU acceleration (if compiled with CUDA/Metal)
./main -m ./neutrino-instruct.gguf --gpu-layers 50 -p "Summarize this article."
```
### Run with [Ollama](https://ollama.com/fardeen0424/neutrino)
```bash
ollama run fardeen0424/neutrino
```
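Ollama also exposes the model over a local REST API (port 11434 by default), which is convenient for scripting. Below is a minimal sketch using only the Python standard library; it assumes the Ollama server is running and the model has already been pulled with `ollama pull fardeen0424/neutrino`:

```python
import json
import urllib.request

# Ollama's local REST endpoint (default port 11434).
url = "http://localhost:11434/api/generate"
payload = {
    "model": "fardeen0424/neutrino",
    "prompt": "Hello, who are you?",
    "stream": False,  # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```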
### Run in Python (`llama-cpp-python`)
```python
from llama_cpp import Llama

# Load the Neutrino-Instruct model
llm = Llama(model_path="./neutrino-instruct.gguf")

# Run inference (max_tokens defaults to a short completion; raise it as needed)
response = llm("Who are you?", max_tokens=128)
print(response["choices"][0]["text"])

# Stream output tokens as they are generated
for token in llm("Tell me a story about Neutrino:", stream=True):
    print(token["choices"][0]["text"], end="", flush=True)
```
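For multi-turn conversations, `llama-cpp-python` also provides an OpenAI-style chat API. A minimal sketch, assuming the GGUF file embeds a chat template (most recent GGUF exports do):

```python
from llama_cpp import Llama

llm = Llama(model_path="./neutrino-instruct.gguf", n_ctx=2048)

# create_chat_completion applies the model's chat template to the messages.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Neutrino, a helpful assistant."},
        {"role": "user", "content": "Explain what a GGUF file is in one sentence."},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```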
## πŸ“Š System Requirements
* **CPU-only:** 8–16GB RAM for quantized (Q4/Q5) builds, 32GB+ if running the FP16 file on CPU; runs on modern laptops with slower inference.
* **GPU acceleration:**
  * 4GB VRAM → 4-bit quantized (Q4) models
  * 8GB VRAM → 5-bit/8-bit models
  * 16GB+ VRAM → FP16 full precision (7B FP16 weights alone are ~14GB)
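In `llama-cpp-python`, these GPU tiers map to the `n_gpu_layers` argument (the Python counterpart of llama.cpp's `--gpu-layers` flag). A minimal sketch; the layer counts are illustrative and should be tuned to your VRAM:

```python
from llama_cpp import Llama

# n_gpu_layers=0  -> CPU only
# n_gpu_layers=20 -> partial offload (e.g. ~4GB VRAM with a Q4 model)
# n_gpu_layers=-1 -> offload every layer (needs VRAM for the whole model)
llm = Llama(
    model_path="./neutrino-instruct.gguf",
    n_gpu_layers=-1,  # adjust to your hardware
    n_ctx=4096,
)
print(llm("Summarize GGUF in one line.", max_tokens=64)["choices"][0]["text"])
```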
## 🧩 Potential Use Cases
* Conversational AI assistants
* Research prototypes
* Instruction-following agents
* Chatbots with identity-awareness
⚠️ **Out of Scope:** Use in critical decision-making, legal, or medical contexts.
## πŸ› οΈ Development Notes
* Model uploaded in **GGUF format** for portability & performance.
* Compatible with **llama.cpp**, **Ollama**, and **llama-cpp-python**.
* Supports quantization levels (Q4, Q5, Q8) for deployment on resource-constrained devices.
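As a rough guide to the quantization levels above, here is a sketch of choosing a quant file by memory budget. The filenames are hypothetical; check the repository for the actual artifact names, or produce them yourself with llama.cpp's `llama-quantize` tool:

```python
from llama_cpp import Llama

# Hypothetical filenames for illustration only; the repository's actual
# quantized artifacts may be named differently.
QUANT_FILES = {
    "Q4": "./neutrino-instruct.Q4_K_M.gguf",  # ~4GB file, smallest, lowest fidelity
    "Q5": "./neutrino-instruct.Q5_K_M.gguf",  # ~5GB file, good middle ground
    "Q8": "./neutrino-instruct.Q8_0.gguf",    # ~7GB file, near-lossless
}

llm = Llama(model_path=QUANT_FILES["Q4"], n_ctx=2048)
print(llm("Hello!", max_tokens=32)["choices"][0]["text"])
```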
## πŸ“– Citation
If you use Neutrino in your research or projects, please cite:
```bibtex
@misc{fardeennb2025neutrino,
  title        = {Neutrino-Instruct: A 7B Instruction-Tuned Conversational Model},
  author       = {Fardeen NB},
  year         = {2025},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/neuralcrew/neutrino-instruct}
}
```