---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- vllm
- stem
- merge
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-1.7B-Instruct
- omniomni/omni-0-science-preview
- omniomni/omni-0-technology-preview
- omniomni/omni-0-engineering-preview
- omniomni/omni-0-math-preview
---
<p align="center">
<img alt="Omni 0 preview models header" src="https://github.com/omniomni-ai/omni-0-preview-models/raw/refs/heads/main/omni-0-preview-models-image.png">
</p>
<p align="center">
<a href="https://github.com/omniomni-ai"><strong>GitHub</strong></a>  
<a href="https://omniomni.framer.website/"><strong>Website</strong></a>  
<strong>Paper (Coming Soon)</strong>
</p>
<br>
# Omni 0 Preview Models
Omni 0 preview models are a series of 1.7B-parameter LLMs optimized for STEM knowledge, consisting of four expert models and one final model merged from all of them. The experts are the [Science Expert](https://huggingface.co/omniomni/omni-0-science-preview), [Technology Expert](https://huggingface.co/omniomni/omni-0-technology-preview), [Engineering Expert](https://huggingface.co/omniomni/omni-0-engineering-preview), and [Math Expert](https://huggingface.co/omniomni/omni-0-math-preview). Through expert finetuning and DARE-TIES merging, Omni achieves state-of-the-art domain benchmark results among alternative optimization techniques and optimal compute-knowledge efficiency among similar models, while performing comparably with them on the tested benchmarks.
<p align="center">
<figure>
<img src="https://github.com/omniomni-ai/omni-0-preview-models/raw/refs/heads/main/compute-knowledge-efficiency-plot.png" alt="Omni compute-knowledge efficiency plot">
<figcaption>Omni achieves optimal compute-knowledge efficiency compared to alternative models</figcaption>
</figure>
</p>
> [!NOTE]
> This model card is for [omni-0-mini-preview](https://huggingface.co/omniomni/omni-0-mini-preview). All other models can be found on [Omni's HuggingFace page](https://huggingface.co/omniomni).
# Benchmarks
**Omni-0-mini-preview Benchmarks**
| **Benchmark**                 | **Omni**  | **Base**  | **Llama 3.2 3B** | **Gemma 3 4B** | **Llama 3.1 8B** |
|-------------------------------|-----------|-----------|------------------|----------------|------------------|
| MMLU STEM (4-shot CoT)        | **35.02** | 26.59     | 33.28            | 40.82          | 52.22            |
| MMLU Science (4-shot CoT)     | **34.44** | 28.03     | 33.47            | 42.93          | 52.54            |
| MMLU Technology (4-shot CoT)  | **41.07** | 30.86     | 45.28            | 46.74          | 63.72            |
| MMLU Engineering (4-shot CoT) | **37.50** | 25.93     | 34.65            | 43.66          | 55.58            |
| MMLU Math (4-shot CoT)        | **35.54** | 23.86     | 39.51            | 35.31          | 45.84            |
| HumanEval (pass@1)            | **31.71** | 29.88     | 51.83            | 57.32          | 57.93            |
| SciQ (0-shot)                 | **87.30** | 76.10     | 93.30            | 87.50          | 91.80            |
| MATH (4-shot)                 | 15.66     | **16.12** | 28.44            | 26.38          | 29.56            |
| ARC-Challenge (0-shot)        | **43.00** | 40.10     | 46.16            | 44.11          | 54.18            |
| ARC-Easy (0-shot)             | **66.67** | 58.54     | 67.93            | 63.01          | 75.80            |
| **Average**                   | **37.91** | 30.25     | 38.33            | 43.91          | 54.22            |
| **Improvement over base**     | **25.32%**| --        | --               | --             | --               |
<br>
**Expert Model Benchmarks**
| Benchmark                     | Science   | Technology | Engineering | Math      |
|-------------------------------|-----------|------------|-------------|-----------|
| MMLU Science (4-shot CoT)     | 26.69     | --         | --          | --        |
| SciQ (0-shot)                 | 85.80     | --         | --          | --        |
| ARC-Challenge (0-shot)        | 42.41     | --         | --          | --        |
| ARC-Easy (0-shot)             | 66.96     | --         | --          | --        |
| MMLU Technology (4-shot CoT)  | --        | 35.30      | --          | --        |
| HumanEval (pass@1)            | --        | 32.93      | --          | --        |
| MMLU Engineering (4-shot CoT) | --        | --         | 32.07       | --        |
| MMLU Math (4-shot CoT)        | --        | --         | --          | 30.83     |
| MATH (4-shot)                 | --        | --         | --          | 18.76     |
| Expert Average                | **36.28** | **34.83**  | **32.07**   | **28.82** |
| Base Average                  | 35.59     | 29.79      | 30.86       | 22.57     |
| Improvement                   | 1.94%     | 16.92%     | 3.92%       | 27.69%    |
> [!NOTE]
> Expert Average refers to the average score of the expert model for the STEM domain in focus while benchmarking.
# Models
Omni comes in a total of 5 models:
**Merged Model**
- `omni-0-mini-preview` - Merged output of all four experts via DARE-TIES, delivering large performance improvements in STEM domains compared to its base.
**Experts**
- `omni-0-science-preview` - Science expert finetuned on corpora of scientific Wikipedia texts and academic papers, as well as a chat-templated scientific Q&A dataset.
- `omni-0-technology-preview` - Technology expert finetuned on chat-templated code-generation data and Stack Exchange questions with their top-voted answers.
- `omni-0-engineering-preview` - Engineering expert finetuned on corpora of engineering-related Wikipedia texts and academic papers.
- `omni-0-math-preview` - Math expert finetuned on chat-templated math Q&A data.
All Omni experts are finetuned from the same base model, [SmolLM2 1.7B Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct), on H100/Ada 6000/A100 GPUs; the merged model improves on this base by 25.32% on average across all tested STEM benchmarks.
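A merge like this can be reproduced with a DARE-TIES-capable tool such as [mergekit](https://github.com/arcee-ai/mergekit). The configuration below is an illustrative sketch only: the `density` and `weight` values are hypothetical placeholders, not Omni's published merge recipe.
```yaml
# Hypothetical mergekit configuration for a DARE-TIES merge of the four experts.
# The density/weight values are placeholders, not Omni's actual recipe.
merge_method: dare_ties
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
models:
  - model: omniomni/omni-0-science-preview
    parameters:
      density: 0.5
      weight: 0.25
  - model: omniomni/omni-0-technology-preview
    parameters:
      density: 0.5
      weight: 0.25
  - model: omniomni/omni-0-engineering-preview
    parameters:
      density: 0.5
      weight: 0.25
  - model: omniomni/omni-0-math-preview
    parameters:
      density: 0.5
      weight: 0.25
dtype: bfloat16
```
With mergekit installed, `mergekit-yaml config.yml ./merged-model` would then produce a merged checkpoint from such a file.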
# Features
- **Made for all.** Omni is a series of highly efficient large language models aimed at expanding the accessibility of AI and filling in its gaps among underserved populations.
- **Efficient.** Omni operates at optimal compute-knowledge efficiency compared to similar models.
- **Merged architecture.** Omni uses model merging to capture the collective accuracy of specialized experts, leveraging their capabilities to enhance the final merged model.
- **Multi-disciplinary.** Omni's first variant achieves state-of-the-art performance across STEM compared to alternative optimization techniques.
---
# Inference
## Transformers
Transformers is HuggingFace's framework unifying model development and inference, allowing simple and seamless interaction with models hosted on HuggingFace.
To get started with running inference using Omni, install transformers:
```bash
pip install transformers
```
After transformers has been installed, run the following code to generate outputs from any Omni model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "omniomni/omni-0-mini-preview"  # Can be any Omni model
device = "cuda"  # Use "cpu" for CPU-only inference

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# For multiple GPUs, install accelerate and use
# `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`.
# However, Omni is small enough to run on individual commodity GPUs and low-resource devices.
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is STEM?"}]
# Apply the chat template and append the assistant prompt so the model
# generates a reply rather than continuing the user's message.
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
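Alternatively, the higher-level `pipeline` API handles chat templating and decoding automatically. This is a minimal sketch, assuming a recent transformers release whose text-generation pipeline accepts chat messages:
```python
from transformers import pipeline

# The pipeline applies the model's chat template and decodes the output for you.
generator = pipeline("text-generation", model="omniomni/omni-0-mini-preview", device="cuda")
messages = [{"role": "user", "content": "What is STEM?"}]
result = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
# The pipeline returns the whole conversation; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```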
## vLLM
vLLM provides fast LLM inference for models spanning a wide range of architectures, through both an OpenAI-compatible server and an offline, in-process Python API.
First, install the vLLM package:
```bash
uv pip install vllm --torch-backend=auto
```
After that, run this command to start an OpenAI-compatible server with Omni via vLLM:
```bash
vllm serve omniomni/omni-0-mini-preview
```
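The server exposes an OpenAI-compatible API, by default at `http://localhost:8000/v1`, so any OpenAI client can query it. Below is a minimal sketch, assuming the default port and no API key configured on the server:
```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; the key is ignored unless the server sets one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="omniomni/omni-0-mini-preview",
    messages=[{"role": "user", "content": "What is STEM?"}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```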
To use Omni with vLLM without running a server, use the offline API to generate outputs directly in Python:
```python
# vLLM automatically uses a GPU unless built with CPU wheels, so there is no need to specify a device
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The answer to x^2 + 2x + 1 is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.7)

llm = LLM(model="omniomni/omni-0-mini-preview")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
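Because Omni is instruction-tuned, recent vLLM releases also let you chat through the offline API via `LLM.chat`, which applies the model's chat template before generating. A minimal sketch:
```python
from vllm import LLM, SamplingParams

llm = LLM(model="omniomni/omni-0-mini-preview")
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# llm.chat applies the chat template, so the model answers rather than continues the prompt.
messages = [{"role": "user", "content": "What is STEM?"}]
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```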