---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- vllm
- stem
- merge
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-1.7B-Instruct
- omniomni/omni-0-science-preview
- omniomni/omni-0-technology-preview
- omniomni/omni-0-engineering-preview
- omniomni/omni-0-math-preview
---

<p align="center">
  <img alt="Omni 0 preview models header" src="https://github.com/omniomni-ai/omni-0-preview-models/raw/refs/heads/main/omni-0-preview-models-image.png">
</p>

<p align="center">
  <a href="https://github.com/omniomni-ai"><strong>GitHub</strong></a> &nbsp;
  <a href="https://omniomni.framer.website/"><strong>Website</strong></a> &nbsp;
  <strong>Paper (Coming Soon)</strong>
</p>

<br>

# Omni 0 Preview Models

Omni 0 preview models are a series of 1.7B-parameter LLMs optimized for STEM knowledge, consisting of four expert models and one final model merged from all of the experts. The experts are the [Science Expert](https://huggingface.co/omniomni/omni-0-science-preview), [Technology Expert](https://huggingface.co/omniomni/omni-0-technology-preview), [Engineering Expert](https://huggingface.co/omniomni/omni-0-engineering-preview), and [Math Expert](https://huggingface.co/omniomni/omni-0-math-preview). Through expert finetuning and DARE-TIES merging, Omni achieves state-of-the-art domain benchmark results among alternative optimization techniques and optimal compute-knowledge efficiency across similar models, while performing comparably with them on the tested benchmarks.

<p align="center">
  <figure>
    <img src="https://github.com/omniomni-ai/omni-0-preview-models/raw/refs/heads/main/compute-knowledge-efficiency-plot.png"
         alt="Omni compute-knowledge efficiency plot">
    <figcaption>Omni achieves optimal compute-knowledge efficiency compared to alternative models</figcaption>
  </figure>
</p>

> [!Note]
> This model card is for [omni-0-mini-preview](https://huggingface.co/omniomni/omni-0-mini-preview). All other models can be found on [Omni's HuggingFace page](https://huggingface.co/omniomni).

# Benchmarks
**Omni-0-mini-preview Benchmarks** (Llama 3.2 3B, Gemma 3 4B, and Llama 3.1 8B are alternative models shown for comparison)
| **Benchmark**                 | **Omni**  | **Base** | **Llama 3.2 3B** | **Gemma 3 4B** | **Llama 3.1 8B** |
|-------------------------------|-----------|----------|------------------|----------------|------------------|
| MMLU STEM (4-shot CoT)        | **35.02** | 26.59    | 33.28            | 40.82          | 52.22            |
| MMLU Science (4-shot CoT)     | **34.44** | 28.03    | 33.47            | 42.93          | 52.54            |
| MMLU Technology (4-shot CoT)  | **41.07** | 30.86    | 45.28            | 46.74          | 63.72            |
| MMLU Engineering (4-shot CoT) | **37.50** | 25.93    | 34.65            | 43.66          | 55.58            |
| MMLU Math (4-shot CoT)        | **35.54** | 23.86    | 39.51            | 35.31          | 45.84            |
| HumanEval (pass@1)            | **31.71** | 29.88    | 51.83            | 57.32          | 57.93            |
| SciQ (0-shot)                 | **87.30** | 76.10    | 93.30            | 87.50          | 91.80            |
| MATH (4-shot)                 | 15.66     | **16.12**| 28.44            | 26.38          | 29.56            |
| ARC-Challenge (0-shot)        | **43.00** | 40.10    | 46.16            | 44.11          | 54.18            |
| ARC-Easy (0-shot)             | **66.67** | 58.54    | 67.93            | 63.01          | 75.80            |
| **Average**                   | **37.91** | 30.25    | 38.33            | 43.91          | 54.22            |
| **Improvement**               | **25.32%**| Base     |                  |                |                  |

<br>

**Expert Model Benchmarks**
| Benchmark                     | Science   | Technology | Engineering | Math      |
|-------------------------------|-----------|------------|-------------|-----------|
| MMLU Science (4-shot CoT)     | 26.69     | --         | --          | --        |
| SciQ (0-shot)                 | 85.80     | --         | --          | --        |
| ARC-Challenge (0-shot)        | 42.41     | --         | --          | --        |
| ARC-Easy (0-shot)             | 66.96     | --         | --          | --        |
| MMLU Technology (4-shot CoT)  | --        | 35.30      | --          | --        |
| HumanEval (pass@1)            | --        | 32.93      | --          | --        |
| MMLU Engineering (4-shot CoT) | --        | --         | 32.07       | --        |
| MMLU Math (4-shot CoT)        | --        | --         | --          | 30.83     |
| MATH (4-shot)                 | --        | --         | --          | 18.76     |
| Expert Average                | **36.28** | **34.83**  | **32.07**   | **28.82** |
| Base Average                  | 35.59     | 29.79      | 30.86       | 22.57     |
| Improvement                   | 1.94%     | 16.92%     | 3.92%       | 27.69%    |
> [!Note]
> Expert average refers to each expert model's average score over the benchmarks for its STEM domain of focus.

# Models
Omni comes in a total of 5 models:

**Merged Model**
- `omni-0-mini-preview` - Merged output of all four experts through DARE-TIES, delivering large performance improvements in STEM domains compared to its base model.

**Experts**
- `omni-0-science-preview` - Science expert finetuned on corpora of scientific Wikipedia texts and academic papers, as well as a chat-templated scientific Q&A dataset.
- `omni-0-technology-preview` - Technology expert finetuned on chat-templated code-generation data and Stack Exchange questions with their top-voted answers.
- `omni-0-engineering-preview` - Engineering expert finetuned on corpora of engineering-related Wikipedia texts and academic papers.
- `omni-0-math-preview` - Math expert finetuned on chat-templated math Q&A data.

All Omni experts are finetuned from the base model [SmolLM2 1.7B Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct) on H100/Ada 6000/A100 GPUs; the merged model improves on the base by 25.32% on average across all tested STEM benchmarks.
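Merges like this one are typically expressed as a [mergekit](https://github.com/arcee-ai/mergekit) configuration; mergekit implements DARE-TIES as its `dare_ties` merge method. The sketch below is illustrative only: the `density` and `weight` values are placeholder assumptions, not the recipe actually used to produce `omni-0-mini-preview`.

```bash
pip install mergekit

# Illustrative DARE-TIES merge config -- density/weight values are assumptions
cat > dare-ties-merge.yml <<'EOF'
merge_method: dare_ties
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
models:
  - model: omniomni/omni-0-science-preview
    parameters: {density: 0.5, weight: 0.25}
  - model: omniomni/omni-0-technology-preview
    parameters: {density: 0.5, weight: 0.25}
  - model: omniomni/omni-0-engineering-preview
    parameters: {density: 0.5, weight: 0.25}
  - model: omniomni/omni-0-math-preview
    parameters: {density: 0.5, weight: 0.25}
dtype: bfloat16
EOF

# Write the merged model to ./omni-merged
mergekit-yaml dare-ties-merge.yml ./omni-merged
```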

# Features

**Made for all.** Omni is a series of highly efficient large language models aimed at expanding the accessibility of AI and filling its gaps among underserved populations.

**Efficient.** Omni operates at optimal compute-knowledge efficiency compared to similar models.

**Merged architecture.** Omni uses model merging to capture the collective accuracy of specialized models, leveraging their capabilities to enhance the final merged model.

**Multi-disciplinary.** Omni's first variant achieves state-of-the-art performance across STEM compared to alternative optimization techniques.

---

# Inference

## Transformers

Transformers is a HuggingFace framework that unifies model development and inference, allowing simple and seamless interaction with models hosted on the HuggingFace Hub.

To get started running inference with Omni, install transformers:

```bash
pip install transformers
```

After transformers has been installed, run the following code to generate outputs from any Omni model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "omniomni/omni-0-mini-preview"  # Can be any Omni model

device = "cuda"  # For GPU usage, or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# For multiple GPUs, install accelerate and use
# `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`.
# However, Omni is small enough to run on individual commodity GPUs and low-resource devices.
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is STEM?"}]

# add_generation_prompt=True appends the assistant prefix so the model
# responds to the message instead of continuing it
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)

outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

print(tokenizer.decode(outputs[0]))
```
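For quick experiments, the high-level `pipeline` API wraps the same steps (tokenizer loading, chat templating, and generation) in a single call. A minimal sketch, assuming a recent transformers version that accepts chat messages and `accelerate` installed for `device_map`:

```python
from transformers import pipeline

# device_map="auto" places the model on a GPU when one is available (requires accelerate)
generator = pipeline("text-generation", model="omniomni/omni-0-mini-preview", device_map="auto")

messages = [{"role": "user", "content": "What is STEM?"}]
result = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)

# The pipeline returns the full conversation, including the generated assistant turn
print(result[0]["generated_text"][-1]["content"])
```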

## vLLM

vLLM provides fast LLM inference for models across a wide range of architectures, through both a standalone server and in-process Python usage.

First, install the vLLM package:

```bash
uv pip install vllm --torch-backend=auto
```

Once vLLM is installed, run this command to start a server with Omni:
```bash
vllm serve omniomni/omni-0-mini-preview
```
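
The server exposes an OpenAI-compatible API (on port 8000 by default), so any OpenAI-style client can query Omni; for example, with curl:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "omniomni/omni-0-mini-preview",
        "messages": [{"role": "user", "content": "What is STEM?"}],
        "max_tokens": 256,
        "temperature": 0.7
      }'
```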

To use Omni with vLLM without running a server, use this code to generate outputs from within a Python script:
```python
# vLLM automatically uses a GPU unless built with CPU wheels, so no need to specify a device

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The answer to x^2 + 2x + 1 is",
    "The future of AI is",
]

sampling_params = SamplingParams(temperature=0.7)

llm = LLM(model="omniomni/omni-0-mini-preview")

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```