---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- text-generation-inference
- reasoning
- math
- coding
- frontier
- aquif-3.5
- moe
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
base_model:
- Qwen/Qwen3-30B-A3B-Instruct-2507
---

<small>*Disclaimer: aquif-3.5-Plus was made with the merging technique used in Qwen3-30B-A3B-YOYO-V3 to enable hybrid reasoning. aquif-3.5-Max was finetuned from Plus, expanded through DavidAU's Brainstorm technique, and then underwent further RL and SFT to strengthen its reasoning and coding capabilities.*</small>

# aquif-3.5-Plus & aquif-3.5-Max

The pinnacle of the aquif-3.5 series, released November 3rd, 2025. These models bring advanced reasoning capabilities, hybrid thinking modes, and 1M-token context windows, achieving state-of-the-art performance in their respective categories.

**aquif-3.5-Plus** combines hybrid reasoning with interchangeable thinking modes, offering flexibility for both speed-optimized and reasoning-intensive applications.

**aquif-3.5-Max** represents frontier model capabilities built on top of Plus's architecture, delivering exceptional performance across all benchmark categories.

## Model Repository Links

| Model | HuggingFace Repository |
|-------|----------------------|
| aquif-3.5-Plus | [aquif-ai/aquif-3.5-Plus](https://huggingface.co/aquif-ai/aquif-3.5-Plus-30B-A3B) |
| aquif-3.5-Max | [aquif-ai/aquif-3.5-Max](https://huggingface.co/aquif-ai/aquif-3.5-Max-42B-A3B) |

## Model Overview

| Model | Total (B) | Active Params (B) | Reasoning | Context Window | Thinking Modes |
|-------|-----------|-------------------|-----------|-----------------|----------------|
| aquif-3.5-Plus | 30.5 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |
| aquif-3.5-Max | 42.4 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |

## Model Details

### aquif-3.5-Plus (Hybrid Reasoning with Interchangeable Modes)

A breakthrough hybrid reasoning model offering unprecedented flexibility. Toggle between thinking and non-thinking modes to optimize for your specific use case—maintain reasoning capabilities when needed, or prioritize speed for time-sensitive applications.

## Artificial Analysis Intelligence Index (AAII) Benchmarks

### Core Performance Metrics

| Benchmark                | Plus (Non-Reasoning) | Plus (Reasoning) | Max (Non-Reasoning) | Max (Reasoning) |
| :----------------------- | -------------------: | ---------------: | ------------------: | --------------: |
| MMLU-Pro                 |                 80.2 |             82.8 |                82.8 |            85.4 |
| GPQA Diamond             |                 72.1 |             79.7 |                75.6 |            83.2 |
| AIME 2025                |                 64.7 |             90.3 |                69.0 |            94.6 |
| LiveCodeBench            |                 50.5 |             76.4 |                55.9 |            81.6 |
| Humanity’s Last Exam     |                  4.3 |             12.1 |                 7.8 |            15.6 |
| TAU2-Telecom             |                 34.2 |             41.5 |                43.2 |            51.3 |
| IFBench                  |                 39.3 |             54.3 |                49.3 |            65.4 |
| TerminalBench-Hard       |                 10.1 |             15.2 |                18.0 |            23.9 |
| AA-LCR                   |                 30.4 |             59.9 |                31.7 |            61.2 |
| SciCode                  |                 29.5 |             35.7 |                34.7 |            40.9 |
| **AAII Composite Score** |        **42 (41.5)** |    **55 (54.8)** |       **47 (46.8)** |   **60 (60.3)** |

### Long Context Evals (RULER)
| Model Name                 |  Acc avg |    4k |    8k |  16k |  32k |  64k |  96k | 128k | 192k | 256k | 384k | 512k | 640k | 768k | 896k | 1000k |
| :------------------------- | -------: | ----: | ----: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ----: |
| aquif-3.5-Plus (Reasoning) | **91.4** |  99.6 | 100.0 | 99.2 | 98.2 | 97.4 | 96.8 | 96.8 | 94.8 | 89.6 | 90.2 | 84.0 | 82.6 | 81.9 | 80.1 |  77.5 |
| aquif-3.5-Max (Reasoning)  | **92.1** | 100.0 | 100.0 | 99.7 | 98.5 | 97.8 | 97.1 | 96.9 | 95.8 | 92.1 | 91.1 | 85.5 | 84.8 | 80.0 | 79.9 |  79.6 |


### Comparable Models by Configuration

**aquif-3.5-Plus (Non-Reasoning) — AAII 42**

| Model | AAII Score |
|-------|-----------|
| GPT-5 mini | 42 |
| Claude Haiku 4.5 | 42 |
| Gemini 2.5 Flash Lite 2509 | 42 |
| **aquif-3.5-Plus (Non-Reasoning)** | **42** |
| DeepSeek V3 0324 | 41 |
| Qwen3 VL 32B Instruct | 41 |
| Qwen3 Coder 480B A35B | 42 |

**aquif-3.5-Plus (Reasoning) — AAII 55**

| Model | AAII Score |
|-------|-----------|
| GLM-4.6 | 56 |
| Gemini 2.5 Flash 2509 | 54 |
| Claude Haiku 4.5 | 55 |
| **aquif-3.5-Plus (Reasoning)** | **55** |
| Qwen3 Next 80B A3B | 54 |

**aquif-3.5-Max (Non-Reasoning) — AAII 47**

| Model | AAII Score |
|-------|-----------|
| Gemini 2.5 Flash 2509 | 47 |
| **aquif-3.5-Max (Non-Reasoning)** | **47** |
| DeepSeek-V3.2 Exp | 46 |
| Ling-1T | 45 |
| GLM-4.6 | 45 |
| Qwen3 235B A22B 2507 | 45 |

**aquif-3.5-Max (Reasoning) — AAII 60**

| Model | AAII Score |
|-------|-----------|
| Gemini 2.5 Pro | 60 |
| Grok 4 Fast | 60 |
| **aquif-3.5-Max (Reasoning)** | **60** |
| MiniMax-M2 | 61 |
| gpt-oss-120B high | 61 |
| GPT-5 mini | 61 |
| DeepSeek-V3.1-Terminus | 58 |
| Claude Opus 4.1 | 59 |

## Key Features

**Massive Context Windows**: Both models support up to 1M tokens, enabling analysis of entire codebases, research papers, and extensive conversation histories without truncation.

**Efficient Architecture**: Despite offering frontier-level performance, both models maintain exceptional efficiency through optimized mixture-of-experts design and active parameter count of just 3.3B.
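
The efficiency described above comes from sparse expert routing: each token activates only a few experts, so compute scales with the active parameter count (3.3B) rather than the total (30.5B or 42.4B). A minimal, self-contained sketch of top-k gating in plain Python (illustrative only; not the actual aquif/Qwen routing code):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_weights, k=2):
    """Route a token through only the top-k experts.

    `experts` is a list of callables; `gate_weights` holds one row of
    gating weights per expert. Only k experts execute, so compute scales
    with k, not with the total expert count.
    """
    # One gating logit per expert (dot product of gate row with token).
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(logits)
    # Pick the k experts with the highest routing probability.
    topk = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize over the selected experts and mix their outputs.
    norm = sum(probs[i] for i in topk)
    out = [0.0] * len(token)
    for i in topk:
        y = experts[i](token)
        for d in range(len(token)):
            out[d] += (probs[i] / norm) * y[d]
    return out, topk
```

The key design point is that the unselected experts are never evaluated, which is why total parameter count can grow (as in Max's 42.4B) without increasing per-token inference cost.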

**Flexible Reasoning**: aquif-3.5-Plus and Max provide interchangeable thinking modes—enable reasoning for complex problems, disable for faster inference on straightforward tasks.

**Multilingual Support**: Native support across English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese.

## Usage Recommendations

**aquif-3.5-Plus:**
- Complex reasoning requiring flexibility between speed and depth
- Scientific analysis and mathematical problem-solving with thinking enabled
- Rapid-response applications with thinking disabled
- Code generation and review
- Multilingual applications up to 1M token contexts

**aquif-3.5-Max:**
- Frontier-level problem-solving without compromise
- Advanced research and scientific computing
- Competition mathematics and algorithmic challenges
- Comprehensive code analysis and generation
- Complex multilingual tasks requiring maximum reasoning capability

## Setting Thinking Mode (aquif-3.5-Plus)

Toggle between thinking and non-thinking modes by modifying the chat template:

```
set thinking = true    # Enable thinking mode
set thinking = false   # Disable thinking mode (faster inference)
```

Simply set the variable in your chat template before inference to switch modes. No model reloading required.
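
The effect of the toggle can be sketched with a toy prompt builder. This is illustrative only: `build_prompt` is hypothetical, and the `<|im_start|>`/`<think>` markers mirror the Qwen-style ChatML format of the base model rather than the model's exact chat template:

```python
def build_prompt(messages, thinking=True):
    """Toy stand-in for a chat template with a `thinking` toggle.

    With `thinking` enabled, the assistant turn opens a <think> block so
    the model emits its reasoning before answering; with it disabled, a
    pre-closed empty think block skips straight to the answer.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    parts.append("<|im_start|>assistant")
    # The template-level switch: one variable, no model reload.
    parts.append("<think>" if thinking else "<think>\n\n</think>")
    return "\n".join(parts)
```

Because the switch lives entirely in the prompt text, the same loaded weights serve both modes.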

## Technical Specifications

Both models support:
- BF16 and FP16 precision
- Mixture of Experts architecture optimizations
- Efficient attention mechanisms with optimized KV caching
- Up to 1M token context window
- Multi-head attention with sparse routing

## Acknowledgements

- **Qwen Team**: Base architecture contributions
- **Meta Llama Team**: Core model foundations
- **Hugging Face**: Model hosting and training infrastructure

## License

This project is released under the Apache 2.0 License. See LICENSE file for details.

---

*Made in 🇧🇷*

© 2025 aquif AI. All rights reserved.