---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- text-generation-inference
- reasoning
- math
- coding
- frontier
- aquif-3.5
- moe
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
base_model:
- Qwen/Qwen3-30B-A3B-Instruct-2507
---
|
|
|
|
|
<small>*Disclaimer: aquif-3.5-Plus was created with the merging technique used in Qwen3-30B-A3B-YOYO-V3 to enable hybrid reasoning. aquif-3.5-Max was finetuned from Plus, expanded with DavidAU's Brainstorm technique, and then further trained with RL and SFT to strengthen its reasoning and coding capabilities.*</small>
|
|
|
|
|
# aquif-3.5-Plus & aquif-3.5-Max |
|
|
|
|
|
The pinnacle of the aquif-3.5 series, released November 3rd, 2025. These models combine advanced reasoning capabilities, hybrid thinking modes, and 1M-token context windows to achieve state-of-the-art performance in their respective categories.
|
|
|
|
|
**aquif-3.5-Plus** combines hybrid reasoning with interchangeable thinking modes, offering flexibility for both speed-optimized and reasoning-intensive applications. |
|
|
|
|
|
**aquif-3.5-Max** represents frontier model capabilities built on top of Plus's architecture, delivering exceptional performance across all benchmark categories. |
|
|
|
|
|
## Model Repository Links |
|
|
|
|
|
| Model | HuggingFace Repository |
|-------|------------------------|
| aquif-3.5-Plus | [aquif-ai/aquif-3.5-Plus-30B-A3B](https://huggingface.co/aquif-ai/aquif-3.5-Plus-30B-A3B) |
| aquif-3.5-Max | [aquif-ai/aquif-3.5-Max-42B-A3B](https://huggingface.co/aquif-ai/aquif-3.5-Max-42B-A3B) |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
| Model | Total Params (B) | Active Params (B) | Reasoning | Context Window | Thinking Modes |
|-------|------------------|-------------------|-----------|----------------|----------------|
| aquif-3.5-Plus | 30.5 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |
| aquif-3.5-Max | 42.4 | 3.3 | ✅ Reasoning-Only | 1M | ✅ Interchangeable |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### aquif-3.5-Plus (Hybrid Reasoning with Interchangeable Modes) |
|
|
|
|
|
A hybrid reasoning model built for flexibility: toggle between thinking and non-thinking modes to suit your use case, keeping full reasoning capability when needed or prioritizing speed for time-sensitive applications.
|
|
|
|
|
## Artificial Analysis Intelligence Index (AAII) Benchmarks |
|
|
|
|
|
### Core Performance Metrics |
|
|
|
|
|
| Benchmark | Plus (Non-Reasoning) | Plus (Reasoning) | Max (Non-Reasoning) | Max (Reasoning) |
| :----------------------- | -------------------: | ---------------: | ------------------: | --------------: |
| MMLU-Pro | 80.2 | 82.8 | 82.8 | 85.4 |
| GPQA Diamond | 72.1 | 79.7 | 75.6 | 83.2 |
| AIME 2025 | 64.7 | 90.3 | 69.0 | 94.6 |
| LiveCodeBench | 50.5 | 76.4 | 55.9 | 81.6 |
| Humanity's Last Exam | 4.3 | 12.1 | 7.8 | 15.6 |
| TAU2-Telecom | 34.2 | 41.5 | 43.2 | 51.3 |
| IFBench | 39.3 | 54.3 | 49.3 | 65.4 |
| TerminalBench-Hard | 10.1 | 15.2 | 18.0 | 23.9 |
| AA-LCR | 30.4 | 59.9 | 31.7 | 61.2 |
| SciCode | 29.5 | 35.7 | 34.7 | 40.9 |
| **AAII Composite Score** | **42 (41.5)** | **55 (54.8)** | **47 (46.8)** | **60 (60.3)** |

*Composite scores are the unweighted mean of the ten benchmarks above; the rounded value is shown with the exact mean in parentheses.*
|
|
|
|
|
### Long Context Evals (RULER) |
|
|
| Model Name | Avg Acc | 4k | 8k | 16k | 32k | 64k | 96k | 128k | 192k | 256k | 384k | 512k | 640k | 768k | 896k | 1000k |
| :------------------------- | -------: | ----: | ----: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ----: |
| aquif-3.5-Plus (Reasoning) | **91.4** | 99.6 | 100.0 | 99.2 | 98.2 | 97.4 | 96.8 | 96.8 | 94.8 | 89.6 | 90.2 | 84.0 | 82.6 | 81.9 | 80.1 | 77.5 |
| aquif-3.5-Max (Reasoning) | **92.1** | 100.0 | 100.0 | 99.7 | 98.5 | 97.8 | 97.1 | 96.9 | 95.8 | 92.1 | 91.1 | 85.5 | 84.8 | 80.0 | 79.9 | 79.6 |
|
|
|
|
|
|
|
|
### Comparable Models by Configuration |
|
|
|
|
|
**aquif-3.5-Plus (Non-Reasoning) — AAII 42** |
|
|
|
|
|
| Model | AAII Score |
|-------|-----------|
| GPT-5 mini | 42 |
| Claude Haiku 4.5 | 42 |
| Gemini 2.5 Flash Lite 2509 | 42 |
| Qwen3 Coder 480B A35B | 42 |
| **aquif-3.5-Plus (Non-Reasoning)** | **42** |
| DeepSeek V3 0324 | 41 |
| Qwen3 VL 32B Instruct | 41 |
|
|
|
|
|
**aquif-3.5-Plus (Reasoning) — AAII 55** |
|
|
|
|
|
| Model | AAII Score |
|-------|-----------|
| GLM-4.6 | 56 |
| Claude Haiku 4.5 | 55 |
| **aquif-3.5-Plus (Reasoning)** | **55** |
| Gemini 2.5 Flash 2509 | 54 |
| Qwen3 Next 80B A3B | 54 |
|
|
|
|
|
**aquif-3.5-Max (Non-Reasoning) — AAII 47** |
|
|
|
|
|
| Model | AAII Score |
|-------|-----------|
| Gemini 2.5 Flash 2509 | 47 |
| **aquif-3.5-Max (Non-Reasoning)** | **47** |
| DeepSeek-V3.2 Exp | 46 |
| Ling-1T | 45 |
| GLM-4.6 | 45 |
| Qwen3 235B A22B 2507 | 45 |
|
|
|
|
|
**aquif-3.5-Max (Reasoning) — AAII 60** |
|
|
|
|
|
| Model | AAII Score |
|-------|-----------|
| MiniMax-M2 | 61 |
| gpt-oss-120B high | 61 |
| GPT-5 mini | 61 |
| Gemini 2.5 Pro | 60 |
| Grok 4 Fast | 60 |
| **aquif-3.5-Max (Reasoning)** | **60** |
| Claude Opus 4.1 | 59 |
| DeepSeek-V3.1-Terminus | 58 |
|
|
|
|
|
## Key Features |
|
|
|
|
|
**Massive Context Windows**: Both models support up to 1M tokens, enabling analysis of entire codebases, research papers, and extensive conversation histories without truncation. |
|
|
|
|
|
**Efficient Architecture**: Despite offering frontier-level performance, both models remain highly efficient thanks to an optimized mixture-of-experts design and an active parameter count of just 3.3B.
|
|
|
|
|
**Flexible Reasoning**: aquif-3.5-Plus and aquif-3.5-Max provide interchangeable thinking modes. Enable reasoning for complex problems, or disable it for faster inference on straightforward tasks.
|
|
|
|
|
**Multilingual Support**: Native support across English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese. |
|
|
|
|
|
## Usage Recommendations |
|
|
|
|
|
**aquif-3.5-Plus:** |
|
|
- Complex reasoning requiring flexibility between speed and depth |
|
|
- Scientific analysis and mathematical problem-solving with thinking enabled |
|
|
- Rapid-response applications with thinking disabled |
|
|
- Code generation and review |
|
|
- Multilingual applications with contexts up to 1M tokens
|
|
|
|
|
**aquif-3.5-Max:** |
|
|
- Frontier-level problem-solving without compromise |
|
|
- Advanced research and scientific computing |
|
|
- Competition mathematics and algorithmic challenges |
|
|
- Comprehensive code analysis and generation |
|
|
- Complex multilingual tasks requiring maximum reasoning capability |
|
|
|
|
|
## Setting Thinking Mode (aquif-3.5-Plus) |
|
|
|
|
|
Toggle between thinking and non-thinking modes by modifying the chat template: |
|
|
|
|
|
```
set thinking = true   # Enable reasoning mode
set thinking = false  # Disable reasoning mode (faster inference)
```
|
|
|
|
|
Simply set the variable in your chat template before inference to switch modes. No model reloading required. |
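
A minimal sketch with 🤗 Transformers (this assumes the chat template reads a `thinking` variable as shown above, and relies on extra keyword arguments to `apply_chat_template` being forwarded into the template context):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquif-ai/aquif-3.5-Plus-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]

# Extra keyword arguments are passed into the Jinja template context,
# so `thinking` toggles the mode per request with no model reload.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking=True,  # set to False for faster, non-reasoning inference
).to(model.device)

output = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Calling the same code with `thinking=False` should return a direct answer without the reasoning trace.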
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
Both models support: |
|
|
- BF16 and FP16 precision (see the loading sketch after this list)
|
|
- Mixture of Experts architecture optimizations |
|
|
- Efficient attention mechanisms with optimized KV caching |
|
|
- Up to 1M token context window |
|
|
- Multi-head attention with sparse routing |
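
As a sketch of precision selection at load time (assuming standard 🤗 Transformers loading; the repository name is taken from the links table above):

```python
import torch
from transformers import AutoModelForCausalLM

# Prefer BF16 where the GPU supports it (Ampere and newer); otherwise FP16.
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16

model = AutoModelForCausalLM.from_pretrained(
    "aquif-ai/aquif-3.5-Max-42B-A3B",
    torch_dtype=dtype,
    device_map="auto",
)
```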
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
- **Qwen Team**: Base architecture contributions |
|
|
- **Meta Llama Team**: Core model foundations |
|
|
- **Hugging Face**: Model hosting and training infrastructure |
|
|
|
|
|
## License |
|
|
|
|
|
This project is released under the Apache 2.0 License. See LICENSE file for details. |
|
|
|
|
|
--- |
|
|
|
|
|
*Made in 🇧🇷* |
|
|
|
|
|
© 2025 aquif AI. All rights reserved. |