# aquif-3.5-Plus & aquif-3.5-Max
aquif-3.5-Plus and aquif-3.5-Max are the pinnacle of the aquif-3.5 series, released November 3rd, 2025. Both models pair advanced reasoning capabilities with 1M-token context windows to deliver state-of-the-art performance in their respective categories.
aquif-3.5-Plus combines hybrid reasoning with interchangeable thinking modes, offering flexibility for both speed-optimized and reasoning-intensive applications.
aquif-3.5-Max represents frontier model capabilities with reasoning-only architecture, delivering exceptional performance across all benchmark categories.
## Model Repository Links
| Model | HuggingFace Repository |
|---|---|
| aquif-3.5-Plus | [aquiffoo/aquif-3.5-Plus](https://huggingface.co/aquiffoo/aquif-3.5-Plus) |
| aquif-3.5-Max | [aquiffoo/aquif-3.5-Max](https://huggingface.co/aquiffoo/aquif-3.5-Max) |
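Assuming the repositories follow the standard Hugging Face Transformers causal-LM layout (an assumption; check each model card for the exact usage), loading could look like this sketch:

```python
# Sketch: loading aquif-3.5-Plus with Hugging Face Transformers.
# Assumes a standard AutoModelForCausalLM checkpoint; verify against
# the model card before relying on this.
MODEL_ID = "aquiffoo/aquif-3.5-Plus"

def load(model_id: str = MODEL_ID):
    """Load tokenizer and model in BF16 (listed as supported below)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    return tokenizer, model

# Usage: tokenizer, model = load()
```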
## Model Overview
| Model | Total Params (B) | Active Params (B) | Reasoning | Context Window | Thinking Modes |
|---|---|---|---|---|---|
| aquif-3.5-Plus | 30.5 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |
| aquif-3.5-Max | 42.4 | 3.3 | ✅ Reasoning-Only | 1M | Reasoning-Only |
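The efficiency claim in the table is easy to quantify: only a small share of each model's total parameters is active per forward pass.

```python
# Fraction of parameters active per token, from the table above.
def active_fraction(active_b: float, total_b: float) -> float:
    return active_b / total_b

print(f"Plus: {active_fraction(3.3, 30.5):.1%}")  # ~10.8%
print(f"Max:  {active_fraction(3.3, 42.4):.1%}")  # ~7.8%
```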
## Model Details

### aquif-3.5-Plus (Hybrid Reasoning with Interchangeable Modes)

A breakthrough hybrid reasoning model offering unprecedented flexibility: toggle between thinking and non-thinking modes to optimize for your specific use case, maintaining reasoning capabilities when needed or prioritizing speed for time-sensitive applications.
## Artificial Analysis Intelligence Index (AAII) Benchmarks

### Core Performance Metrics
| Benchmark | aquif-3.5-Plus (Non-Reasoning) | aquif-3.5-Plus (Reasoning) | aquif-3.5-Max |
|---|---|---|---|
| MMLU-Pro | 80.2 | 82.8 | 85.4 |
| GPQA Diamond | 72.1 | 79.7 | 83.2 |
| AIME 2025 | 64.7 | 90.3 | 94.6 |
| LiveCodeBench | 50.5 | 76.4 | 81.6 |
| Humanity's Last Exam | 4.3 | 12.1 | 15.6 |
| TAU2-Telecom | 34.2 | 41.5 | 51.3 |
| IFBench | 39.3 | 54.3 | 65.4 |
| TerminalBench-Hard | 10.1 | 15.2 | 23.9 |
| AA-LCR | 30.4 | 59.9 | 61.2 |
| SciCode | 29.5 | 35.7 | 40.9 |
| AAII Composite Score | 42 (41.53) | 55 (54.79) | 60 (60.31) |
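The composite in the last row is consistent with an unweighted mean of the ten benchmarks, with the parenthesized value being the unrounded mean. Recomputing it for aquif-3.5-Max from the table above:

```python
# AAII composite for aquif-3.5-Max = mean of the ten benchmark scores.
scores = {
    "MMLU-Pro": 85.4, "GPQA Diamond": 83.2, "AIME 2025": 94.6,
    "LiveCodeBench": 81.6, "Humanity's Last Exam": 15.6,
    "TAU2-Telecom": 51.3, "IFBench": 65.4, "TerminalBench-Hard": 23.9,
    "AA-LCR": 61.2, "SciCode": 40.9,
}
composite = sum(scores.values()) / len(scores)
print(round(composite, 2))  # 60.31, rounded to 60 for the headline score
```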
### Comparable Models by Configuration

#### aquif-3.5-Plus (Non-Reasoning) — AAII 42
| Model | AAII Score |
|---|---|
| GPT-5 mini | 42 |
| Claude Haiku 4.5 | 42 |
| Gemini 2.5 Flash Lite 2509 | 42 |
| aquif-3.5-Plus (Non-Reasoning) | 42 |
| DeepSeek V3 0324 | 41 |
| Qwen3 VL 32B Instruct | 41 |
| Qwen3 Coder 480B A35B | 42 |
#### aquif-3.5-Plus (Reasoning) — AAII 55
| Model | AAII Score |
|---|---|
| GLM-4.6 | 56 |
| Gemini 2.5 Flash 2509 | 54 |
| Claude Haiku 4.5 | 55 |
| aquif-3.5-Plus (Reasoning) | 55 |
| Qwen3 Next 80B A3B | 54 |
#### aquif-3.5-Max — AAII 60
| Model | AAII Score |
|---|---|
| Gemini 2.5 Pro | 60 |
| Grok 4 Fast | 60 |
| aquif-3.5-Max | 60 |
| MiniMax-M2 | 61 |
| gpt-oss-120B high | 61 |
| GPT-5 mini | 61 |
| DeepSeek-V3.1-Terminus | 58 |
| Claude Opus 4.1 | 59 |
## Key Features

**Massive Context Windows**: Both models support up to 1M tokens, enabling analysis of entire codebases, research papers, and extensive conversation histories without truncation.

**Efficient Architecture**: Despite offering frontier-level performance, both models maintain exceptional efficiency through an optimized mixture-of-experts design and an active parameter count of just 3.3B.

**Flexible Reasoning (Plus Only)**: aquif-3.5-Plus provides interchangeable thinking modes: enable reasoning for complex problems, or disable it for faster inference on straightforward tasks.

**Multilingual Support**: Native support for English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese.
## Usage Recommendations

**aquif-3.5-Plus:**
- Complex reasoning requiring flexibility between speed and depth
- Scientific analysis and mathematical problem-solving with thinking enabled
- Rapid-response applications with thinking disabled
- Code generation and review
- Multilingual applications up to 1M token contexts
**aquif-3.5-Max:**
- Frontier-level problem-solving without compromise
- Advanced research and scientific computing
- Competition mathematics and algorithmic challenges
- Comprehensive code analysis and generation
- Complex multilingual tasks requiring maximum reasoning capability
## Setting Thinking Mode (aquif-3.5-Plus)

Toggle between thinking and non-thinking modes by modifying the chat template:

```
set thinking = true   # Enable thinking mode (reasoning)
set thinking = false  # Disable thinking mode (faster inference)
```

Set the variable in your chat template before inference to switch modes; no model reload is required.
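In a Transformers workflow this toggle is typically exposed as a keyword argument to `apply_chat_template`. The `enable_thinking` name below follows the Qwen3-style convention and is an assumption; confirm the exact variable name in this model's chat template.

```python
# Sketch: switching thinking mode at prompt-build time.
# `enable_thinking` is a hypothetical kwarg name (Qwen3-style templates);
# check the model's actual chat template before use.
def build_prompt(tokenizer, messages, thinking: bool) -> str:
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,  # assumed template variable
    )
```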
## Technical Specifications
Both models support:
- BF16 and FP16 precision
- Mixture of Experts architecture optimizations
- Efficient attention mechanisms with optimized KV caching
- Up to 1M token context window
- Multi-head attention with sparse routing
## Performance Highlights

aquif-3.5-Plus achieves 82.3% average benchmark performance in thinking mode, surpassing models with 2-4x more total parameters; non-thinking mode maintains a competitive 66.9% average for latency-sensitive applications.

aquif-3.5-Max reaches 86.2% average performance, matching or exceeding frontier models with only 42.4B total parameters, an exceptional efficiency result.
## Acknowledgements
- Qwen Team: Base architecture contributions
- Meta Llama Team: Core model foundations
- Hugging Face: Model hosting and training infrastructure
## License
This project is released under the Apache 2.0 License. See LICENSE file for details.
Made in 🇧🇷
© 2025 aquif AI. All rights reserved.