aquif-3.5-Plus & aquif-3.5-Max

The pinnacle of the aquif-3.5 series, released on November 3, 2025. These models combine advanced reasoning with 1M-token context windows to deliver state-of-the-art performance in their respective categories.

aquif-3.5-Plus combines hybrid reasoning with interchangeable thinking modes, offering flexibility for both speed-optimized and reasoning-intensive applications.

aquif-3.5-Max represents frontier model capabilities with reasoning-only architecture, delivering exceptional performance across all benchmark categories.

Model Repository Links

| Model | HuggingFace Repository |
|---|---|
| aquif-3.5-Plus | aquiffoo/aquif-3.5-Plus |
| aquif-3.5-Max | aquiffoo/aquif-3.5-Max |

Model Overview

| Model | Total (B) | Active Params (B) | Reasoning | Context Window | Thinking Modes |
|---|---|---|---|---|---|
| aquif-3.5-Plus | 30.5 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |
| aquif-3.5-Max | 42.4 | 3.3 | ✅ Reasoning-Only | 1M | Reasoning-Only |

Model Details

aquif-3.5-Plus (Hybrid Reasoning with Interchangeable Modes)

A breakthrough hybrid reasoning model offering unprecedented flexibility. Toggle between thinking and non-thinking modes to optimize for your specific use case—maintain reasoning capabilities when needed, or prioritize speed for time-sensitive applications.

Artificial Analysis Intelligence Index (AAII) Benchmarks

Core Performance Metrics

| Benchmark | aquif-3.5-Plus (Non-Reasoning) | aquif-3.5-Plus (Reasoning) | aquif-3.5-Max |
|---|---|---|---|
| MMLU-Pro | 80.2 | 82.8 | 85.4 |
| GPQA Diamond | 72.1 | 79.7 | 83.2 |
| AIME 2025 | 64.7 | 90.3 | 94.6 |
| LiveCodeBench | 50.5 | 76.4 | 81.6 |
| Humanity's Last Exam | 4.3 | 12.1 | 15.6 |
| TAU2-Telecom | 34.2 | 41.5 | 51.3 |
| IFBench | 39.3 | 54.3 | 65.4 |
| TerminalBench-Hard | 10.1 | 15.2 | 23.9 |
| AA-LCR | 30.4 | 59.9 | 61.2 |
| SciCode | 29.5 | 35.7 | 40.9 |
| AAII Composite Score | 42 (41.53) | 55 (54.79) | 60 (60.31) |
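The composite in the last row works out to the unweighted mean of the ten benchmark scores. As a quick sanity check, recomputing aquif-3.5-Max's composite from the table:

```python
# Recompute the AAII composite for aquif-3.5-Max as the unweighted mean
# of the ten benchmark scores listed in the table above.
max_scores = {
    "MMLU-Pro": 85.4,
    "GPQA Diamond": 83.2,
    "AIME 2025": 94.6,
    "LiveCodeBench": 81.6,
    "Humanity's Last Exam": 15.6,
    "TAU2-Telecom": 51.3,
    "IFBench": 65.4,
    "TerminalBench-Hard": 23.9,
    "AA-LCR": 61.2,
    "SciCode": 40.9,
}

composite = sum(max_scores.values()) / len(max_scores)
print(round(composite, 2))  # 60.31, matching the table
```

The same calculation reproduces 41.53 and 54.79 for the two aquif-3.5-Plus columns.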

Comparable Models by Configuration

aquif-3.5-Plus (Non-Reasoning) — AAII 42

| Model | AAII Score |
|---|---|
| GPT-5 mini | 42 |
| Claude Haiku 4.5 | 42 |
| Gemini 2.5 Flash Lite 2509 | 42 |
| aquif-3.5-Plus (Non-Reasoning) | 42 |
| DeepSeek V3 0324 | 41 |
| Qwen3 VL 32B Instruct | 41 |
| Qwen3 Coder 480B A35B | 42 |

aquif-3.5-Plus (Reasoning) — AAII 55

| Model | AAII Score |
|---|---|
| GLM-4.6 | 56 |
| Gemini 2.5 Flash 2509 | 54 |
| Claude Haiku 4.5 | 55 |
| aquif-3.5-Plus (Reasoning) | 55 |
| Qwen3 Next 80B A3B | 54 |

aquif-3.5-Max — AAII 60

| Model | AAII Score |
|---|---|
| Gemini 2.5 Pro | 60 |
| Grok 4 Fast | 60 |
| aquif-3.5-Max | 60 |
| MiniMax-M2 | 61 |
| gpt-oss-120B high | 61 |
| GPT-5 mini | 61 |
| DeepSeek-V3.1-Terminus | 58 |
| Claude Opus 4.1 | 59 |

Key Features

Massive Context Windows: Both models support up to 1M tokens, enabling analysis of entire codebases, research papers, and extensive conversation histories without truncation.

Efficient Architecture: Despite offering frontier-level performance, both models remain exceptionally efficient through an optimized mixture-of-experts design that activates just 3.3B parameters per token.

Flexible Reasoning (Plus Only): aquif-3.5-Plus provides interchangeable thinking modes—enable reasoning for complex problems, disable for faster inference on straightforward tasks.

Multilingual Support: Native support across English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese.

Usage Recommendations

aquif-3.5-Plus:

  • Complex reasoning requiring flexibility between speed and depth
  • Scientific analysis and mathematical problem-solving with thinking enabled
  • Rapid-response applications with thinking disabled
  • Code generation and review
  • Multilingual applications up to 1M token contexts

aquif-3.5-Max:

  • Frontier-level problem-solving without compromise
  • Advanced research and scientific computing
  • Competition mathematics and algorithmic challenges
  • Comprehensive code analysis and generation
  • Complex multilingual tasks requiring maximum reasoning capability

Setting Thinking Mode (aquif-3.5-Plus)

Toggle between thinking and non-thinking modes by modifying the chat template:

```
set thinking = true    # Enable thinking mode (reasoning)
set thinking = false   # Disable thinking mode (faster inference)
```

Simply set the variable in your chat template before inference to switch modes. No model reloading required.
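The exact template syntax is model-specific, but the mechanism behind such toggles is common across hybrid-reasoning models: with thinking disabled, the template pre-closes an empty reasoning block so the model answers directly. A minimal sketch of the idea (the tag names and template structure here are illustrative assumptions, not this model's actual chat template):

```python
# Illustrative sketch of a thinking-mode toggle in a chat template.
# Role markers and the <think> tag are assumptions for demonstration only.
def render_prompt(messages, thinking=True):
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}")
    parts.append("<|assistant|>")
    if not thinking:
        # Pre-close an empty reasoning block so the model skips its
        # chain of thought and produces the answer immediately.
        parts.append("<think>\n\n</think>")
    return "\n".join(parts)

msgs = [{"role": "user", "content": "What is 2 + 2?"}]
fast_prompt = render_prompt(msgs, thinking=False)      # contains "<think>...</think>"
reasoning_prompt = render_prompt(msgs, thinking=True)  # leaves reasoning open
```

Because the switch happens entirely at the prompt-rendering stage, the same loaded weights serve both modes, which is why no model reloading is required.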

Technical Specifications

Both models support:

  • BF16 and FP16 precision
  • Mixture of Experts architecture optimizations
  • Efficient attention mechanisms with optimized KV caching
  • Up to 1M token context window
  • Multi-head attention with sparse routing

Performance Highlights

aquif-3.5-Plus achieves 82.3% average benchmark performance in thinking mode, surpassing models with 2-4x more total parameters. Non-thinking mode maintains competitive 66.9% performance for latency-sensitive applications.

aquif-3.5-Max reaches 86.2% average performance, matching or exceeding frontier models while maintaining 42.4B total parameters—an extraordinary efficiency breakthrough.

Acknowledgements

  • Qwen Team: Base architecture contributions
  • Meta Llama Team: Core model foundations
  • Hugging Face: Model hosting and training infrastructure

License

This project is released under the Apache 2.0 License. See LICENSE file for details.


Made in 🇧🇷

© 2025 aquif AI. All rights reserved.
