---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- text-generation-inference
- reasoning
- math
- coding
- frontier
- aquif-3.5
- moe
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
base_model:
- Qwen/Qwen3-30B-A3B-Instruct-2507
---
|
|
|
|
|
<small>*Disclaimer: aquif-3.5-Plus was created with the merging technique used in Qwen3-30B-A3B-YOYO-V3 to enable hybrid reasoning. aquif-3.5-Max was finetuned from Plus, expanded with DavidAU's Brainstorm technique, and then further trained with RL and SFT to strengthen its reasoning and coding capabilities.*</small>
|
|
|
|
|
# aquif-3.5-Plus & aquif-3.5-Max |
|
|
|
|
|
The pinnacle of the aquif-3.5 series, released November 3rd, 2025. These models combine advanced reasoning capabilities, hybrid thinking modes, and 1M-token context windows to achieve state-of-the-art performance in their respective categories.
|
|
|
|
|
**aquif-3.5-Plus** combines hybrid reasoning with interchangeable thinking modes, offering flexibility for both speed-optimized and reasoning-intensive applications. |
|
|
|
|
|
**aquif-3.5-Max** represents frontier model capabilities built on top of Plus's architecture, delivering exceptional performance across all benchmark categories. |
|
|
|
|
|
## Model Repository Links |
|
|
|
|
|
| Model | HuggingFace Repository |
|-------|------------------------|
| aquif-3.5-Plus | [aquif-ai/aquif-3.5-Plus-30B-A3B](https://huggingface.co/aquif-ai/aquif-3.5-Plus-30B-A3B) |
| aquif-3.5-Max | [aquif-ai/aquif-3.5-Max-42B-A3B](https://huggingface.co/aquif-ai/aquif-3.5-Max-42B-A3B) |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
| Model | Total Params (B) | Active Params (B) | Reasoning | Context Window | Thinking Modes |
|-------|------------------|-------------------|-----------|----------------|----------------|
| aquif-3.5-Plus | 30.5 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |
| aquif-3.5-Max | 42.4 | 3.3 | ✅ Reasoning-Only | 1M | ✅ Interchangeable |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### aquif-3.5-Plus (Hybrid Reasoning with Interchangeable Modes) |
|
|
|
|
|
A hybrid reasoning model built for flexibility: toggle between thinking and non-thinking modes to suit your use case, keeping full reasoning capability when needed or prioritizing speed for time-sensitive applications.
|
|
|
|
|
## Artificial Analysis Intelligence Index (AAII) Benchmarks |
|
|
|
|
|
### Core Performance Metrics |
|
|
|
|
|
| Benchmark | Plus (Non-Reasoning) | Plus (Reasoning) | Max (Non-Reasoning) | Max (Reasoning) |
| :----------------------- | -------------------: | ---------------: | ------------------: | --------------: |
| MMLU-Pro | 80.2 | 82.8 | 82.8 | 85.4 |
| GPQA Diamond | 72.1 | 79.7 | 75.6 | 83.2 |
| AIME 2025 | 64.7 | 90.3 | 69.0 | 94.6 |
| LiveCodeBench | 50.5 | 76.4 | 55.9 | 81.6 |
| Humanity's Last Exam | 4.3 | 12.1 | 7.8 | 15.6 |
| TAU2-Telecom | 34.2 | 41.5 | 43.2 | 51.3 |
| IFBench | 39.3 | 54.3 | 49.3 | 65.4 |
| TerminalBench-Hard | 10.1 | 15.2 | 18.0 | 23.9 |
| AA-LCR | 30.4 | 59.9 | 31.7 | 61.2 |
| SciCode | 29.5 | 35.7 | 34.7 | 40.9 |
| **AAII Composite Score** | **42 (41.5)** | **55 (54.8)** | **47 (46.8)** | **60 (60.3)** |

*Composite scores are the unweighted mean of the ten benchmarks above; the rounded value is shown with the exact mean in parentheses.*
|
|
|
|
|
### Long Context Evals (RULER) |
|
|
| Model Name | Avg Acc | 4k | 8k | 16k | 32k | 64k | 96k | 128k | 192k | 256k | 384k | 512k | 640k | 768k | 896k | 1000k |
| :------------------------- | -------: | ----: | ----: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ----: |
| aquif-3.5-Plus (Reasoning) | **91.4** | 99.6 | 100.0 | 99.2 | 98.2 | 97.4 | 96.8 | 96.8 | 94.8 | 89.6 | 90.2 | 84.0 | 82.6 | 81.9 | 80.1 | 77.5 |
| aquif-3.5-Max (Reasoning) | **92.1** | 100.0 | 100.0 | 99.7 | 98.5 | 97.8 | 97.1 | 96.9 | 95.8 | 92.1 | 91.1 | 85.5 | 84.8 | 80.0 | 79.9 | 79.6 |
|
|
|
|
|
|
|
|
### Comparable Models by Configuration |
|
|
|
|
|
**aquif-3.5-Plus (Non-Reasoning) — AAII 42** |
|
|
|
|
|
| Model | AAII Score |
|-------|-----------|
| GPT-5 mini | 42 |
| Claude Haiku 4.5 | 42 |
| Gemini 2.5 Flash Lite 2509 | 42 |
| Qwen3 Coder 480B A35B | 42 |
| **aquif-3.5-Plus (Non-Reasoning)** | **42** |
| DeepSeek V3 0324 | 41 |
| Qwen3 VL 32B Instruct | 41 |
|
|
|
|
|
**aquif-3.5-Plus (Reasoning) — AAII 55** |
|
|
|
|
|
| Model | AAII Score |
|-------|-----------|
| GLM-4.6 | 56 |
| Claude Haiku 4.5 | 55 |
| **aquif-3.5-Plus (Reasoning)** | **55** |
| Gemini 2.5 Flash 2509 | 54 |
| Qwen3 Next 80B A3B | 54 |
|
|
|
|
|
**aquif-3.5-Max (Non-Reasoning) — AAII 47** |
|
|
|
|
|
| Model | AAII Score |
|-------|-----------|
| Gemini 2.5 Flash 2509 | 47 |
| **aquif-3.5-Max (Non-Reasoning)** | **47** |
| DeepSeek-V3.2 Exp | 46 |
| Ling-1T | 45 |
| GLM-4.6 | 45 |
| Qwen3 235B A22B 2507 | 45 |
|
|
|
|
|
**aquif-3.5-Max (Reasoning) — AAII 60** |
|
|
|
|
|
| Model | AAII Score |
|-------|-----------|
| MiniMax-M2 | 61 |
| gpt-oss-120B high | 61 |
| GPT-5 mini | 61 |
| Gemini 2.5 Pro | 60 |
| Grok 4 Fast | 60 |
| **aquif-3.5-Max (Reasoning)** | **60** |
| Claude Opus 4.1 | 59 |
| DeepSeek-V3.1-Terminus | 58 |
|
|
|
|
|
## Key Features |
|
|
|
|
|
**Massive Context Windows**: Both models support up to 1M tokens, enabling analysis of entire codebases, research papers, and extensive conversation histories without truncation. |
|
|
|
|
|
**Efficient Architecture**: Despite offering frontier-level performance, both models remain highly efficient thanks to an optimized mixture-of-experts design and an active parameter count of just 3.3B.
|
|
|
|
|
**Flexible Reasoning**: aquif-3.5-Plus and aquif-3.5-Max provide interchangeable thinking modes. Enable reasoning for complex problems, or disable it for faster inference on straightforward tasks.
|
|
|
|
|
**Multilingual Support**: Native support across English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese. |
|
|
|
|
|
## Usage Recommendations |
|
|
|
|
|
**aquif-3.5-Plus:** |
|
|
- Complex reasoning requiring flexibility between speed and depth |
|
|
- Scientific analysis and mathematical problem-solving with thinking enabled |
|
|
- Rapid-response applications with thinking disabled |
|
|
- Code generation and review |
|
|
- Multilingual applications with contexts up to 1M tokens
|
|
|
|
|
**aquif-3.5-Max:** |
|
|
- Frontier-level problem-solving without compromise |
|
|
- Advanced research and scientific computing |
|
|
- Competition mathematics and algorithmic challenges |
|
|
- Comprehensive code analysis and generation |
|
|
- Complex multilingual tasks requiring maximum reasoning capability |
|
|
|
|
|
## Setting Thinking Mode (aquif-3.5-Plus) |
|
|
|
|
|
Toggle between thinking and non-thinking modes by modifying the chat template: |
|
|
|
|
|
```
set thinking = true   # Enable reasoning mode
set thinking = false  # Disable reasoning mode (faster inference)
```
|
|
|
|
|
Simply set the variable in your chat template before inference to switch modes. No model reloading required. |
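
A minimal sketch with 🤗 Transformers (this assumes the chat template reads a `thinking` variable as shown above, and relies on extra keyword arguments to `apply_chat_template` being forwarded into the template context):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquif-ai/aquif-3.5-Plus-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]

# Extra keyword arguments are passed into the Jinja template context,
# so `thinking` toggles the mode per request with no model reload.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking=True,  # set to False for faster, non-reasoning inference
).to(model.device)

output = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Calling the same code with `thinking=False` should return a direct answer without the reasoning trace.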
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
Both models support: |
|
|
- BF16 and FP16 precision (see the loading sketch after this list)
|
|
- Mixture of Experts architecture optimizations |
|
|
- Efficient attention mechanisms with optimized KV caching |
|
|
- Up to 1M token context window |
|
|
- Multi-head attention with sparse routing |
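
As a sketch of precision selection at load time (assuming standard 🤗 Transformers loading; the repository name is taken from the links table above):

```python
import torch
from transformers import AutoModelForCausalLM

# Prefer BF16 where the GPU supports it (Ampere and newer); otherwise FP16.
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16

model = AutoModelForCausalLM.from_pretrained(
    "aquif-ai/aquif-3.5-Max-42B-A3B",
    torch_dtype=dtype,
    device_map="auto",
)
```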
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
- **Qwen Team**: Base architecture contributions |
|
|
- **Meta Llama Team**: Core model foundations |
|
|
- **Hugging Face**: Model hosting and training infrastructure |
|
|
|
|
|
## License |
|
|
|
|
|
This project is released under the Apache 2.0 License. See LICENSE file for details. |
|
|
|
|
|
--- |
|
|
|
|
|
*Made in 🇧🇷* |
|
|
|
|
|
© 2025 aquif AI. All rights reserved. |