Hugging Face
AxionLab-Co/AxionMoE-350k-A250k
Text Generation
Transformers
Safetensors
openai/gsm8k
English
deepseek_nano
math
experiment
Mixture of Experts
deepseek
from-scratch
tiny-model
cpu
deepseek-v3-architecture
custom_code
arxiv: 2412.19437
License: mit
Files and versions
Branch: main, AxionMoE-350k-A250k, 1.45 MB
1 contributor
History: 21 commits
Latest commit: AxionLab-official, Update README.md, 3f7e5c2 (verified), 12 days ago
File                Size      Last commit                Updated       Notes
.gitattributes      1.52 kB   initial commit             12 days ago   Safe
README.md           5.33 kB   Update README.md           12 days ago
config.json         1.02 kB   Update config.json         12 days ago
model.model         8.33 kB   Upload 4 files             12 days ago   xet
model.safetensors   1.39 MB   Upload 4 files             12 days ago   xet
model.vocab         40.5 kB   Upload 4 files             12 days ago
modeling_axion.py   4.88 kB   Update modeling_axion.py   12 days ago