🎉 CompeteSMoE-5.1B

CompeteSMoE-5.1B is a lightweight, integrated variant of the Mixture-of-Experts (MoE) architecture, built on the Phi-3.5 Mini and SigLIP baselines and incorporating the latest CompeteSMoE algorithm enhancements. CompeteSMoE-5.1B performs strongly across a range of MoE routing strategies, including both standard and state-of-the-art routing methods. It achieves competitive results against recent MoE architectures such as SharedE-V2 and SharedE-V3, which are inspired by DeepSeek. Despite the architectural innovations of those models, especially their use of shared experts, CompeteSMoE-5.1B consistently delivers superior or comparable results.
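The core idea of CompeteSMoE is a competition mechanism: experts compete for each token, and the router is trained to predict the competition outcome rather than being learned purely end-to-end. The sketch below is a minimal, simplified illustration of this idea in PyTorch, not the released implementation; the norm-based response score, the MSE distillation target, and the layer sizes are assumptions made for brevity (see the paper for the exact formulation and guarantees).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompetitionMoESketch(nn.Module):
    """Simplified sketch of competition-based MoE routing (not the official code).

    During a competition step, every expert processes the token and the experts
    with the strongest responses win (scored here by output L2 norm, an
    assumption). A lightweight router is distilled to predict the winners so
    that, outside competition steps, only the routed top-k experts are needed.
    """

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)  # cheap gate trained to mimic the competition
        self.top_k = top_k

    def forward(self, x: torch.Tensor, compete: bool = False):
        # x: (num_tokens, dim)
        router_logits = self.router(x)                               # (tokens, E)
        expert_outs = torch.stack([e(x) for e in self.experts], 1)   # (tokens, E, dim)
        # NOTE: for clarity this sketch always evaluates all experts; the real
        # method only does so during competition steps.

        if compete:
            # Competition: score experts by response strength and distill the
            # router toward the competition distribution.
            scores = expert_outs.norm(dim=-1)                        # (tokens, E)
            distill_loss = F.mse_loss(router_logits.softmax(-1), scores.softmax(-1))
        else:
            scores, distill_loss = router_logits, None

        # Route each token to its top-k winning experts and mix their outputs.
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)        # (tokens, k)
        weights = top_scores.softmax(dim=-1).unsqueeze(-1)           # (tokens, k, 1)
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        y = (expert_outs.gather(1, gather_idx) * weights).sum(dim=1)
        return y, distill_loss
```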

📝 Note: This version of CompeteSMoE-5.1B was trained on a small-scale dataset. 🚧 We're actively working on a stronger, more robust release, coming soon! 🚀 Stay tuned for updates. 💡
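A minimal loading sketch, assuming the repository's custom code (the llava_phi architecture) exposes a standard transformers text interface; the prompt and generation settings are illustrative, and since this is a LLaVA-style vision-language model the actual processor and chat template may differ from this text-only smoke test.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Fsoft-AIC/CompeteSMoE-5.1B"

# trust_remote_code is required because the architecture ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # released weights are BF16
    trust_remote_code=True,
)

prompt = "Describe the CompeteSMoE routing strategy in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```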

Hardware Resources

| Stage          | MoE Method  | Hardware |
|----------------|-------------|----------|
| Pre-Training   |             | 4xH100   |
| Pre-FineTuning |             | 4xH100   |
| VIT            | CompeteSMoE | 4xH100   |

Citation Information

More details can be found in our paper.

If you use CompeteSMoE, please cite it using this BibTeX:

@misc{nguyen2025competesmoe,
    title={CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition},
    author={Nam V. Nguyen and Huy Nguyen and Quang Pham and Van Nguyen and Savitha Ramasamy and Nhat Ho},
    year={2025},
    eprint={2505.13380},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}