TileRT and DeepSeek-V3.2-Exp Model
TileRT is an experimental project that explores advanced compiler techniques to enable ultra-low-latency inference for large language models (LLMs). The goal is to push the latency limits of LLMs without compromising model size or quality, allowing models with hundreds of billions of parameters to run at millisecond-level latencies.
TileRT operates on pre-trained models, specifically the DeepSeek-V3.2-Exp model available on Hugging Face. We leverage the excellent work of the DeepSeek team. To facilitate usage, we have pre-processed the weights of the DeepSeek-V3.2-Exp model for compatibility with the TileRT runtime. These preprocessed weights are hosted on Hugging Face, so users can get started without performing any manual preprocessing.
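As a minimal sketch of fetching the preprocessed weights, the snippet below uses the `huggingface_hub` package (assumed installed via `pip install huggingface_hub`); the local directory name is an arbitrary choice, and the full snapshot is large, so ensure sufficient disk space before running.

```python
REPO_ID = "Tile-AI/DeepSeek-V3.2-Exp-TileRT"  # preprocessed TileRT weights repo

def fetch_weights(local_dir: str = "./DeepSeek-V3.2-Exp-TileRT") -> str:
    """Download the preprocessed weight snapshot and return its local path."""
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)

if __name__ == "__main__":
    print(fetch_weights())
```

The downloaded directory can then be pointed to by the TileRT runtime; see the TileRT GitHub project homepage for the exact loading steps.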
For more details on how to use TileRT, including installation instructions and usage examples, please visit the TileRT GitHub project homepage.
Disclaimer
Please note that while we provide the preprocessed weights for convenience, all rights to the original DeepSeek-V3.2-Exp model and its weights belong to the original authors. This project does not modify the underlying model architecture or its training; we only perform weight preprocessing to ensure compatibility with TileRT.
We highly encourage users to review the original DeepSeek-V3.2-Exp model's licensing and terms of use on Hugging Face for more information.
Model tree for Tile-AI/DeepSeek-V3.2-Exp-TileRT
Base model: deepseek-ai/DeepSeek-V3.2-Exp-Base