TileRT: Tile-Based Runtime for
Ultra-Low-Latency LLM Inference

GitHub repository

TileRT and DeepSeek-V3.2-Exp Model

TileRT is an experimental project that explores advanced compiler techniques to enable ultra-low-latency inference for large language models (LLMs). The goal is to push the latency limits of LLMs without compromising model size or quality, allowing models with hundreds of billions of parameters to run at millisecond-level latencies.

TileRT operates on pre-trained models, specifically the DeepSeek-V3.2-Exp model available on Hugging Face. We leverage the excellent work of the DeepSeek team. To facilitate usage, we have pre-processed the weights of the DeepSeek-V3.2-Exp model for compatibility with the TileRT runtime. These preprocessed weights are hosted on Hugging Face, offering a streamlined experience for users without the need for manual preprocessing.
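As a minimal sketch, the preprocessed weights can be fetched with the standard Hugging Face CLI (assuming you have `huggingface_hub` installed; note the full checkpoint is several hundred GB, so ensure sufficient disk space):

```shell
# Install the Hugging Face Hub client if needed
pip install -U huggingface_hub

# Download the preprocessed TileRT weights to a local directory
# (repo id taken from this model card; adjust --local-dir as desired)
huggingface-cli download Tile-AI/DeepSeek-V3.2-Exp-TileRT --local-dir ./DeepSeek-V3.2-Exp-TileRT
```

Refer to the TileRT GitHub project homepage for how to point the runtime at the downloaded directory.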

For more details on how to use TileRT, including installation instructions and usage examples, please visit the TileRT GitHub project homepage.

Disclaimer

Please note that while we provide the preprocessed weights for convenience, all rights to the original DeepSeek-V3.2-Exp model and its weights belong to the original authors. This project does not modify the underlying model architecture or its training; we only perform weight preprocessing to ensure compatibility with TileRT.

We highly encourage users to review the original DeepSeek-V3.2-Exp model's licensing and terms of use on Hugging Face for more information.

Model Details

Format: Safetensors
Model size: 685B params
Tensor types: BF16 · F32 · F8_E4M3
