TileRT: Tile-Based Runtime for
Ultra-Low-Latency LLM Inference

GitHub repository

TileRT and DeepSeek-V3.2-Exp Model

TileRT is an experimental project that explores advanced compiler techniques to enable ultra-low-latency inference for large language models (LLMs). The goal is to push the latency limits of LLMs without compromising model size or quality, allowing models with hundreds of billions of parameters to run at millisecond-level latencies.

TileRT operates on pre-trained models, specifically the DeepSeek-V3.2-Exp model available on Hugging Face. We leverage the excellent work of the DeepSeek team. To facilitate usage, we have pre-processed the weights of the DeepSeek-V3.2-Exp model for compatibility with the TileRT runtime. These preprocessed weights are hosted on Hugging Face, offering a streamlined experience for users without the need for manual preprocessing.
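As a minimal sketch, the preprocessed weights can be fetched with the standard Hugging Face CLI (assuming you have `huggingface_hub` installed; note the full checkpoint is several hundred GB, so ensure sufficient disk space):

```shell
# Install the Hugging Face Hub client if needed
pip install -U huggingface_hub

# Download the preprocessed TileRT weights to a local directory
# (repo id taken from this model card; adjust --local-dir as desired)
huggingface-cli download Tile-AI/DeepSeek-V3.2-Exp-TileRT --local-dir ./DeepSeek-V3.2-Exp-TileRT
```

Refer to the TileRT GitHub project homepage for how to point the runtime at the downloaded directory.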

For more details on how to use TileRT, including installation instructions and usage examples, please visit the TileRT GitHub project homepage.

Disclaimer

Please note that while we provide the preprocessed weights for convenience, all rights to the original DeepSeek-V3.2-Exp model and its weights belong to the original authors. This project does not modify the underlying model architecture or its training; we only perform weight preprocessing to ensure compatibility with TileRT.

We highly encourage users to review the original DeepSeek-V3.2-Exp model's licensing and terms of use on Hugging Face for more information.

Model Details

Format: Safetensors
Model size: 685B params
Tensor types: BF16 · F32 · F8_E4M3
