NVIDIA PiD (Pixel Diffusion) SDXL 4-Step Upscaler - ONNX / TensorRT Version
This repository contains the optimized ONNX export of the official NVIDIA PiD (Pixel Diffusion) model: PiD_res2kto4k_sr4x_official_sdxl_distill_4step.
It is designed for high-performance super-resolution upscaling (from 1024x1024 to 4096x4096px) in C++ environments using NVIDIA TensorRT or ONNX Runtime.
Model Details
- Original Model: NVIDIA PiD (nvidia/PiD)
- Upscale Factor: 4x (e.g., 1024x1024 -> 4096x4096px)
- Diffusion Steps: 4 steps (distilled student model)
- Backbone: Stable Diffusion XL (SDXL) VAE
- License: Apache-2.0 (inherited from the original NVIDIA PiD repository)
What is in this repository?
- vae_encoder.onnx
  - The SDXL VAE Encoder wrapper. It converts a preprocessed 1024x1024 input image to 1x4x128x128 latents (lq_latent).
  - Note: Includes the quant_conv projection layer to ensure correct latent conditioning without color distortion or noise.
- pid_upscaler.onnx & pid_upscaler.onnx.data
  - The core PixelDiT student model.
  - Optimization: Bypasses the Gemma-2-2B text encoder by using pre-computed zero-prompt embeddings during export, allowing execution in pure C++/TensorRT without Hugging Face gated license checks.
  - ONNX Compatibility Patches: Replaced unsupported torch.nn.functional.fold operations (Col2Im) with equivalent TensorRT-friendly reshape/permute layers, and fixed float64/float32 type mismatches in the sinusoidal (sine-cosine) position embeddings.
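When allocating host or device buffers for these files, it can help to derive sizes from the tensor shapes rather than hard-coding them. The sketch below is only illustrative: the helper names are hypothetical, the factor of 8 between image side and latent side is inferred from the 1024 -> 128 shapes quoted above, and the 4-byte default assumes FP32 engine inputs and outputs, which this repository does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Number of elements in a dense tensor with the given dimensions.
inline std::size_t element_count(std::initializer_list<std::size_t> dims) {
    std::size_t n = 1;
    for (std::size_t d : dims) n *= d;
    return n;
}

// Buffer size in bytes. The default of 4 bytes per element assumes FP32 I/O
// (an assumption; check the actual binding data types of your engine).
inline std::size_t byte_size(std::initializer_list<std::size_t> dims,
                             std::size_t bytes_per_element = 4) {
    return element_count(dims) * bytes_per_element;
}

// Latent side length for a square input, assuming the 8x spatial reduction
// implied by the 1024x1024 -> 128x128 shapes in this model card.
inline std::size_t latent_side(std::size_t image_side) {
    assert(image_side % 8 == 0);
    return image_side / 8;
}
```

Under these assumptions, byte_size({1, 3, 4096, 4096}) gives 201326592 bytes (192 MiB) for the high-resolution image tensor, and byte_size({1, 4, 128, 128}) gives 262144 bytes for the latents.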
TensorRT Compilation Guide
To build high-performance TensorRT engines, use the trtexec utility bundled with your TensorRT installation.
Standard FP16 precision compilation for SDXL VAE causes numerical overflows, resulting in NaN latents and blank output images. PiD compilation in FP32 might fail due to extremely high GPU memory allocation requests in the autotuner (over 400 GB). Follow the precision guidelines below to prevent these issues.
1. Compile VAE Encoder (FP32)
Compile the VAE in standard FP32 to ensure numerical stability:
trtexec --onnx=vae_encoder.onnx --saveEngine=vae_encoder.engine --verbose
2. Compile PiD Upscaler (BFloat16)
Compile the PiD model in BFloat16 to prevent both NaN overflows and GPU out-of-memory errors during build:
trtexec --onnx=pid_upscaler.onnx --saveEngine=pid_upscaler.engine --bf16 --verbose
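The precision choices above come down to dynamic range: FP16 has a largest finite value of 65504, while BFloat16 keeps the 8-bit exponent of FP32 and gives up mantissa bits instead. The following sketch illustrates that difference on plain floats. It does not reproduce TensorRT's internal conversions and makes no claim about which activations overflow in these models; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Largest finite value representable in IEEE 754 binary16 (FP16).
constexpr float kFp16Max = 65504.0f;

// True if the magnitude of v lies beyond the largest finite FP16 value,
// so an FP16 conversion cannot represent it faithfully.
inline bool exceeds_fp16_range(float v) {
    return std::isfinite(v) && std::fabs(v) > kFp16Max;
}

// Round an FP32 value to the nearest BFloat16 value (round-to-nearest-even)
// and return it widened back to FP32. BFloat16 keeps the sign bit, the
// 8 exponent bits, and the top 7 mantissa bits of the FP32 encoding.
inline float round_to_bf16(float v) {
    if (std::isnan(v)) return v;  // leave NaN untouched
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    // Bias for round-to-nearest-even on the 16 discarded mantissa bits.
    const std::uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
    bits = (bits + bias) & 0xFFFF0000u;
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}
```

For example, 70000.0f is outside the FP16 range, whereas round_to_bf16(70000.0f) stays finite and returns 70144.0f: the value survives, at the cost of coarser precision.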
Inference Pipeline Summary (C++)
In your C++ execution pipeline (e.g., using OpenCV for I/O and CUDA for memory management):
- Preprocess: Resize the input image to 1024x1024, convert from BGR to RGB, normalize to $[-1, 1]$, and format to CHW.
- VAE Inference: Run vae_encoder.engine on the input image to get 1x4x128x128 latents.
- Initialize Noise: Create a starting noise tensor $x_T$ of size 1x3x4096x4096 using a normal distribution.
- Diffusion Loop: Loop for 4 steps with scaled timesteps $t \in [999, 866, 634, 342]$:
  - Execute pid_upscaler.engine with inputs:
    - x: current noisy high-res image (shape 1x3x4096x4096)
    - t: scaled timestep value (shape 1)
    - lq_latent: VAE latents (shape 1x4x128x128)
    - degrade_sigma: noise degradation value (shape 1; constant 0.0f for clean images)
  - The model returns v_pred.
  - Update the high-res image $x$ using SDE math: $$x_{0,\text{pred}} = x_t - t_{\text{cur}} \cdot v_{\text{pred}}$$ $$x_{t_{\text{next}}} = (1.0 - t_{\text{next}}) \cdot x_{0,\text{pred}} + t_{\text{next}} \cdot \epsilon$$ (where $\epsilon \sim \mathcal{N}(0, I)$)
- Postprocess: Clamp final pixel values to $[-1, 1]$, convert back to $[0, 255]$ (RGB -> BGR), and save the 4096x4096 image.
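The CPU-side parts of the steps above (layout conversion and the per-step update) can be sketched without any TensorRT or OpenCV dependency. This is a minimal sketch under several assumptions not confirmed by this model card: pixels are normalized as v / 127.5 - 1, the update uses t = scaled timestep / 1000, and t_next is 0 after the last step. The engine call is hidden behind a hypothetical VelocityModel callback, which is expected to bind lq_latent and degrade_sigma itself; resizing and file I/O are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// HWC BGR uint8 -> CHW RGB float in [-1, 1] (assumed mapping: v / 127.5 - 1).
std::vector<float> bgr_hwc_to_rgb_chw(const std::vector<std::uint8_t>& bgr,
                                      std::size_t h, std::size_t w) {
    assert(bgr.size() == h * w * 3);
    std::vector<float> out(3 * h * w);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t c = 0; c < 3; ++c)  // source channel 2 - c flips BGR to RGB
                out[c * h * w + y * w + x] =
                    bgr[(y * w + x) * 3 + (2 - c)] / 127.5f - 1.0f;
    return out;
}

// CHW RGB float -> HWC BGR uint8: clamp to [-1, 1], then map to [0, 255].
std::vector<std::uint8_t> rgb_chw_to_bgr_hwc(const std::vector<float>& rgb,
                                             std::size_t h, std::size_t w) {
    assert(rgb.size() == 3 * h * w);
    std::vector<std::uint8_t> out(h * w * 3);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t c = 0; c < 3; ++c) {
                float v = std::min(1.0f, std::max(-1.0f, rgb[c * h * w + y * w + x]));
                out[(y * w + x) * 3 + (2 - c)] =
                    static_cast<std::uint8_t>(std::lround((v + 1.0f) * 127.5f));
            }
    return out;
}

// Hypothetical stand-in for running pid_upscaler.engine: takes the current
// image x and the scaled timestep, returns v_pred with the same size as x.
using VelocityModel =
    std::function<std::vector<float>(const std::vector<float>& x, float t_scaled)>;

// Four-step loop: x starts as the noise tensor x_T and is updated in place.
std::vector<float> run_sampler(std::vector<float> x, const VelocityModel& model,
                               std::mt19937& rng) {
    const float scaled_t[4] = {999.0f, 866.0f, 634.0f, 342.0f};  // from the list above
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    for (int i = 0; i < 4; ++i) {
        const float t_cur = scaled_t[i] / 1000.0f;  // assumption: t in [0, 1] for the update
        const float t_next = (i + 1 < 4) ? scaled_t[i + 1] / 1000.0f : 0.0f;  // assumption
        const std::vector<float> v = model(x, scaled_t[i]);
        assert(v.size() == x.size());
        for (std::size_t k = 0; k < x.size(); ++k) {
            const float x0_pred = x[k] - t_cur * v[k];
            x[k] = (1.0f - t_next) * x0_pred + t_next * gauss(rng);
        }
    }
    return x;
}
```

With t_next set to 0 on the last iteration, the returned tensor equals the final x_0 prediction with no noise re-injected. In a real pipeline the callback would copy x and t into the engine's input bindings, enqueue inference, and read v_pred back.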
Citation & Acknowledgements
If you use this model, please cite the original authors:
@article{lu2025pid,
title={PiD: Halving the Steps of Pixel-space Diffusion Upscalers via Distillation},
author={Lu, Yifan and others},
journal={arXiv preprint arXiv:2501.00000},
year={2025}
}
Special thanks to the original NVIDIA team for releasing the PiD model weights.