Elastic model: Stable Diffusion 3.5 Large. The fastest models for self-serving inference.

Elastic models are produced by TheStage AI ANNA (Automated Neural Networks Accelerator). ANNA allows you to control model size, latency, and quality with a simple slider movement. For each base model, ANNA produces a series of optimized variants:

  • XL: Mathematically equivalent neural network, optimized with our DNN compiler.

  • S: The fastest model, with accuracy degradation less than 2%.

Goals of Elastic Models:

  • Provide the fastest models and service for self-hosting.
  • Provide flexibility in cost vs quality selection for inference.
  • Provide clear quality and latency benchmarks.
  • Provide a drop-in interface for the HF libraries transformers and diffusers, requiring only a single-line change (see the snippet after this list).
  • Provide models supported on a wide range of hardware, pre-compiled and requiring no JIT.
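
Switching to Elastic models is a one-line import change; the class mirrors its diffusers counterpart:

# Standard diffusers import:
# from diffusers import StableDiffusion3Pipeline
# Elastic Models drop-in replacement:
from elastic_models.diffusers import StableDiffusion3Pipeline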

Note that the exact quality degradation varies from model to model; an S model may show as little as 0.5% degradation.

[Image: SD3.5_comparison_collage]

[Image: SD3.5_comparison]


Inference

Currently, our demo model supports resolutions from 512x512 to 1024x1024 and batch sizes from 1 to 4; these limits will be relaxed in the near future. To run inference, replace the diffusers import with elastic_models.diffusers:

import torch
from elastic_models.diffusers import StableDiffusion3Pipeline

model_name = 'stabilityai/stable-diffusion-3.5-large'
hf_token = ''  # your Hugging Face access token
device = torch.device("cuda")

# mode='S' selects the fastest Elastic variant; use mode='XL' for the
# mathematically equivalent compiled model.
pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    token=hf_token,
    mode='S'
)
pipeline.to(device)

prompts = ["A cat holding a sign that says hello world"]
output = pipeline(prompt=prompts)

for prompt, output_image in zip(prompts, output.images):
    output_image.save(prompt.replace(' ', '_') + '.png')
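
The demo limits above (512x512 to 1024x1024, batch sizes 1-4) map onto the standard diffusers call signature. Below is a minimal sketch of a batched, fixed-resolution run; it assumes the Elastic pipeline forwards the usual diffusers arguments such as height, width, and num_inference_steps:

# Hypothetical batched run at 512x512. height/width/num_inference_steps are
# standard diffusers arguments, assumed to be forwarded by elastic_models.
prompts = [
    "A watercolor fox in an autumn forest",
    "A retro-futuristic city skyline at dusk",
]
output = pipeline(
    prompt=prompts,  # batch size 2 (the demo supports 1-4)
    height=512,
    width=512,
    num_inference_steps=28,
)
for i, image in enumerate(output.images):
    image.save(f"sample_{i}.png")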

Installation

System requirements:

  • GPUs: H100, B200
  • CPU: AMD, Intel
  • Python: 3.10-3.12

To work with our models, run the following commands in your terminal:

pip install thestage
pip install 'thestage-elastic-models[nvidia]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple

# or, for Blackwell (B200) support
pip install 'thestage-elastic-models[blackwell]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
pip install -U --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -U --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cu128


pip install flash_attn==2.7.3 --no-build-isolation
pip uninstall apex

Then go to app.thestage.ai, log in, and generate an API token on your profile page. Set up the API token as follows:

thestage config set --api-token <YOUR_API_TOKEN>

Congrats, now you can use accelerated models!
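
As a quick sanity check (not an official verification step), you can confirm the package imports cleanly:

python -c "from elastic_models.diffusers import StableDiffusion3Pipeline; print('ok')"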


Benchmarks

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models optimized with our algorithms.

Quality benchmarks

For quality evaluation we used PSNR and SSIM, computed against the outputs of the original model.

Metric / Model    S        XL       Original
PSNR              20.78    29.13    inf
SSIM              0.81     0.95     1.0
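
For reference, here is a minimal sketch of how PSNR and SSIM can be computed against the original model's outputs using scikit-image; the exact evaluation tooling is our assumption, not something this card specifies:

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# `original` and `accelerated` are HxWx3 uint8 arrays rendered from the same
# prompt and seed by the original and S/XL pipelines, respectively.
def compare(original: np.ndarray, accelerated: np.ndarray) -> tuple[float, float]:
    psnr = peak_signal_noise_ratio(original, accelerated, data_range=255)
    ssim = structural_similarity(original, accelerated, channel_axis=-1, data_range=255)
    return psnr, ssim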

Latency benchmarks

Time in seconds to generate one 1024x1024 image:

GPU / Model    S       XL      Original
H100           3.10    3.80    6.55
B200           1.76    2.27    4.81
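
These figures can be approximated with a simple timing harness around the pipeline from the Inference section; the sketch below assumes height/width are forwarded as in standard diffusers, and warms the pipeline up once so one-time initialization does not skew the measurement:

import time
import torch

# Warm-up run (excluded from timing).
pipeline(prompt="warm-up", height=1024, width=1024)

torch.cuda.synchronize()
start = time.perf_counter()
pipeline(prompt="A cat holding a sign that says hello world",
         height=1024, width=1024)
torch.cuda.synchronize()
print(f"one 1024x1024 image: {time.perf_counter() - start:.2f} s")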
