FIBO
AoT compilation, ZeroGPU inference optimization
`torch.compile` and `torchao`: `Int8WeightOnlyConfig` is already working flawlessly in our tests:

```python
import spaces
from diffusers import FluxPipeline
from torchao.quantization.quant_api import Int8WeightOnlyConfig, quantize_

pipeline = FluxPipeline.from_pretrained(...).to('cuda')
quantize_(pipeline.transformer, Int8WeightOnlyConfig())  # or any other component(s)

@spaces.GPU
def generate(prompt: str):
    return pipeline(prompt).images[0]
```
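The AoT compilation mentioned above pairs with this same setup: the idea is to capture real example inputs, export the transformer, and swap in a compiled version once at startup. The following is a minimal sketch, assuming the `spaces.aoti_capture` / `spaces.aoti_compile` / `spaces.aoti_apply` helpers plus an illustrative prompt and duration, rather than an exact recipe:

```python
import torch

# Sketch only: assumes the spaces.aoti_* helpers and reuses the quantized
# `pipeline` from the snippet above.
@spaces.GPU(duration=1500)  # assumed one-off budget for the compilation pass
def compile_transformer():
    # Run the pipeline once to capture example inputs for the transformer
    with spaces.aoti_capture(pipeline.transformer) as call:
        pipeline("arbitrary example prompt")
    # Export the transformer with the captured positional and keyword args
    exported = torch.export.export(
        pipeline.transformer,
        args=call.args,
        kwargs=call.kwargs,
    )
    return spaces.aoti_compile(exported)

# Compile once at startup, then patch the compiled module into the pipeline
spaces.aoti_apply(compile_transformer(), pipeline.transformer)
```

Compiling ahead of time like this keeps the per-request `@spaces.GPU` calls free of compilation latency.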
`medium` size is now available as a power-user feature

ZeroGPU currently only offers `large` (70GB VRAM), but this paves the way for:

- `medium` (will offer significantly more usage than `large`)
- an `xlarge` size (141GB VRAM)
- `auto` (future default)

Size values: `auto` (future default), `medium`, `large` (current default).

`2 ** search_round`) and repeat steps 1-3.
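The step above arrives truncated here, but the pattern it describes is a standard doubling (exponential) search. A minimal sketch, under the assumption that steps 1-3 boil down to testing whether a candidate value works; `fits` and `base` are illustrative names, not from the original:

```python
def doubling_search(fits, base=1):
    # `fits` is an assumed placeholder for whatever steps 1-3 verify.
    search_round = 0
    while not fits(base * 2 ** search_round):
        # Grow the candidate by 2 ** search_round and re-run steps 1-3.
        search_round += 1
    return base * 2 ** search_round
```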