---
license: apache-2.0
tags:
- t5gemma
---

# Timestep-Aware Semantic Alignment for Gemma Encoder

- uses a bidirectional attention mask for padded tokens
- text-image pairs specifically targeting low- or high-frequency details, depending on the diffusion timestep
- explicit timestep conditioning integrated into the encoder
- as in previous releases, the dataset was cleaned of overused phrases and stop words

```python
import torch
prompt_embeds = encoder.encode(text, t=torch.ones((1,)))  # t=1.0 matches timestep 1000 under the /1000 scaling below
# on each step, call the encoder again with the scheduler timestep scaled by 1/1000
callback_on_step_end = lambda p, i, t, kwargs: encoder.model.forward(*encoder, t=torch.ones((1,)) * t.to('cpu') / 1000.0)
```

## References

- 2403.05135

## Datasets

- animelover/touhou-images
- animesfw
- alfredplpl/artbench-pd-256x256
- anime-art-multicaptions (multicharacter interactions)
- danbooru2023-florence2-caption (verb, action clauses)
- spatial-caption
- spright-coco
- colormix (synthetic color, fashion dataset)
- picollect
- pixiv rank
- trojblue/danbooru2025-metadata
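
The callback in the usage snippet divides the scheduler timestep by 1000 before passing it to the encoder, so `t=1.0` lines up with the value given to `encoder.encode`. A minimal sketch of that scaling, assuming 1000 training timesteps: `normalize_timestep` and `num_train_timesteps` are hypothetical names, and clamping to [0, 1] is an added assumption rather than something this card specifies.

```python
def normalize_timestep(t, num_train_timesteps=1000):
    """Scale a scheduler timestep to the [0, 1] range used for conditioning.

    `t` may be an int, a float, or a one-element tensor (anything float() accepts).
    Hypothetical helper, not part of the released encoder.
    """
    if num_train_timesteps <= 0:
        raise ValueError("num_train_timesteps must be positive")
    value = float(t) / float(num_train_timesteps)
    # Clamp so out-of-range timesteps never leave [0, 1] (assumption).
    return min(max(value, 0.0), 1.0)


if __name__ == "__main__":
    for step in (1000, 500, 0):
        print(step, normalize_timestep(step))
```

The result would still need to be wrapped as a tensor of shape `(1,)`, as in the snippet, before being passed as `t`.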