---
license: apache-2.0
tags:
- t5gemma
---

# Timestep-Aware Semantic Alignment for Gemma Encoder

- uses a bidirectional attention mask for padded tokens
- text-image training pairs selected to target low- or high-frequency details depending on the diffusion timestep
- explicit timestep conditioning integrated into the encoder
- as in previous releases, the dataset was cleaned of overused phrases and stop words
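
For readers unfamiliar with the distinction, here is a generic sketch of a bidirectional attention mask over padded input, with a causal mask alongside for comparison. It assumes the usual convention (1 = real token, 0 = padding), and the function names are hypothetical. It illustrates the idea and is not taken from this model's code.

```python
def bidirectional_padding_mask(attention_mask: list[int]) -> list[list[bool]]:
    """Return an (L, L) mask where entry [i][j] is True if query i may attend to key j.

    There is no triangular constraint: every real token attends to every other
    real token in both directions. Padded keys (attention_mask[j] == 0) are
    masked out so they never contribute to the representation of real tokens.
    """
    return [[bool(key) for key in attention_mask] for _ in attention_mask]


def causal_padding_mask(attention_mask: list[int]) -> list[list[bool]]:
    """Causal counterpart for comparison: query i only attends to real keys j <= i."""
    n = len(attention_mask)
    return [[j <= i and bool(attention_mask[j]) for j in range(n)] for i in range(n)]
```

For `attention_mask = [1, 1, 0]`, every row of the bidirectional mask is `[True, True, False]`. The causal mask, by contrast, is lower-triangular over the two real tokens. Tensor code usually expresses the same idea as a boolean or additive mask broadcast over heads. The list-based version here just keeps the example dependency-free.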

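```python
import torch
prompt_embeds = encoder.encode(text, t=torch.ones((1,)))  # first step: t = 1.0 (pure noise)

def callback_on_step_end(pipe, step, timestep, callback_kwargs):
    # re-encode at the current timestep scaled to [0, 1]; diffusers expects the kwargs dict back
    callback_kwargs["prompt_embeds"] = encoder.encode(text, t=torch.ones((1,)) * timestep.to("cpu") / 1000.0)
    return callback_kwargs
```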
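
diffusers calls `callback_on_step_end` at the end of each denoising step. As a result, the embeddings re-encoded from step i's timestep are first used at step i + 1. The sketch below spells out which normalized `t` each step receives under the snippet above. `encoder_timesteps` is a hypothetical helper, and the 4-step schedule is illustrative rather than taken from any particular scheduler. It assumes timesteps on a 0-1000 training scale, which matches the `/ 1000.0` in the snippet.

```python
def encoder_timesteps(scheduler_timesteps, num_train_timesteps=1000):
    """Return the normalized t the encoder is conditioned on for each denoising step.

    Step 0 uses the initial encode at t = 1.0. The callback runs at the *end*
    of step i with timestep_i, so that re-encoded prompt is first used at
    step i + 1. The value computed in the final callback is never consumed,
    hence the [:-1] slice.
    """
    return [1.0] + [t / num_train_timesteps for t in scheduler_timesteps[:-1]]


# Illustrative schedule only.
print(encoder_timesteps([999, 749, 499, 249]))
```

With this schedule the encoder sees `t` values of 1.0, 0.999, 0.749 and 0.499. In other words, it trails the denoiser by one step. With many inference steps the lag is small.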

## References

- arXiv:2403.05135

## Datasets

- animelover/touhou-images
- animesfw
- alfredplpl/artbench-pd-256x256
- anime-art-multicaptions (multi-character interactions)
- danbooru2023-florence2-caption (verb and action clauses)
- spatial-caption
- spright-coco
- colormix (synthetic color and fashion dataset)
- picollect
- pixiv rank
- trojblue/danbooru2025-metadata
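
The feature list notes that the dataset was cleaned of overused phrases and stop words. The actual lists and procedure are not published here, so the following is only a hypothetical sketch. `OVERUSED_PHRASES`, `STOP_WORDS`, and `clean_caption` are illustrative names and values. Matching is case-insensitive, and punctuation is kept.

```python
import re

# Hypothetical example lists; the phrases and stop words actually removed are not published.
OVERUSED_PHRASES = ("the image shows", "in this image", "a picture of")
STOP_WORDS = ("a", "an", "the", "of", "is", "are")

_PHRASE_RE = re.compile("|".join(map(re.escape, OVERUSED_PHRASES)), re.IGNORECASE)
_STOP_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, STOP_WORDS)) + r")\b", re.IGNORECASE)


def clean_caption(caption: str) -> str:
    """Remove overused phrases, then stop words, and tidy the leftover spacing."""
    text = _PHRASE_RE.sub(" ", caption)  # phrases first, so their words are removed as a unit
    text = _STOP_RE.sub(" ", text)  # whole-word matches only, thanks to \b
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)  # no space before punctuation
    return text.strip(" ,;:")  # drop dangling separators left at the edges


print(clean_caption("The image shows a girl with red hair, holding an umbrella."))
```

Removing phrases before stop words matters here. If stop words went first, "the image shows" would be reduced to "image shows" and would no longer match the phrase list.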