Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (arXiv:2010.11929)
Implementation of DeiT, proposed in "Training data-efficient image transformers & distillation through attention".
The paper proposes an attention-based distillation scheme in which a new token, the dist token, is added to the model's input sequence.
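To make the extra token concrete, here is a minimal sketch in plain Python of how a distillation token could sit alongside the class token in the input sequence. The function name `build_token_sequence`, the toy values, and the ordering (class token first, then the dist token, then the patches) are assumptions made for illustration, not this implementation's actual code.

```python
def build_token_sequence(patch_embeddings, cls_token, dist_token):
    """Prepend the class and distillation tokens to the patch embeddings.

    Assumed ordering (illustrative): [cls, dist, patch_1, ..., patch_N].
    """
    return [cls_token, dist_token] + list(patch_embeddings)


# Toy values: 4 patches with embedding dimension 3 (illustrative only).
patches = [[0.1 * i] * 3 for i in range(4)]
cls_token = [1.0, 1.0, 1.0]   # would be a learnable parameter in a real model
dist_token = [2.0, 2.0, 2.0]  # would be a learnable parameter in a real model

sequence = build_token_sequence(patches, cls_token, dist_token)
print(len(sequence))  # 4 patches + cls + dist
```

In a real model both tokens would pass through the same attention layers as the patches. The output at the dist position would then be trained against the teacher's predictions, while the output at the class position is trained against the ground-truth label.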
DeiT.deit_tiny_patch16_224()
DeiT.deit_small_patch16_224()
DeiT.deit_base_patch16_224()
DeiT.deit_base_patch16_384()
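The variant names above appear to follow the common convention of encoding the patch size and the input resolution (for example, patch16_224 for 16x16 patches on a 224x224 image). Assuming that convention holds, and assuming two extra tokens (class and dist), the sketch below estimates the sequence length each variant would process. The helper `token_count` is hypothetical and not part of the library.

```python
import re


def token_count(variant_name, extra_tokens=2):
    """Estimate sequence length from a name ending in 'patch<P>_<R>'.

    Assumes square images of side R, square patches of side P, and
    `extra_tokens` additional tokens (class + dist by default).
    """
    match = re.search(r"patch(\d+)_(\d+)$", variant_name)
    if match is None:
        raise ValueError(f"unrecognised variant name: {variant_name!r}")
    patch_size, image_size = int(match.group(1)), int(match.group(2))
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + extra_tokens


for name in (
    "deit_tiny_patch16_224",
    "deit_small_patch16_224",
    "deit_base_patch16_224",
    "deit_base_patch16_384",
):
    print(name, token_count(name))
```

Under these assumptions, the three 224-pixel variants share the same sequence length and differ only in model width and depth, while the 384-pixel variant processes a sequence roughly three times longer.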