---
tags:
- image-classification
- timm
- transformers
- animetimm
- dghs-imgutils
library_name: timm
license: gpl-3.0
datasets:
- animetimm/danbooru-wdtagger-v4-w640-ws-full
base_model:
- timm/resnet34.a1_in1k
---

# Anime Tagger resnet34.dbv4-full

## Model Details

- **Model Type:** Multi-label image classification / feature backbone
- **Model Stats:**
  - Params: 27.7M
  - FLOPs / MACs: 21.6G / 10.8G
  - Image size: train = 384 x 384, test = 384 x 384
- **Dataset:** [animetimm/danbooru-wdtagger-v4-w640-ws-full](https://huggingface.co/datasets/animetimm/danbooru-wdtagger-v4-w640-ws-full)
  - Tags Count: 12476
    - General (#0) Tags Count: 9225
    - Character (#4) Tags Count: 3247
    - Rating (#9) Tags Count: 4

## Results

|     #      |    Macro@0.40 (F1/MCC/P/R)    |    Micro@0.40 (F1/MCC/P/R)    |  Macro@Best (F1/P/R)  |
|:----------:|:-----------------------------:|:-----------------------------:|:---------------------:|
| Validation | 0.289 / 0.304 / 0.402 / 0.256 | 0.530 / 0.533 / 0.612 / 0.467 |          ---          |
|    Test    | 0.290 / 0.305 / 0.404 / 0.257 | 0.530 / 0.534 / 0.614 / 0.466 | 0.342 / 0.386 / 0.337 |

* `Macro/Micro@0.40` denotes macro-/micro-averaged metrics at a fixed threshold of 0.40.
* `Macro@Best` denotes macro-averaged metrics computed with a separate threshold for each tag, chosen to maximize that tag's F1 score.
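
To make the difference between the two averages concrete, here is a minimal NumPy sketch of macro and micro F1 at a fixed threshold of 0.40. It is not the evaluation code used to produce the tables here: the function names are hypothetical, and scoring a tag with no true positives, false positives, or false negatives as 0 is an assumption. Macro averaging computes F1 per tag and then averages, so rare tags weigh as much as frequent ones; micro averaging pools the counts across all tags first, so frequent tags dominate.

```python
import numpy as np


def f1_from_counts(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); assumed to be 0 when the denominator is 0
    denom = 2 * tp + fp + fn
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)


def macro_micro_f1(y_true, y_prob, threshold=0.40):
    """y_true, y_prob: arrays of shape (num_images, num_tags)."""
    y_pred = y_prob >= threshold
    y_true = y_true.astype(bool)
    # Per-tag confusion counts (one value per column)
    tp = (y_pred & y_true).sum(axis=0)
    fp = (y_pred & ~y_true).sum(axis=0)
    fn = (~y_pred & y_true).sum(axis=0)
    macro = float(f1_from_counts(tp, fp, fn).mean())  # mean of per-tag F1
    micro = float(f1_from_counts(tp.sum(), fp.sum(), fn.sum()))  # F1 of pooled counts
    return macro, micro


# Toy example: 3 images, 2 tags
y_true = np.array([[1, 0], [1, 1], [0, 0]])
y_prob = np.array([[0.9, 0.1], [0.3, 0.8], [0.5, 0.2]])
print(macro_micro_f1(y_true, y_prob))  # (0.75, 0.6666666666666666)
```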

## Thresholds

|  Category  |   Name    |  Alpha  |  Threshold  |  Micro@Thr (F1/P/R)   |  Macro@0.40 (F1/P/R)  |  Macro@Best (F1/P/R)  |
|:----------:|:---------:|:-------:|:-----------:|:---------------------:|:---------------------:|:---------------------:|
|     0      |  general  |    1    |    0.32     | 0.522 / 0.544 / 0.503 | 0.175 / 0.303 / 0.146 | 0.233 / 0.255 / 0.250 |
|     4      | character |    1    |    0.48     | 0.674 / 0.781 / 0.594 | 0.616 / 0.692 / 0.572 | 0.650 / 0.758 / 0.584 |
|     9      |  rating   |    1    |    0.37     | 0.760 / 0.692 / 0.843 | 0.756 / 0.719 / 0.801 | 0.759 / 0.716 / 0.811 |

* `Micro@Thr` denotes micro-averaged metrics at the suggested category-level threshold listed in the `Threshold` column above.
* `Macro@0.40` denotes macro-averaged metrics at a fixed threshold of 0.40.
* `Macro@Best` denotes macro-averaged metrics computed with a separate threshold for each tag, chosen to maximize that tag's F1 score.
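
Per-tag thresholds like those behind `Macro@Best` could be found with a simple search over candidate values on held-out data, as in the sketch below. This is an assumed procedure, not necessarily how the published thresholds were produced; the function name `best_threshold_per_tag` and the 0.01 to 0.99 search grid are hypothetical.

```python
import numpy as np


def best_threshold_per_tag(y_true, y_prob, candidates=None):
    """Return (thresholds, f1s), one entry per tag (column)."""
    if candidates is None:
        candidates = np.linspace(0.01, 0.99, 99)  # hypothetical search grid
    y_true = y_true.astype(bool)
    num_tags = y_true.shape[1]
    best_thr = np.full(num_tags, 0.5)  # fallback if no threshold gives F1 > 0
    best_f1 = np.zeros(num_tags)
    for thr in candidates:
        y_pred = y_prob >= thr
        tp = (y_pred & y_true).sum(axis=0)
        fp = (y_pred & ~y_true).sum(axis=0)
        fn = (~y_pred & y_true).sum(axis=0)
        denom = 2 * tp + fp + fn
        f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
        # Strict ">" keeps the lowest threshold among ties
        improved = f1 > best_f1
        best_thr[improved] = thr
        best_f1[improved] = f1[improved]
    return best_thr, best_f1


# Toy example: 4 images, 2 tags
y_true = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y_prob = np.array([[0.9, 0.2], [0.6, 0.35], [0.3, 0.1], [0.5, 0.3]])
thr, f1 = best_threshold_per_tag(y_true, y_prob)
print(f1)  # [1. 1.]
```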

Tag-level thresholds are listed in the `best_threshold` column of [selected_tags.csv](https://huggingface.co/animetimm/resnet34.dbv4-full/resolve/main/selected_tags.csv).

## How to Use

We provide a sample image for the code samples; you can find it [here](https://huggingface.co/animetimm/resnet34.dbv4-full/blob/main/sample.webp).

### Use TIMM And Torch

Install [dghs-imgutils](https://github.com/deepghs/imgutils), [timm](https://github.com/huggingface/pytorch-image-models), and the other required packages with the following command:

```shell
pip install 'dghs-imgutils>=0.17.0' torch huggingface_hub timm pillow pandas
```

After that, you can load this model with the timm library and use it for training, validation, and testing. The following code runs inference on the sample image:

```python
import json

import pandas as pd
import torch
from huggingface_hub import hf_hub_download
from imgutils.data import load_image
from imgutils.preprocess import create_torchvision_transforms
from timm import create_model

repo_id = 'animetimm/resnet34.dbv4-full'
model = create_model(f'hf-hub:{repo_id}', pretrained=True)
model.eval()

with open(hf_hub_download(repo_id=repo_id, repo_type='model', filename='preprocess.json'), 'r') as f:
    preprocessor = create_torchvision_transforms(json.load(f)['test'])
# Compose(
#     PadToSize(size=(512, 512), interpolation=bilinear, background_color=white)
#     Resize(size=384, interpolation=bicubic, max_size=None, antialias=True)
#     CenterCrop(size=[384, 384])
#     MaybeToTensor()
#     Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
# )

image = load_image('https://huggingface.co/animetimm/resnet34.dbv4-full/resolve/main/sample.webp')
input_ = preprocessor(image).unsqueeze(0)
# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
with torch.no_grad():
    output = model(input_)
    prediction = torch.sigmoid(output)[0]
# output, shape: torch.Size([1, 12476]), dtype: torch.float32
# prediction, shape: torch.Size([12476]), dtype: torch.float32

df_tags = pd.read_csv(
    hf_hub_download(repo_id=repo_id, repo_type='model', filename='selected_tags.csv'),
    keep_default_na=False
)
tags = df_tags['name']
mask = prediction.numpy() >= df_tags['best_threshold'].to_numpy()  # boolean NumPy array, one entry per tag
print(dict(zip(tags[mask].tolist(), prediction[mask].tolist())))
# {'general': 0.6698559522628784,
#  '1girl': 0.9919251799583435,
#  'solo': 0.9640495777130127,
#  'looking_at_viewer': 0.8042827248573303,
#  'blush': 0.7859585285186768,
#  'smile': 0.931821346282959,
#  'short_hair': 0.6032187342643738,
#  'shirt': 0.5651751756668091,
#  'long_sleeves': 0.7825257778167725,
#  'brown_hair': 0.765760600566864,
#  'holding': 0.36651739478111267,
#  'dress': 0.46657896041870117,
#  'closed_mouth': 0.6926622986793518,
#  'purple_eyes': 0.8659283518791199,
#  'upper_body': 0.4256909489631653,
#  'flower': 0.9400202035903931,
#  'braid': 0.48209348320961,
#  'outdoors': 0.5174916386604309,
#  'medium_hair': 0.20816299319267273,
#  'hand_up': 0.15446977317333221,
#  'blunt_bangs': 0.37330013513565063,
#  'necklace': 0.3393518328666687,
#  'sweater': 0.26656919717788696,
#  'head_tilt': 0.1363854557275772,
#  'rose': 0.596889078617096,
#  'light_smile': 0.12500429153442383,
#  'blue_flower': 0.37346720695495605,
#  'backlighting': 0.18008625507354736,
#  'purple_flower': 0.20002909004688263,
#  'bouquet': 0.6832711696624756,
#  'holding_bouquet': 0.3936881422996521}
```
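
The example above filters with per-tag `best_threshold` values and returns one flat dict that mixes categories (the rating tag `general` appears first). If you prefer the coarser category-level thresholds from the Thresholds table and one dict per category, a sketch like the following could be used. The helper `split_by_category` is hypothetical, and the commented usage line assumes that selected_tags.csv has a `category` column holding the category ids 0, 4, and 9; check the file before relying on it.

```python
import numpy as np

# Category-level thresholds copied from the Thresholds table
CATEGORY_THRESHOLDS = {0: 0.32, 4: 0.48, 9: 0.37}
CATEGORY_NAMES = {0: 'general', 4: 'character', 9: 'rating'}


def split_by_category(names, categories, scores, thresholds=CATEGORY_THRESHOLDS):
    """Group tags whose score passes their category threshold, highest score first."""
    names = np.asarray(names)
    categories = np.asarray(categories)
    scores = np.asarray(scores, dtype=float)
    result = {}
    for cat_id, cat_name in CATEGORY_NAMES.items():
        mask = (categories == cat_id) & (scores >= thresholds[cat_id])
        order = np.argsort(-scores[mask])  # sort selected tags by descending score
        result[cat_name] = {
            str(n): float(s) for n, s in zip(names[mask][order], scores[mask][order])
        }
    return result


# Assumed usage with the objects from the example above (requires a `category` column):
# result = split_by_category(df_tags['name'], df_tags['category'], prediction.numpy())

demo = split_by_category(
    ['1girl', 'solo', 'hatsune_miku', 'general', 'sensitive'],
    [0, 0, 4, 9, 9],
    [0.99, 0.30, 0.50, 0.67, 0.20],
)
print(demo)
# {'general': {'1girl': 0.99}, 'character': {'hatsune_miku': 0.5}, 'rating': {'general': 0.67}}
```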

### Use ONNX Model For Inference

Install [dghs-imgutils](https://github.com/deepghs/imgutils) with the following command:

```shell
pip install 'dghs-imgutils>=0.17.0'
```

Use the `multilabel_timm_predict` function as follows:

```python
from imgutils.generic import multilabel_timm_predict

general, character, rating = multilabel_timm_predict(
    'https://huggingface.co/animetimm/resnet34.dbv4-full/resolve/main/sample.webp',
    repo_id='animetimm/resnet34.dbv4-full',
    fmt=('general', 'character', 'rating'),
)

print(general)
# {'1girl': 0.9919252395629883,
#  'solo': 0.9640498161315918,
#  'flower': 0.9400202035903931,
#  'smile': 0.9318214654922485,
#  'purple_eyes': 0.8659287691116333,
#  'looking_at_viewer': 0.8042830228805542,
#  'blush': 0.7859582901000977,
#  'long_sleeves': 0.7825255393981934,
#  'brown_hair': 0.7657610177993774,
#  'closed_mouth': 0.6926621794700623,
#  'bouquet': 0.683271050453186,
#  'short_hair': 0.6032184362411499,
#  'rose': 0.5968889594078064,
#  'shirt': 0.5651752352714539,
#  'outdoors': 0.5174940824508667,
#  'braid': 0.4820920526981354,
#  'dress': 0.4665789306163788,
#  'upper_body': 0.4256901443004608,
#  'holding_bouquet': 0.39368829131126404,
#  'blue_flower': 0.37346646189689636,
#  'blunt_bangs': 0.37329983711242676,
#  'holding': 0.3665176033973694,
#  'necklace': 0.3393515944480896,
#  'sweater': 0.2665693461894989,
#  'medium_hair': 0.20816296339035034,
#  'purple_flower': 0.20002898573875427,
#  'backlighting': 0.1800859570503235,
#  'hand_up': 0.1544697880744934,
#  'head_tilt': 0.13638553023338318,
#  'light_smile': 0.12500450015068054}
print(character)
# {}
print(rating)
# {'general': 0.6698562502861023}
```

For further information, see the [documentation of `multilabel_timm_predict`](https://dghs-imgutils.deepghs.org/main/api_doc/generic/multilabel_timm.html#multilabel-timm-predict).
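
As the sample output shows, `multilabel_timm_predict` returns plain dicts mapping tag names to scores, so post-processing needs only standard Python. For example, to turn `general` into a comma-separated tag string (a common format for captions), a hypothetical helper `tags_to_text` could look like this. Replacing underscores with spaces is an optional convention; some Danbooru tags, such as kaomoji like `^_^`, contain underscores that are part of the tag, so the helper lets you turn the replacement off.

```python
def tags_to_text(tags, min_score=0.0, top_k=None, use_spaces=True):
    """Join a {tag: score} dict into a comma-separated string, highest score first."""
    items = sorted(
        ((t, s) for t, s in tags.items() if s >= min_score),
        key=lambda kv: kv[1],
        reverse=True,
    )
    if top_k is not None:
        items = items[:top_k]  # keep only the k highest-scoring tags
    names = [t.replace('_', ' ') if use_spaces else t for t, _ in items]
    return ', '.join(names)


# Abbreviated scores from the `general` output above
general = {'1girl': 0.99, 'solo': 0.96, 'looking_at_viewer': 0.80, 'light_smile': 0.13}
print(tags_to_text(general, min_score=0.5))  # 1girl, solo, looking at viewer
```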