Model Card for Qwen3-0.6B-SFT-AAT-Materials
This model is a fine-tuned version of Qwen/Qwen3-0.6B specialized for identifying materials in cultural heritage object descriptions according to Getty Art & Architecture Thesaurus (AAT) standards.
It has been trained using TRL on synthetic data representing diverse cultural heritage objects from galleries, libraries, archives, and museums (GLAM) collections.
Model Description
This model specializes in:
- Materials Identification: Extracting and categorizing materials from cultural heritage object descriptions
- AAT Standardization: Converting material descriptions to Getty Art & Architecture Thesaurus format
- Multi-material Recognition: Identifying compound materials (e.g., "oil on canvas" → ["Oil paint", "Canvas"])
- Domain-specific Understanding: Processing technical terminology from art history, archaeology, and museum cataloging
Use Cases
Primary Applications
- Museum Cataloging: Automated material extraction from object descriptions
- Digital Collections: Standardizing material metadata across cultural heritage databases
- Research Tools: Supporting art historians and archaeologists in material analysis
- Data Migration: Converting legacy catalog records to AAT standards
Object Types Supported
- Paintings (oil, tempera, watercolor, acrylic)
- Sculptures (bronze, marble, wood, clay)
- Textiles (wool, linen, silk, cotton)
- Ceramics and pottery
- Metalwork and jewelry
- Glassware
- Manuscripts and prints
- Furniture and decorative objects
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import json
# Load the model
tokenizer = AutoTokenizer.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
model = AutoModelForCausalLM.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
# Example cultural heritage object description
description = """A bronze sculpture from 1425, standing 150 cm tall. The figure is mounted on a marble base and features intricate details cast in the bronze medium. The sculpture shows traces of original gilding on selected areas."""
# Format the prompt
prompt = f"""Given this cultural heritage object description:
{description}
Identify the materials, separating them out as they would be found in the Getty AAT"""
# Generate materials identification
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Extract the materials output
materials = result[len(prompt):].strip()
print(json.loads(materials))
# Expected output: [{"Bronze": ["bronze"]}, {"Marble": ["marble"]}, {"Gold leaf": ["gold", "leaf"]}]
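Because small models occasionally wrap the JSON array in extra text, parsing the raw generation with a bare `json.loads` can fail. A more defensive sketch (the helper name `extract_materials` is our own, not part of this repository):

```python
import json

def extract_materials(raw_output: str):
    """Best-effort extraction of the first JSON array from raw model output.

    The generated text may contain extra tokens around the JSON payload, so
    scanning for an opening bracket and using raw_decode is more robust than
    calling json.loads on the whole string.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(raw_output):
        if char == "[":
            try:
                parsed, _ = decoder.raw_decode(raw_output[start:])
                return parsed
            except json.JSONDecodeError:
                continue  # stray bracket; keep scanning
    return None  # no parseable JSON array found

# Example with output resembling the model's format:
raw = 'Materials: [{"bronze": ["bronze"]}, {"marble": ["marble"]}] done'
print(extract_materials(raw))
# [{'bronze': ['bronze']}, {'marble': ['marble']}]
```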
Expected Output Format
The model outputs a JSON array in which each material combination is mapped to its constituent AAT terms:
[
{"oil on canvas": ["Oil paint", "Canvas"]},
{"tempera on wood": ["tempera paint", "wood (plant material)"]},
{"bronze": ["bronze"]}
]
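For downstream metadata work, these per-combination mappings can be collapsed into a flat, de-duplicated list of AAT terms. A minimal sketch (the helper name `flatten_aat_terms` is ours, not part of the model's API):

```python
def flatten_aat_terms(materials):
    """Collapse a list of {combination: [AAT terms]} mappings into an
    ordered, de-duplicated list of AAT terms for catalog metadata."""
    seen = []
    for entry in materials:
        for terms in entry.values():
            for term in terms:
                if term not in seen:
                    seen.append(term)
    return seen

example = [
    {"oil on canvas": ["Oil paint", "Canvas"]},
    {"tempera on wood": ["tempera paint", "wood (plant material)"]},
    {"bronze": ["bronze"]},
]
print(flatten_aat_terms(example))
# ['Oil paint', 'Canvas', 'tempera paint', 'wood (plant material)', 'bronze']
```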
Training Procedure
This model was trained using Supervised Fine-Tuning (SFT) on the small-models-for-glam/synthetic-aat-materials dataset, which contains thousands of synthetic cultural heritage object descriptions paired with their corresponding AAT material classifications.
Training Details
- Base Model: Qwen/Qwen3-0.6B
- Training Method: Supervised Fine-Tuning (SFT) with TRL
- Dataset: Synthetic AAT materials dataset
- Infrastructure: Trained using Hugging Face Jobs
- Epochs: 3
- Batch Size: 4 (with gradient accumulation)
- Learning Rate: 2e-5
- Context: Cultural heritage object descriptions → AAT materials mapping
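To illustrate the description-to-AAT mapping, one training record presumably pairs a prompt with a JSON completion along these lines (the field names and exact record shape are assumptions; consult the dataset itself for the real schema):

```python
import json

# Hypothetical shape of a single SFT training example; the actual
# dataset's column names may differ.
description = "An oil painting on canvas, 60 x 80 cm, in a gilded wooden frame."

example = {
    "prompt": (
        "Given this cultural heritage object description:\n"
        f"{description}\n"
        "Identify the materials, separating them out as they would be found in the Getty AAT"
    ),
    # The target completion is the JSON array the model should learn to emit.
    "completion": json.dumps([
        {"oil on canvas": ["Oil paint", "Canvas"]},
        {"gilded wood": ["gold leaf", "wood (plant material)"]},
    ]),
}
print(example["completion"])
```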
Dataset Characteristics
The training dataset includes diverse object types:
- Historical artifacts from various time periods
- Multiple material combinations per object
- Professional museum cataloging terminology
- AAT-compliant material classifications
Framework versions
- TRL: 0.23.0
- Transformers: 4.56.1
- PyTorch: 2.8.0
- Datasets: 4.1.0
- Tokenizers: 0.22.0
Citations
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}