Model Card for Qwen3-0.6B-SFT-AAT-Materials
This model is a fine-tuned version of Qwen/Qwen3-0.6B specialized for identifying materials in cultural heritage object descriptions according to Getty Art & Architecture Thesaurus (AAT) standards.
It has been trained using TRL on synthetic data representing diverse cultural heritage objects from galleries, libraries, archives, and museums (GLAM) collections.
Model Description
This model specializes in:
- Materials Identification: Extracting and categorizing materials from cultural heritage object descriptions
- AAT Standardization: Converting material descriptions to Getty Art & Architecture Thesaurus format
- Multi-material Recognition: Identifying compound materials (e.g., "oil on canvas" → ["Oil paint", "Canvas"])
- Domain-specific Understanding: Processing technical terminology from art history, archaeology, and museum cataloging
Use Cases
Primary Applications
- Museum Cataloging: Automated material extraction from object descriptions
- Digital Collections: Standardizing material metadata across cultural heritage databases
- Research Tools: Supporting art historians and archaeologists in material analysis
- Data Migration: Converting legacy catalog records to AAT standards
Object Types Supported
- Paintings (oil, tempera, watercolor, acrylic)
- Sculptures (bronze, marble, wood, clay)
- Textiles (wool, linen, silk, cotton)
- Ceramics and pottery
- Metalwork and jewelry
- Glassware
- Manuscripts and prints
- Furniture and decorative objects
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import json
# Load the model
tokenizer = AutoTokenizer.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
model = AutoModelForCausalLM.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
# Example cultural heritage object description
description = """A bronze sculpture from 1425, standing 150 cm tall. The figure is mounted on a marble base and features intricate details cast in the bronze medium. The sculpture shows traces of original gilding on selected areas."""
# Format the prompt
prompt = f"""Given this cultural heritage object description:
{description}
Identify the materials, separating them out as they would be found in the Getty AAT"""
# Generate materials identification
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Extract the materials output
materials = result[len(prompt):].strip()
print(json.loads(materials))
# Expected output: [{"Bronze": ["bronze"]}, {"Marble": ["marble"]}, {"Gold leaf": ["gold", "leaf"]}]
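Because small models occasionally wrap the JSON array in extra text, parsing the raw generation with a bare `json.loads` can fail. A more defensive sketch (the helper name `extract_materials` is our own, not part of this repository):

```python
import json

def extract_materials(raw_output: str):
    """Best-effort extraction of the first JSON array from raw model output.

    The generated text may contain extra tokens around the JSON payload, so
    scanning for an opening bracket and using raw_decode is more robust than
    calling json.loads on the whole string.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(raw_output):
        if char == "[":
            try:
                parsed, _ = decoder.raw_decode(raw_output[start:])
                return parsed
            except json.JSONDecodeError:
                continue  # stray bracket; keep scanning
    return None  # no parseable JSON array found

# Example with output resembling the model's format:
raw = 'Materials: [{"bronze": ["bronze"]}, {"marble": ["marble"]}] done'
print(extract_materials(raw))
# [{'bronze': ['bronze']}, {'marble': ['marble']}]
```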
Expected Output Format
The model outputs a JSON array in which each material combination is mapped to its constituent AAT terms:
[
{"oil on canvas": ["Oil paint", "Canvas"]},
{"tempera on wood": ["tempera paint", "wood (plant material)"]},
{"bronze": ["bronze"]}
]
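For downstream metadata work, these per-combination mappings can be collapsed into a flat, de-duplicated list of AAT terms. A minimal sketch (the helper name `flatten_aat_terms` is ours, not part of the model's API):

```python
def flatten_aat_terms(materials):
    """Collapse a list of {combination: [AAT terms]} mappings into an
    ordered, de-duplicated list of AAT terms for catalog metadata."""
    seen = []
    for entry in materials:
        for terms in entry.values():
            for term in terms:
                if term not in seen:
                    seen.append(term)
    return seen

example = [
    {"oil on canvas": ["Oil paint", "Canvas"]},
    {"tempera on wood": ["tempera paint", "wood (plant material)"]},
    {"bronze": ["bronze"]},
]
print(flatten_aat_terms(example))
# ['Oil paint', 'Canvas', 'tempera paint', 'wood (plant material)', 'bronze']
```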
Training Procedure
This model was trained using Supervised Fine-Tuning (SFT) on the small-models-for-glam/synthetic-aat-materials dataset, which contains thousands of synthetic cultural heritage object descriptions paired with their corresponding AAT material classifications.
Training Details
- Base Model: Qwen/Qwen3-0.6B
- Training Method: Supervised Fine-Tuning (SFT) with TRL
- Dataset: Synthetic AAT materials dataset
- Infrastructure: Trained using Hugging Face Jobs
- Epochs: 3
- Batch Size: 4 (with gradient accumulation)
- Learning Rate: 2e-5
- Context: Cultural heritage object descriptions → AAT materials mapping
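To illustrate the description-to-AAT mapping, one training record presumably pairs a prompt with a JSON completion along these lines (the field names and exact record shape are assumptions; consult the dataset itself for the real schema):

```python
import json

# Hypothetical shape of a single SFT training example; the actual
# dataset's column names may differ.
description = "An oil painting on canvas, 60 x 80 cm, in a gilded wooden frame."

example = {
    "prompt": (
        "Given this cultural heritage object description:\n"
        f"{description}\n"
        "Identify the materials, separating them out as they would be found in the Getty AAT"
    ),
    # The target completion is the JSON array the model should learn to emit.
    "completion": json.dumps([
        {"oil on canvas": ["Oil paint", "Canvas"]},
        {"gilded wood": ["gold leaf", "wood (plant material)"]},
    ]),
}
print(example["completion"])
```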
Dataset Characteristics
The training dataset includes diverse object types:
- Historical artifacts from various time periods
- Multiple material combinations per object
- Professional museum cataloging terminology
- AAT-compliant material classifications
Framework versions
- TRL: 0.23.0
- Transformers: 4.56.1
- PyTorch: 2.8.0
- Datasets: 4.1.0
- Tokenizers: 0.22.0
Citations
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}