Model Card for Qwen3-0.6B-SFT-AAT-Materials

This model is a fine-tuned version of Qwen/Qwen3-0.6B specialized for identifying materials in cultural heritage object descriptions according to Getty Art & Architecture Thesaurus (AAT) standards.

It has been trained using TRL on synthetic data representing diverse cultural heritage objects from galleries, libraries, archives, and museums (GLAM) collections.

Model Description

This model excels at:

  • Materials Identification: Extracting and categorizing materials from cultural heritage object descriptions
  • AAT Standardization: Converting material descriptions to Getty Art & Architecture Thesaurus format
  • Multi-material Recognition: Identifying compound materials (e.g., "oil on canvas" → ["Oil paint", "Canvas"])
  • Domain-specific Understanding: Processing technical terminology from art history, archaeology, and museum cataloging

Use Cases

Primary Applications

  • Museum Cataloging: Automated material extraction from object descriptions
  • Digital Collections: Standardizing material metadata across cultural heritage databases
  • Research Tools: Supporting art historians and archaeologists in material analysis
  • Data Migration: Converting legacy catalog records to AAT standards

Object Types Supported

  • Paintings (oil, tempera, watercolor, acrylic)
  • Sculptures (bronze, marble, wood, clay)
  • Textiles (wool, linen, silk, cotton)
  • Ceramics and pottery
  • Metalwork and jewelry
  • Glassware
  • Manuscripts and prints
  • Furniture and decorative objects

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import json

# Load the model
tokenizer = AutoTokenizer.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
model = AutoModelForCausalLM.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")

# Example cultural heritage object description
description = """A bronze sculpture from 1425, standing 150 cm tall. The figure is mounted on a marble base and features intricate details cast in the bronze medium. The sculpture shows traces of original gilding on selected areas."""

# Format the prompt
prompt = f"""Given this cultural heritage object description:

{description}

Identify the materials. Separate out materials as they would be found in Getty AAT"""

# Generate materials identification
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)

# Extract the materials output (decode only the newly generated tokens)
materials = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()
print(json.loads(materials))
# Example output (sampling means results can vary):
# [{"Bronze": ["bronze"]}, {"Marble": ["marble"]}, {"Gold leaf": ["gold", "leaf"]}]
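
Qwen3 is a chat model, so depending on how the SFT data was formatted, wrapping the prompt in the tokenizer's chat template may give more reliable results than the raw-prompt call above. A minimal sketch under that assumption (enable_thinking=False is a Qwen3 chat-template flag that suppresses the model's reasoning block):

# Alternative: run the same prompt through the Qwen3 chat template
messages = [{"role": "user", "content": prompt}]
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # Qwen3 template flag; skips the <think> block
)
inputs = tokenizer(chat_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
materials = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()
print(json.loads(materials))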

Expected Output Format

The model outputs materials as a JSON list in which each material phrase from the description is mapped to its constituent AAT terms:

[
  {"oil on canvas": ["Oil paint", "Canvas"]},
  {"tempera on wood": ["tempera paint", "wood (plant material)"]},
  {"bronze": ["bronze"]}
]
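
Because the output is a list of single-key objects, it is often convenient to flatten it into one phrase-to-terms mapping. A minimal sketch, assuming the generated text parses as valid JSON (in practice, guard json.loads with a try/except):

import json

raw = '[{"oil on canvas": ["Oil paint", "Canvas"]}, {"bronze": ["bronze"]}]'

# Flatten the list of single-key dicts into one phrase -> AAT terms mapping
materials = {}
for entry in json.loads(raw):
    for phrase, aat_terms in entry.items():
        materials[phrase] = aat_terms
print(materials)  # {'oil on canvas': ['Oil paint', 'Canvas'], 'bronze': ['bronze']}

# Collect the distinct AAT terms used across all phrases
aat_vocabulary = sorted({term for terms in materials.values() for term in terms})
print(aat_vocabulary)  # ['Canvas', 'Oil paint', 'bronze']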

Training Procedure

This model was trained using Supervised Fine-Tuning (SFT) on the small-models-for-glam/synthetic-aat-materials dataset, which contains thousands of synthetic cultural heritage object descriptions paired with their corresponding AAT material classifications.

Training Details

  • Base Model: Qwen/Qwen3-0.6B
  • Training Method: Supervised Fine-Tuning (SFT) with TRL
  • Dataset: Synthetic AAT materials dataset
  • Infrastructure: Trained using Hugging Face Jobs
  • Epochs: 3
  • Batch Size: 4 (with gradient accumulation)
  • Learning Rate: 2e-5
  • Context: Cultural heritage object descriptions → AAT materials mapping
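
A minimal TRL sketch that reproduces the settings above (the number of gradient-accumulation steps and the dataset's column layout are assumptions; the card only states the values listed in this section):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Dataset named in this card; its exact column layout is not documented here
dataset = load_dataset("small-models-for-glam/synthetic-aat-materials", split="train")

config = SFTConfig(
    output_dir="Qwen3-0.6B-SFT-AAT-Materials",
    num_train_epochs=3,               # from this card
    per_device_train_batch_size=4,    # from this card
    gradient_accumulation_steps=4,    # assumption: card only says "with gradient accumulation"
    learning_rate=2e-5,               # from this card
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",  # TRL instantiates the base model from its Hub id
    args=config,
    train_dataset=dataset,
)
trainer.train()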

Dataset Characteristics

The training dataset includes diverse object types:

  • Historical artifacts from various time periods
  • Multiple material combinations per object
  • Professional museum cataloging terminology
  • AAT-compliant material classifications
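
To inspect the data yourself (only the dataset id comes from this card; no field names are assumed):

from datasets import load_dataset

# Peek at the synthetic AAT materials dataset referenced above
dataset = load_dataset("small-models-for-glam/synthetic-aat-materials", split="train")
print(dataset.column_names)
print(dataset[0])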

Framework versions

  • TRL: 0.23.0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0
  • Datasets: 4.1.0
  • Tokenizers: 0.22.0

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}