---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: Qwen3-0.6B-SFT-AAT-Materials
tags:
- generated_from_trainer
- sft
- trl
- hf_jobs
- cultural-heritage
- aat
- materials-identification
- glam
- digital-humanities
license: mit
---

# Model Card for Qwen3-0.6B-SFT-AAT-Materials

This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for identifying materials in cultural heritage object descriptions according to Getty Art & Architecture Thesaurus (AAT) standards. It has been trained using [TRL](https://github.com/huggingface/trl) on synthetic data representing diverse cultural heritage objects from galleries, libraries, archives, and museums (GLAM) collections.

## Model Description

This model excels at:

- **Materials Identification**: Extracting and categorizing materials from cultural heritage object descriptions
- **AAT Standardization**: Converting material descriptions to Getty Art & Architecture Thesaurus format
- **Multi-material Recognition**: Identifying compound materials (e.g., "oil on canvas" → ["Oil paint", "Canvas"])
- **Domain-specific Understanding**: Processing technical terminology from art history, archaeology, and museum cataloging

## Use Cases

### Primary Applications

- **Museum Cataloging**: Automated material extraction from object descriptions
- **Digital Collections**: Standardizing material metadata across cultural heritage databases
- **Research Tools**: Supporting art historians and archaeologists in material analysis
- **Data Migration**: Converting legacy catalog records to AAT standards

### Object Types Supported

- Paintings (oil, tempera, watercolor, acrylic)
- Sculptures (bronze, marble, wood, clay)
- Textiles (wool, linen, silk, cotton)
- Ceramics and pottery
- Metalwork and jewelry
- Glassware
- Manuscripts and prints
- Furniture and decorative objects

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import json

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
model = AutoModelForCausalLM.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")

# Example cultural heritage object description
description = """A bronze sculpture from 1425, standing 150 cm tall. The figure is mounted on a marble base and features intricate details cast in the bronze medium.
The sculpture shows traces of original gilding on selected areas."""

# Format the prompt
prompt = f"""Given this cultural heritage object description:

{description}

Identify the materials separate out materials as they would be found in Getty AAT"""

# Generate the materials identification
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.3)

# Decode only the newly generated tokens (the part after the prompt)
materials = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip()
print(json.loads(materials))
# Expected output: [{"Bronze": ["bronze"]}, {"Marble": ["marble"]}, {"Gold leaf": ["gold", "leaf"]}]
```

## Expected Output Format

The model outputs materials in JSON format, where each material combination is mapped to its constituent AAT terms:

```json
[
  {"oil on canvas": ["Oil paint", "Canvas"]},
  {"tempera on wood": ["tempera paint", "wood (plant material)"]},
  {"bronze": ["bronze"]}
]
```

## Training Procedure

This model was trained using Supervised Fine-Tuning (SFT) on the `small-models-for-glam/synthetic-aat-materials` dataset, which contains thousands of synthetic cultural heritage object descriptions paired with their corresponding AAT material classifications. A minimal sketch of an equivalent TRL setup follows the training details below.

### Training Details

- **Base Model**: Qwen/Qwen3-0.6B
- **Training Method**: Supervised Fine-Tuning (SFT) with TRL
- **Dataset**: Synthetic AAT materials dataset
- **Infrastructure**: Trained using Hugging Face Jobs
- **Epochs**: 3
- **Batch Size**: 4 (with gradient accumulation)
- **Learning Rate**: 2e-5
- **Context**: Cultural heritage object descriptions → AAT materials mapping
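The exact training script is not reproduced in this card, but a minimal TRL sketch consistent with the hyperparameters above might look like the following. The `gradient_accumulation_steps` value, the dataset split, and the column format expected by `SFTTrainer` are assumptions rather than details taken from this card.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the synthetic AAT materials dataset (split name is an assumption)
dataset = load_dataset("small-models-for-glam/synthetic-aat-materials", split="train")

# Hyperparameters mirror the training details above; the gradient accumulation
# steps value is a placeholder, since the exact figure is not stated in this card
training_args = SFTConfig(
    output_dir="Qwen3-0.6B-SFT-AAT-Materials",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # assumed value
    learning_rate=2e-5,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```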
### Dataset Characteristics

The training dataset includes diverse object types:

- Historical artifacts from various time periods
- Multiple material combinations per object
- Professional museum cataloging terminology
- AAT-compliant material classifications

### Framework versions

- TRL: 0.23.0
- Transformers: 4.56.1
- Pytorch: 2.8.0
- Datasets: 4.1.0
- Tokenizers: 0.22.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```