---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: Qwen3-0.6B-SFT-AAT-Materials
tags:
- generated_from_trainer
- sft
- trl
- hf_jobs
- cultural-heritage
- aat
- materials-identification
- glam
- digital-humanities
license: mit
---

# Model Card for Qwen3-0.6B-SFT-AAT-Materials

This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for identifying materials in cultural heritage object descriptions according to Getty Art & Architecture Thesaurus (AAT) standards. It has been trained using [TRL](https://github.com/huggingface/trl) on synthetic data representing diverse cultural heritage objects from galleries, libraries, archives, and museums (GLAM) collections.

## Model Description

This model excels at:

- **Materials Identification**: Extracting and categorizing materials from cultural heritage object descriptions
- **AAT Standardization**: Converting material descriptions to Getty Art & Architecture Thesaurus format
- **Multi-material Recognition**: Identifying compound materials (e.g., "oil on canvas" → ["Oil paint", "Canvas"])
- **Domain-specific Understanding**: Processing technical terminology from art history, archaeology, and museum cataloging

## Use Cases

### Primary Applications

- **Museum Cataloging**: Automated material extraction from object descriptions
- **Digital Collections**: Standardizing material metadata across cultural heritage databases
- **Research Tools**: Supporting art historians and archaeologists in material analysis
- **Data Migration**: Converting legacy catalog records to AAT standards

### Object Types Supported

- Paintings (oil, tempera, watercolor, acrylic)
- Sculptures (bronze, marble, wood, clay)
- Textiles (wool, linen, silk, cotton)
- Ceramics and pottery
- Metalwork and jewelry
- Glassware
- Manuscripts and prints
- Furniture and decorative objects

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import json

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
model = AutoModelForCausalLM.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")

# Example cultural heritage object description
description = """A bronze sculpture from 1425, standing 150 cm tall. The figure is mounted on a marble base and features intricate details cast in the bronze medium.
The sculpture shows traces of original gilding on selected areas."""

# Format the prompt
prompt = f"""Given this cultural heritage object description:

{description}

Identify the materials separate out materials as they would be found in Getty AAT"""

# Generate the materials identification
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.3)

# Decode only the newly generated tokens (the part after the prompt)
materials = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip()
print(json.loads(materials))
# Expected output: [{"Bronze": ["bronze"]}, {"Marble": ["marble"]}, {"Gold leaf": ["gold", "leaf"]}]
```

## Expected Output Format

The model outputs materials in JSON format, where each material combination is mapped to its constituent AAT terms:

```json
[
  {"oil on canvas": ["Oil paint", "Canvas"]},
  {"tempera on wood": ["tempera paint", "wood (plant material)"]},
  {"bronze": ["bronze"]}
]
```

## Training Procedure

This model was trained using Supervised Fine-Tuning (SFT) on the `small-models-for-glam/synthetic-aat-materials` dataset, which contains thousands of synthetic cultural heritage object descriptions paired with their corresponding AAT material classifications. A minimal sketch of an equivalent TRL setup follows the training details below.

### Training Details

- **Base Model**: Qwen/Qwen3-0.6B
- **Training Method**: Supervised Fine-Tuning (SFT) with TRL
- **Dataset**: Synthetic AAT materials dataset
- **Infrastructure**: Trained using Hugging Face Jobs
- **Epochs**: 3
- **Batch Size**: 4 (with gradient accumulation)
- **Learning Rate**: 2e-5
- **Context**: Cultural heritage object descriptions → AAT materials mapping
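The exact training script is not reproduced in this card, but a minimal TRL sketch consistent with the hyperparameters above might look like the following. The `gradient_accumulation_steps` value, the dataset split, and the column format expected by `SFTTrainer` are assumptions rather than details taken from this card.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the synthetic AAT materials dataset (split name is an assumption)
dataset = load_dataset("small-models-for-glam/synthetic-aat-materials", split="train")

# Hyperparameters mirror the training details above; the gradient accumulation
# steps value is a placeholder, since the exact figure is not stated in this card
training_args = SFTConfig(
    output_dir="Qwen3-0.6B-SFT-AAT-Materials",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # assumed value
    learning_rate=2e-5,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```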
### Dataset Characteristics

The training dataset includes diverse object types:

- Historical artifacts from various time periods
- Multiple material combinations per object
- Professional museum cataloging terminology
- AAT-compliant material classifications

### Framework versions

- TRL: 0.23.0
- Transformers: 4.56.1
- Pytorch: 2.8.0
- Datasets: 4.1.0
- Tokenizers: 0.22.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```