wjbmattingly committed
Commit 1c60626 · verified · 1 Parent(s): c8fc822

Update README.md

Files changed (1)
  1. README.md +96 -17
README.md CHANGED
@@ -1,44 +1,123 @@
  ---
  base_model: Qwen/Qwen3-0.6B
  library_name: transformers
- model_name: Qwen3-0.6B-SFT-AAT-Materials3
  tags:
  - generated_from_trainer
- - hf_jobs
- - trl
  - sft
- licence: license
  ---

- # Model Card for Qwen3-0.6B-SFT-AAT-Materials3

- This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

  ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials3", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
  ```

- ## Training procedure

-

- This model was trained with SFT.

  ### Framework versions

  - TRL: 0.23.0
  - Transformers: 4.56.1
  - Pytorch: 2.8.0
- - Datasets: 4.1.1
  - Tokenizers: 0.22.0

  ## Citations
 
  ---
  base_model: Qwen/Qwen3-0.6B
  library_name: transformers
+ model_name: Qwen3-0.6B-SFT-AAT-Materials
  tags:
  - generated_from_trainer
  - sft
+ - trl
+ - hf_jobs
+ - cultural-heritage
+ - aat
+ - materials-identification
+ - glam
+ - digital-humanities
+ licence: mit
  ---

+ # Model Card for Qwen3-0.6B-SFT-AAT-Materials
+
+ This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for identifying materials in cultural heritage object descriptions according to Getty Art & Architecture Thesaurus (AAT) standards.
+
+ It has been trained using [TRL](https://github.com/huggingface/trl) on synthetic data representing diverse cultural heritage objects from gallery, library, archive, and museum (GLAM) collections.
+
+ ## Model Description
+
+ This model excels at:
+
+ - **Materials Identification**: Extracting and categorizing materials from cultural heritage object descriptions
+ - **AAT Standardization**: Converting material descriptions to Getty Art & Architecture Thesaurus format
+ - **Multi-material Recognition**: Identifying compound materials (e.g., "oil on canvas" → ["Oil paint", "Canvas"])
+ - **Domain-specific Understanding**: Processing technical terminology from art history, archaeology, and museum cataloging
+
+ ## Use Cases
+
+ ### Primary Applications
+ - **Museum Cataloging**: Automated material extraction from object descriptions
+ - **Digital Collections**: Standardizing material metadata across cultural heritage databases
+ - **Research Tools**: Supporting art historians and archaeologists in material analysis
+ - **Data Migration**: Converting legacy catalog records to AAT standards

+ ### Object Types Supported
+ - Paintings (oil, tempera, watercolor, acrylic)
+ - Sculptures (bronze, marble, wood, clay)
+ - Textiles (wool, linen, silk, cotton)
+ - Ceramics and pottery
+ - Metalwork and jewelry
+ - Glassware
+ - Manuscripts and prints
+ - Furniture and decorative objects

+ ## Quick Start

  ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import json
+
+ # Load the model
+ tokenizer = AutoTokenizer.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
+ model = AutoModelForCausalLM.from_pretrained("small-models-for-glam/Qwen3-0.6B-SFT-AAT-Materials")
+
+ # Example cultural heritage object description
+ description = """A bronze sculpture from 1425, standing 150 cm tall. The figure is mounted on a marble base and features intricate details cast in the bronze medium. The sculpture shows traces of original gilding on selected areas."""
+
+ # Format the prompt
+ prompt = f"""Given this cultural heritage object description:
+
+ {description}
+
+ Identify the materials separate out materials as they would be found in Getty AAT"""
+
+ # Generate materials identification (temperature only takes effect when do_sample=True)
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.3)
+ result = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+
+ # Extract the materials output (only the newly generated tokens were decoded above)
+ materials = result.strip()
+ print(json.loads(materials))
+ # Expected output: [{"Bronze": ["bronze"]}, {"Marble": ["marble"]}, {"Gold leaf": ["gold", "leaf"]}]
+ ```
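
The `json.loads` call in the Quick Start assumes the generated text is exactly one JSON array. A small model may surround the array with extra text, so a more tolerant parse can be useful. The following is a minimal sketch using only the standard library; `extract_materials` is a hypothetical helper that is not part of the model repository, and the handling of `<think>` blocks is an assumption about what the model might emit rather than documented behavior.

```python
import json
import re


def extract_materials(generated: str) -> list:
    """Return the first JSON array found in `generated`, or [] if none parses.

    Hypothetical helper for illustration; the model card does not define it.
    """
    # ASSUMPTION: strip any <think>...</think> block the model might emit.
    text = re.sub(r"<think>.*?</think>", "", generated, flags=re.DOTALL)
    decoder = json.JSONDecoder()
    # Try every '[' as a candidate start of the array.
    for match in re.finditer(r"\[", text):
        try:
            value, _end = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(value, list):
            return value
    return []


sample = 'Materials: [{"Bronze": ["bronze"]}, {"Marble": ["marble"]}] Done.'
print(extract_materials(sample))
```

Returning an empty list on failure keeps a batch cataloging job running when one record produces unparseable text; raising an exception instead would be a reasonable alternative if silent failures are a concern.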
+
+ ## Expected Output Format
+
+ The model outputs materials in JSON format where each material combination is mapped to its constituent AAT terms:

+ ```json
+ [
+ {"oil on canvas": ["Oil paint", "Canvas"]},
+ {"tempera on wood": ["tempera paint", "wood (plant material)"]},
+ {"bronze": ["bronze"]}
+ ]
  ```
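
The format above is a list of single-key objects, which is awkward to query directly. As a rough sketch (the helper name `flatten_materials` is hypothetical), the list can be merged into a phrase-to-terms lookup plus a de-duplicated list of AAT terms:

```python
import json

# The example from the model card, as a JSON string.
raw = """
[
  {"oil on canvas": ["Oil paint", "Canvas"]},
  {"tempera on wood": ["tempera paint", "wood (plant material)"]},
  {"bronze": ["bronze"]}
]
"""


def flatten_materials(entries):
    """Merge a list of one-key dicts into (phrase -> terms, unique terms in order)."""
    by_phrase = {}
    terms = []
    for entry in entries:
        for phrase, aat_terms in entry.items():
            by_phrase.setdefault(phrase, []).extend(aat_terms)
            for term in aat_terms:
                if term not in terms:
                    terms.append(term)
    return by_phrase, terms


by_phrase, terms = flatten_materials(json.loads(raw))
print(by_phrase["oil on canvas"])  # ['Oil paint', 'Canvas']
print(terms)
```

The comparison is case-sensitive, so "Bronze" and "bronze" would be kept as separate terms; whether to normalize case depends on how the target catalog stores AAT labels.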

+ ## Training Procedure

+ This model was trained using Supervised Fine-Tuning (SFT) on the `small-models-for-glam/synthetic-aat-materials` dataset, which contains thousands of synthetic cultural heritage object descriptions paired with their corresponding AAT material classifications.

+ ### Training Details
+ - **Base Model**: Qwen/Qwen3-0.6B
+ - **Training Method**: Supervised Fine-Tuning (SFT) with TRL
+ - **Dataset**: Synthetic AAT materials dataset
+ - **Infrastructure**: Trained using Hugging Face Jobs
+ - **Epochs**: 3
+ - **Batch Size**: 4 (with gradient accumulation)
+ - **Learning Rate**: 2e-5
+ - **Context**: Cultural heritage object descriptions → AAT materials mapping
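
The list above gives a batch size of 4 "with gradient accumulation" but does not state the number of accumulation steps. The sketch below shows how the effective batch size and an approximate optimizer step count follow from those settings. The accumulation steps, device count, and dataset size used here are placeholders chosen for illustration, not values from the training run, and treating 4 as the per-device batch size is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingSetup:
    epochs: int = 3                        # from the model card
    per_device_batch_size: int = 4         # from the model card (assumed per device)
    learning_rate: float = 2e-5            # from the model card
    gradient_accumulation_steps: int = 4   # ASSUMPTION: not stated in the card
    num_devices: int = 1                   # ASSUMPTION: not stated in the card

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    def optimizer_steps(self, num_examples: int) -> int:
        # Approximate: real trainers may round partial batches differently.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.epochs


setup = TrainingSetup()
print(setup.effective_batch_size)    # 16 under the assumptions above
print(setup.optimizer_steps(1000))   # 189 for a hypothetical 1000-example dataset
```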

+ ### Dataset Characteristics
+ The training dataset includes diverse object types:
+ - Historical artifacts from various time periods
+ - Multiple material combinations per object
+ - Professional museum cataloging terminology
+ - AAT-compliant material classifications
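
To make the description-to-materials pairing concrete, here is a sketch of how one training record might be assembled. The prompt wording is copied from the Quick Start; the `prompt`/`completion` field names and the `make_record` helper are assumptions for illustration, since the actual schema of the dataset is not shown in this README.

```python
import json

# Prompt wording taken verbatim from the Quick Start example.
PROMPT_TEMPLATE = (
    "Given this cultural heritage object description:\n\n"
    "{description}\n\n"
    "Identify the materials separate out materials as they would be found in Getty AAT"
)


def make_record(description, materials):
    """Build one hypothetical prompt/completion pair for supervised fine-tuning."""
    return {
        "prompt": PROMPT_TEMPLATE.format(description=description),
        # The target is the JSON list in the card's expected output format.
        "completion": json.dumps(materials),
    }


record = make_record(
    "Portrait of a merchant, oil on canvas, in a gilded wood frame.",
    [{"oil on canvas": ["Oil paint", "Canvas"]}],
)
print(record["prompt"])
print(record["completion"])
```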

  ### Framework versions

  - TRL: 0.23.0
  - Transformers: 4.56.1
  - Pytorch: 2.8.0
+ - Datasets: 4.1.0
  - Tokenizers: 0.22.0

  ## Citations