hanszhu committed on
Commit
435bbc0
·
verified ·
1 Parent(s): f3e981a

Update README.md

Files changed (1)
  1. README.md +168 -3
README.md CHANGED
@@ -1,3 +1,168 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - openmmlab/cascade-rcnn
7
+ pipeline_tag: object-detection
8
+ ---
9
+
10
+ # Model Card for ChartElementNet-MultiClass
11
+
12
+ ChartElementNet-MultiClass is a deep learning model for multi-class scientific chart element detection. It detects axes, legends, labels, titles, tick marks, data lines, bars, and more in scientific figures. The model is powered by Cascade R-CNN with a Swin Transformer backbone and is trained on enhanced COCO-style datasets with rich chart element annotations.
13
+
14
+ ## Model Details
15
+
16
+ ### Model Description
17
+
18
+ ChartElementNet-MultiClass automates the detection and localization of a wide range of chart elements in scientific figures. It leverages a Cascade R-CNN architecture with a Swin Transformer backbone for robust multi-class detection, especially for small and densely packed elements. The model is intended for use in document image understanding, chart parsing, and scientific figure mining.
19
+
20
+ - **Developed by:** Hansheng Zhu
21
+ - **Model type:** Object Detection (multi-class)
22
+ - **License:** Apache-2.0
23
+ - **Finetuned from model:** openmmlab/cascade-rcnn
24
+
25
+ ### Model Sources
26
+
27
+ - **Repository:** [https://github.com/hanszhu/ChartSense](https://github.com/hanszhu/ChartSense)
28
+ - **Paper:** https://arxiv.org/abs/2106.01841
29
+
30
+ ## Uses
31
+
32
+ ### Direct Use
33
+
34
+ - Detection and localization of chart elements in scientific figures
35
+ - Preprocessing for downstream chart understanding and data extraction
36
+ - Automated annotation and analysis of scientific figures
37
+
38
+ ### Downstream Use
39
+
40
+ - As a preprocessing step for chart structure parsing or data extraction
41
+ - Integration into document parsing, digital library, or accessibility systems
42
+
43
+ ### Out-of-Scope Use
44
+
45
+ - Detection of non-scientific or artistic elements
46
+ - Use on figures whose elements fall outside the supported classes
47
+ - Medical or legal decision making
48
+
49
+ ## Bias, Risks, and Limitations
50
+
51
+ - The model is limited to the chart element classes present in the training data (see below).
52
+ - May not generalize to figures with highly unusual styles or poor image quality.
53
+ - Potential dataset bias: training data is sourced from scientific literature, so chart styles common in other domains may be under-represented.
54
+
55
+ ### Recommendations
56
+
57
+ Users should verify predictions on out-of-domain data and be aware of the model’s limitations regarding chart style and domain.
58
+
59
+ ## How to Get Started with the Model
60
+
61
+ ```python
62
+ import torch
63
+ from mmdet.apis import inference_detector, init_detector
64
+
65
+ config_file = 'legend_match_swin/cascade_rcnn_r50_fpn_meta.py'
66
+ checkpoint_file = 'chart_label+.pth'
67
+ model = init_detector(config_file, checkpoint_file, device='cuda:0')
68
+
69
+ result = inference_detector(model, 'example_chart.png')
70
+ # In MMDetection 2.x, result is a list with one (N, 5) array per class; each row is [x1, y1, x2, y2, score]
71
+ ```
72
+
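The `result` returned above can be flattened into one record per detection. The following is a minimal sketch, assuming the MMDetection 2.x output format for box-only detectors (one `(N, 5)` array per class, each row `[x1, y1, x2, y2, score]`). The helper `flatten_detections`, the two example class names, and the 0.5 score threshold are illustrative and not part of the released code.

```python
def flatten_detections(result, class_names, score_thr=0.5):
    """Convert per-class rows of [x1, y1, x2, y2, score] into detection dicts."""
    detections = []
    for class_id, boxes in enumerate(result):
        for row in boxes:
            x1, y1, x2, y2, score = (float(v) for v in row)
            if score >= score_thr:
                detections.append({
                    "label": class_names[class_id],
                    "score": score,
                    "bbox": [x1, y1, x2, y2],
                })
    # Highest-confidence detections first
    return sorted(detections, key=lambda d: d["score"], reverse=True)


# Synthetic stand-in for an inference_detector output with two classes
fake_result = [
    [[10, 5, 200, 30, 0.98]],
    [[0, 300, 400, 310, 0.40], [0, 298, 400, 312, 0.91]],
]
detections = flatten_detections(fake_result, ["title", "x-axis"])
for det in detections:
    print(det["label"], det["score"], det["bbox"])
```

Because the helper only iterates over rows, it accepts NumPy arrays as well as plain lists. In real use, `class_names` would most likely come from `model.CLASSES`, which MMDetection 2.x sets when the checkpoint carries class metadata.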
73
+ ## Training Details
74
+
75
+ ### Training Data
76
+
77
+ - **Dataset:** Enhanced COCO-style scientific chart dataset
78
+ - 21+ chart element classes, including axes, legends, titles, tick labels, data lines, and bars
79
+ - Rich metadata and bounding box annotations
80
+
81
+ ### Training Procedure
82
+
83
+ - Images resized to 1120x672
84
+ - Cascade R-CNN with Swin Transformer backbone
85
+ - **Training regime:** fp32
86
+ - **Optimizer:** AdamW
87
+ - **Batch size:** 8
88
+ - **Epochs:** 36
89
+ - **Learning rate:** 1e-4
90
+
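The settings above map roughly onto an MMDetection 2.x config as sketched below. This is a hypothetical fragment, not the released training config: `keep_ratio=True` and the single-GPU reading of "batch size 8" are assumptions, and the weight decay is omitted because the card does not state it.

```python
# Hypothetical MMDetection 2.x config fragment mirroring the listed settings.
img_scale = (1120, 672)  # "Images resized to 1120x672"

# Resize step of the training pipeline; keep_ratio is an assumption
train_resize = dict(type="Resize", img_scale=img_scale, keep_ratio=True)

# Batch size 8, assuming a single GPU (8 samples per GPU)
data = dict(samples_per_gpu=8)

# AdamW with the stated learning rate; weight decay is not given in the card
optimizer = dict(type="AdamW", lr=1e-4)

# 36 training epochs
runner = dict(type="EpochBasedRunner", max_epochs=36)

# fp32 training: no `fp16 = dict(...)` entry, so mixed precision stays off
```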
91
+ ## Evaluation
92
+
93
+ ### Testing Data, Factors & Metrics
94
+
95
+ - **Testing Data:** Held-out split from enhanced COCO-style dataset
96
+ - **Factors:** Element class, image quality
97
+ - **Metrics:** mAP (mean Average Precision), AP50, AP75, per-class AP
98
+
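AP50 and AP75 count a prediction as correct only when its intersection over union (IoU) with a ground-truth box reaches 0.50 or 0.75. In COCO-style evaluation, mAP averages AP over IoU thresholds from 0.50 to 0.95. The sketch below uses made-up box coordinates to show how a 2-pixel vertical shift on a thin, axis-like box passes the AP50 threshold but fails AP75.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Illustrative boxes: an 8-pixel-tall axis and a prediction shifted down by 2 pixels
ground_truth = [50, 400, 650, 408]
prediction = [50, 402, 650, 410]

overlap = iou(prediction, ground_truth)
print(f"IoU = {overlap:.2f}")
print("counts as a match at AP50:", overlap >= 0.50)
print("counts as a match at AP75:", overlap >= 0.75)
```

Thin elements lose IoU quickly under small offsets, which is one plausible reason a class can score well at AP50 and much lower at AP75.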
99
+ ### Results
100
+
101
+ | Category | mAP | mAP_50 | mAP_75 | mAP_s | mAP_m | mAP_l |
102
+ |-----------------|-------|--------|--------|-------|-------|-------|
103
+ | title | 0.837 | 0.988 | 0.957 | 0.283 | 0.775 | 0.897 |
104
+ | x-axis | 0.382 | 0.860 | 0.261 | 0.382 | nan | nan |
105
+ | y-axis | 0.475 | 0.949 | 0.404 | 0.475 | nan | nan |
106
+ | x-tick-label | 0.807 | 0.975 | 0.891 | 0.796 | 0.835 | 0.830 |
107
+ | y-tick-label | 0.785 | 0.976 | 0.893 | 0.786 | 0.632 | nan |
108
+ | data-line | 0.759 | 0.986 | 0.916 | nan | 0.492 | 0.760 |
109
+ | data-bar | 0.080 | 0.206 | 0.049 | 0.080 | nan | nan |
110
+ | axis-title | 0.818 | 0.988 | 0.935 | 0.826 | 0.811 | 0.492 |
111
+ | plot-area | 0.976 | 0.996 | 0.993 | nan | nan | 0.976 |
112
+
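A quick way to summarize the table is to average each column while skipping `nan` cells. The sketch below does this for the `mAP` and `mAP_s` columns, assuming `nan` marks size buckets with no evaluated instances. The averages cover only the nine listed classes, so they are not the model's overall mAP.

```python
import math

nan = float("nan")

# (category, mAP, mAP_s) copied from the results table
rows = [
    ("title", 0.837, 0.283),
    ("x-axis", 0.382, 0.382),
    ("y-axis", 0.475, 0.475),
    ("x-tick-label", 0.807, 0.796),
    ("y-tick-label", 0.785, 0.786),
    ("data-line", 0.759, nan),
    ("data-bar", 0.080, 0.080),
    ("axis-title", 0.818, 0.826),
    ("plot-area", 0.976, nan),
]


def nanmean(values):
    """Mean of the values that are not nan; nan if none remain."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else nan


mean_map = nanmean([r[1] for r in rows])
mean_map_small = nanmean([r[2] for r in rows])
print(f"mean mAP over listed classes:   {mean_map:.3f}")
print(f"mean mAP_s over listed classes: {mean_map_small:.3f}")
```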
113
+ #### Summary
114
+
115
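+ The model achieves high mAP on text elements (title, tick labels, and axis titles, 0.785 to 0.837) and on the plot area (0.976), and a solid 0.759 on data lines. The x-axis and y-axis classes are localized less precisely (mAP 0.382 and 0.475, despite mAP_50 of 0.860 and 0.949), and bars score lowest (0.080), which is attributed to data scarcity. Only the nine classes in the table above are reported.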
116
+
117
+ ## Environmental Impact
118
+
119
+ - **Hardware Type:** NVIDIA V100 GPU
120
+ - **Hours used:** 12
121
+ - **Cloud Provider:** Google Cloud
122
+ - **Compute Region:** us-central1
123
+ - **Carbon Emitted:** ~18 kg CO2eq (estimated)
124
+
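Estimates of this kind are usually derived as energy used multiplied by grid carbon intensity. The sketch below shows that arithmetic only. The card does not state the GPU count, power draw, datacenter overhead (PUE), or carbon intensity behind its figure, so every input here is a placeholder and the output should not be read as a check on the number above.

```python
def estimate_co2_kg(gpu_power_kw, hours, num_gpus=1, pue=1.0, kg_co2_per_kwh=0.5):
    """Rough emissions estimate: energy (kWh) times grid carbon intensity."""
    energy_kwh = gpu_power_kw * hours * num_gpus * pue
    return energy_kwh * kg_co2_per_kwh


# Placeholder inputs: 0.3 kW per GPU, 12 hours, one GPU, no overhead,
# and an assumed grid intensity of 0.5 kg CO2eq per kWh
estimate = estimate_co2_kg(gpu_power_kw=0.3, hours=12)
print(f"estimated emissions: {estimate:.2f} kg CO2eq")
```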
125
+ ## Technical Specifications
126
+
127
+ ### Model Architecture and Objective
128
+
129
+ - Cascade R-CNN with Swin Transformer backbone
130
+ - Multi-class object detection head for 21+ chart element classes
131
+
132
+ ### Compute Infrastructure
133
+
134
+ - **Hardware:** NVIDIA V100 GPU
135
+ - **Software:** PyTorch 1.13, MMDetection 2.x, Python 3.9
136
+
137
+ ## Citation
138
+
139
+ **BibTeX:**
140
+
141
+ ```bibtex
142
+ @article{DocFigure2021,
143
+ title={DocFigure: A Dataset for Scientific Figure Classification},
144
+ author={Afzal, S. and others},
145
+ journal={arXiv preprint arXiv:2106.01841},
146
+ year={2021}
147
+ }
148
+ ```
149
+
150
+ **APA:**
151
+
152
+ Afzal, S., et al. (2021). DocFigure: A Dataset for Scientific Figure Classification. arXiv preprint arXiv:2106.01841.
153
+
154
+ ## Glossary
155
+
156
+ - **Chart Element:** Any visual component of a scientific figure, such as an axis, legend, tick label, or data line
157
+
158
+ ## More Information
159
+
160
+ - [DocFigure Paper](https://arxiv.org/abs/2106.01841)
161
+
162
+ ## Model Card Authors
163
+
164
+ Hansheng Zhu
165
+
166
+ ## Model Card Contact
167
+
168