add AIBOM
Dear model owner(s),
We are a group of researchers investigating the usefulness of sharing AIBOMs (Artificial Intelligence Bills of Materials) to document AI models. AIBOMs are machine-readable, structured lists of the components (e.g., datasets and models) an AI model builds on, intended to enhance transparency in AI-model supply chains.
To pursue this objective, we identified popular models on HuggingFace and, based on your model card (and some configuration information available on HuggingFace), generated your AIBOM according to the CycloneDX (v1.6) standard (see https://cyclonedx.org/docs/1.6/json/). AIBOMs are generated as JSON files using the following open-source supporting tool: https://github.com/MSR4SBOM/ALOHA (technical details are available in the accompanying research paper: https://github.com/MSR4SBOM/ALOHA/blob/main/ALOHA.pdf).
The JSON file in this pull request is your AIBOM (see https://github.com/MSR4SBOM/ALOHA/blob/main/documentation.json for details on its structure).
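Since the AIBOM is plain CycloneDX JSON, it can be inspected programmatically. The following is a minimal sketch in Python; the inlined fragment and the field names it reads (`components`, `type`, `name`) mirror the structure of the JSON file attached to this pull request, but the snippet itself is illustrative, not part of the ALOHA tool.

```python
import json

# A minimal AIBOM fragment, assuming the CycloneDX 1.6 structure used in
# this PR; dataset components carry "type": "data" and a "name" field.
aibom_text = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "components": [
    {"type": "data", "name": "bigcode/commitpackft",
     "data": [{"type": "dataset", "name": "bigcode/commitpackft"}]},
    {"type": "data", "name": "nvidia/HelpSteer",
     "data": [{"type": "dataset", "name": "nvidia/HelpSteer"}]}
  ]
}
"""

aibom = json.loads(aibom_text)

# Collect the dataset components documented in the AIBOM.
datasets = [c["name"] for c in aibom.get("components", []) if c.get("type") == "data"]
print(datasets)  # ['bigcode/commitpackft', 'nvidia/HelpSteer']
```

The same loop works on the full file in this pull request after reading it with `json.load`.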
The submitted AIBOM reflects the model's current information; when the model evolves, the AIBOM can easily be regenerated using the aforementioned generator tool.
We are opening this pull request, containing an AIBOM of your AI model, in the hope that it will be considered. We would also like to hear your opinion on the usefulness (or not) of AIBOMs through a 3-minute anonymous survey: https://forms.gle/WGffSQD5dLoWttEe7.
Thanks in advance, and regards,
Riccardo D’Avino, Fatima Ahmed, Sabato Nocera, Simone Romano, Giuseppe Scanniello (University of Salerno, Italy),
Massimiliano Di Penta (University of Sannio, Italy),
The MSR4SBOM team
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:uuid:f94081a4-fc43-4e54-be15-3e5b3dfbeb90",
  "version": 1,
  "metadata": {
    "timestamp": "2025-06-05T09:35:27.961481+00:00",
    "component": {
      "type": "machine-learning-model",
      "bom-ref": "ibm-granite/granite-8b-code-instruct-4k-c67be605-a8eb-581f-9110-6dd0b327cd4c",
      "name": "ibm-granite/granite-8b-code-instruct-4k",
      "externalReferences": [
        { "url": "https://huggingface.co/ibm-granite/granite-8b-code-instruct-4k", "type": "documentation" }
      ],
      "modelCard": {
        "modelParameters": {
          "task": "text-generation",
          "architectureFamily": "llama",
          "modelArchitecture": "LlamaForCausalLM",
          "datasets": [
            { "ref": "bigcode/commitpackft-7dd931ed-c49b-58fe-b83e-9e9d6c32ee7b" },
            { "ref": "TIGER-Lab/MathInstruct-9d9c997d-f6c1-5029-96fd-6003c4f0ec06" },
            { "ref": "meta-math/MetaMathQA-c6cf810a-8b06-5552-a876-53681c5fe9a1" },
            { "ref": "glaiveai/glaive-code-assistant-v3-dc0b3ece-6570-57fe-97c1-2c2ed2ffa00b" },
            { "ref": "glaive-function-calling-v2-cccb9fa5-b073-586d-bdcc-c5d443666d2c" },
            { "ref": "bugdaryan/sql-create-context-instruction-1c30e492-175f-5b7d-bf80-c1e311da3f29" },
            { "ref": "garage-bAInd/Open-Platypus-9908ce92-f46e-5979-aefc-e191a9daa0c8" },
            { "ref": "nvidia/HelpSteer-0eaad2ad-905d-5f61-bfe7-77e9c4369146" }
          ]
        },
        "properties": [
          { "name": "library_name", "value": "transformers" },
          { "name": "base_model", "value": "ibm-granite/granite-8b-code-base-4k" }
        ],
        "quantitativeAnalysis": {
          "performanceMetrics": [
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 57.9 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 52.4 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 58.5 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 43.3 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 48.2 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 37.2 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 53 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 42.7 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 52.4 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 36.6 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 43.9 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 16.5 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 39.6 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 40.9 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 48.2 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 41.5 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 39 },
            { "slice": "dataset: bigcode/humanevalpack", "type": "pass@1", "value": 32.9 }
          ]
        },
        "consideration": {
          "useCases": "The model is designed to respond to coding related instructions and can be used to build coding assistants."
        }
      },
      "authors": [
        { "name": "ibm-granite" }
      ],
      "licenses": [
        { "license": { "id": "Apache-2.0", "url": "https://spdx.org/licenses/Apache-2.0.html" } }
      ],
      "description": "**Granite-8B-Code-Instruct-4K** is a 8B parameter model fine tuned from *Granite-8B-Code-Base-4K* on a combination of **permissively licensed** instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.- **Developers:** IBM Research- **GitHub Repository:** [ibm-granite/granite-code-models](https://github.com/ibm-granite/granite-code-models)- **Paper:** [Granite Code Models: A Family of Open Foundation Models for Code Intelligence](https://arxiv.org/abs/2405.04324)- **Release Date**: May 6th, 2024- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).",
      "tags": [
        "transformers",
        "safetensors",
        "llama",
        "text-generation",
        "code",
        "granite",
        "conversational",
        "dataset:bigcode/commitpackft",
        "dataset:TIGER-Lab/MathInstruct",
        "dataset:meta-math/MetaMathQA",
        "dataset:glaiveai/glaive-code-assistant-v3",
        "dataset:glaive-function-calling-v2",
        "dataset:bugdaryan/sql-create-context-instruction",
        "dataset:garage-bAInd/Open-Platypus",
        "dataset:nvidia/HelpSteer",
        "arxiv:2405.04324",
        "base_model:ibm-granite/granite-8b-code-base-4k",
        "base_model:finetune:ibm-granite/granite-8b-code-base-4k",
        "license:apache-2.0",
        "model-index",
        "autotrain_compatible",
        "text-generation-inference",
        "region:us"
      ]
    }
  },
  "components": [
    {
      "type": "data",
      "bom-ref": "bigcode/commitpackft-7dd931ed-c49b-58fe-b83e-9e9d6c32ee7b",
      "name": "bigcode/commitpackft",
      "data": [
        {
          "type": "dataset",
          "bom-ref": "bigcode/commitpackft-7dd931ed-c49b-58fe-b83e-9e9d6c32ee7b",
          "name": "bigcode/commitpackft",
          "contents": {
            "url": "https://huggingface.co/datasets/bigcode/commitpackft",
            "properties": [
              { "name": "language", "value": "code" },
              { "name": "pretty_name", "value": "CommitPackFT" },
              { "name": "license", "value": "mit" }
            ]
          },
          "governance": {
            "owners": [
              { "organization": { "name": "bigcode", "url": "https://huggingface.co/bigcode" } }
            ]
          },
          "description": "CommitPackFT is a 2GB filtered version of CommitPack to contain only high-quality commit messages that resemble natural language instructions."
        }
      ]
    },
    {
      "type": "data",
      "bom-ref": "TIGER-Lab/MathInstruct-9d9c997d-f6c1-5029-96fd-6003c4f0ec06",
      "name": "TIGER-Lab/MathInstruct",
      "data": [
        {
          "type": "dataset",
          "bom-ref": "TIGER-Lab/MathInstruct-9d9c997d-f6c1-5029-96fd-6003c4f0ec06",
          "name": "TIGER-Lab/MathInstruct",
          "contents": {
            "url": "https://huggingface.co/datasets/TIGER-Lab/MathInstruct",
            "properties": [
              { "name": "task_categories", "value": "text-generation" },
              { "name": "language", "value": "en" },
              { "name": "size_categories", "value": "100K<n<1M" },
              { "name": "pretty_name", "value": "MathInstruct" },
              { "name": "license", "value": "mit" }
            ]
          },
          "governance": {
            "owners": [
              { "organization": { "name": "TIGER-Lab", "url": "https://huggingface.co/TIGER-Lab" } }
            ]
          },
          "description": "\n\t\n\t\t\n\t\t\ud83e\udda3 MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning\n\t\n\nMathInstruct is a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets, six of which are newly curated by this work. It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and ensures extensive coverage of diverse mathematical fields. \nProject Page:\u2026 See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/MathInstruct."
        }
      ]
    },
    {
      "type": "data",
      "bom-ref": "meta-math/MetaMathQA-c6cf810a-8b06-5552-a876-53681c5fe9a1",
      "name": "meta-math/MetaMathQA",
      "data": [
        {
          "type": "dataset",
          "bom-ref": "meta-math/MetaMathQA-c6cf810a-8b06-5552-a876-53681c5fe9a1",
          "name": "meta-math/MetaMathQA",
          "contents": {
            "url": "https://huggingface.co/datasets/meta-math/MetaMathQA",
            "properties": [
              { "name": "license", "value": "mit" }
            ]
          },
          "governance": {
            "owners": [
              { "organization": { "name": "meta-math", "url": "https://huggingface.co/meta-math" } }
            ]
          },
          "description": "View the project page:\nhttps://meta-math.github.io/\nsee our paper at https://arxiv.org/abs/2309.12284\n\n\t\n\t\t\n\t\tNote\n\t\n\nAll MetaMathQA data are augmented from the training sets of GSM8K and MATH. \nNone of the augmented data is from the testing set.\nYou can check the original_question in meta-math/MetaMathQA, each item is from the GSM8K or MATH train set.\n\n\t\n\t\t\n\t\tModel Details\n\t\n\nMetaMath-Mistral-7B is fully fine-tuned on the MetaMathQA datasets and based on the powerful Mistral-7B model. It is\u2026 See the full description on the dataset page: https://huggingface.co/datasets/meta-math/MetaMathQA."
        }
      ]
    },
    {
      "type": "data",
      "bom-ref": "glaiveai/glaive-code-assistant-v3-dc0b3ece-6570-57fe-97c1-2c2ed2ffa00b",
      "name": "glaiveai/glaive-code-assistant-v3",
      "data": [
        {
          "type": "dataset",
          "bom-ref": "glaiveai/glaive-code-assistant-v3-dc0b3ece-6570-57fe-97c1-2c2ed2ffa00b",
          "name": "glaiveai/glaive-code-assistant-v3",
          "contents": {
            "url": "https://huggingface.co/datasets/glaiveai/glaive-code-assistant-v3",
            "properties": [
              { "name": "size_categories", "value": "100K<n<1M" },
              { "name": "license", "value": "apache-2.0" }
            ]
          },
          "governance": {
            "owners": [
              { "organization": { "name": "glaiveai", "url": "https://huggingface.co/glaiveai" } }
            ]
          },
          "description": "\n\t\n\t\t\n\t\tGlaive-code-assistant-v3\n\t\n\nGlaive-code-assistant-v3 is a dataset of ~1M code problems and solutions generated using Glaive\u2019s synthetic data generation platform.\nThis is built on top of the previous version of the dataset that can be found here. This already includes v1 and v2 of the dataset.\nTo report any problems or suggestions in the data, join the Glaive discord\n"
        }
      ]
    },
    {
      "type": "data",
      "bom-ref": "glaive-function-calling-v2-cccb9fa5-b073-586d-bdcc-c5d443666d2c",
      "name": "glaive-function-calling-v2",
      "data": [
        {
          "type": "dataset",
          "bom-ref": "glaive-function-calling-v2-cccb9fa5-b073-586d-bdcc-c5d443666d2c",
          "name": "glaive-function-calling-v2"
        }
      ]
    },
    {
      "type": "data",
      "bom-ref": "bugdaryan/sql-create-context-instruction-1c30e492-175f-5b7d-bf80-c1e311da3f29",
      "name": "bugdaryan/sql-create-context-instruction",
      "data": [
        {
          "type": "dataset",
          "bom-ref": "bugdaryan/sql-create-context-instruction-1c30e492-175f-5b7d-bf80-c1e311da3f29",
          "name": "bugdaryan/sql-create-context-instruction",
          "contents": {
            "url": "https://huggingface.co/datasets/bugdaryan/sql-create-context-instruction",
            "properties": [
              { "name": "task_categories", "value": "text-generation, question-answering, table-question-answering" },
              { "name": "language", "value": "en" },
              { "name": "size_categories", "value": "10K<n<100K" },
              { "name": "pretty_name", "value": "sql-create-context" },
              { "name": "license", "value": "cc-by-4.0" }
            ]
          },
          "governance": {
            "owners": [
              { "organization": { "name": "bugdaryan", "url": "https://huggingface.co/bugdaryan" } }
            ]
          },
          "description": "\n\t\n\t\t\n\t\tOverview\n\t\n\nThis dataset is built upon SQL Create Context, which in turn was constructed using data from WikiSQL and Spider.\nThere are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL Query answering the question using the CREATE statement as context. This dataset was built with text-to-SQL LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-SQL datasets. The CREATE TABLE statement can often be\u2026 See the full description on the dataset page: https://huggingface.co/datasets/bugdaryan/sql-create-context-instruction."
        }
      ]
    },
    {
      "type": "data",
      "bom-ref": "garage-bAInd/Open-Platypus-9908ce92-f46e-5979-aefc-e191a9daa0c8",
      "name": "garage-bAInd/Open-Platypus",
      "data": [
        {
          "type": "dataset",
          "bom-ref": "garage-bAInd/Open-Platypus-9908ce92-f46e-5979-aefc-e191a9daa0c8",
          "name": "garage-bAInd/Open-Platypus",
          "contents": {
            "url": "https://huggingface.co/datasets/garage-bAInd/Open-Platypus",
            "properties": [
              { "name": "language", "value": "en" },
              { "name": "size_categories", "value": "10K<n<100K" },
              { "name": "configs", "value": "Name of the dataset subset: default {\"split\": \"train\", \"path\": \"data/train-*\"}" }
            ]
          },
          "governance": {
            "owners": [
              { "organization": { "name": "garage-bAInd", "url": "https://huggingface.co/garage-bAInd" } }
            ]
          },
          "description": "\n\t\n\t\t\n\t\tOpen-Platypus\n\t\n\nThis dataset is focused on improving LLM logical reasoning skills and was used to train the Platypus2 models. It is comprised of the following datasets, which were filtered using keyword search and then Sentence Transformers to remove questions with a similarity above 80%:\n\n\t\n\t\t\nDataset Name\nLicense Type\n\n\n\t\t\nPRM800K\nMIT\n\n\nMATH\nMIT\n\n\nScienceQA\nCreative Commons Attribution-NonCommercial-ShareAlike 4.0 International\n\n\nSciBench\nMIT\n\n\nReClor\nNon-commercial\n\n\nTheoremQA\nMIT\u2026 See the full description on the dataset page: https://huggingface.co/datasets/garage-bAInd/Open-Platypus."
        }
      ]
    },
    {
      "type": "data",
      "bom-ref": "nvidia/HelpSteer-0eaad2ad-905d-5f61-bfe7-77e9c4369146",
      "name": "nvidia/HelpSteer",
      "data": [
        {
          "type": "dataset",
          "bom-ref": "nvidia/HelpSteer-0eaad2ad-905d-5f61-bfe7-77e9c4369146",
          "name": "nvidia/HelpSteer",
          "contents": {
            "url": "https://huggingface.co/datasets/nvidia/HelpSteer",
            "properties": [
              { "name": "language", "value": "en" },
              { "name": "size_categories", "value": "10K<n<100K" },
              { "name": "pretty_name", "value": "Helpfulness SteerLM Dataset" },
              { "name": "license", "value": "cc-by-4.0" }
            ]
          },
          "governance": {
            "owners": [
              { "organization": { "name": "nvidia", "url": "https://huggingface.co/nvidia" } }
            ]
          },
          "description": "\n\t\n\t\t\n\t\tHelpSteer: Helpfulness SteerLM Dataset\n\t\n\nHelpSteer is an open-source Helpfulness Dataset (CC-BY-4.0) that supports aligning models to become more helpful, factually correct and coherent, while being adjustable in terms of the complexity and verbosity of its responses.\nLeveraging this dataset and SteerLM, we train a Llama 2 70B to reach 7.54 on MT Bench, the highest among models trained on open-source datasets based on MT Bench Leaderboard as of 15 Nov 2023.\nThis model is available on\u2026 See the full description on the dataset page: https://huggingface.co/datasets/nvidia/HelpSteer."
        }
      ]
    }
  ]
}