---
task_categories:
- text-generation
language:
- en
tags:
- code
- mermaid
- syntax
- diagram
- repair
pretty_name: Mermaid AI Syntax
size_categories:
- 100K
---

**Note:** Validation is performed by the Mermaid parser **before** any model call. Parser diagnostics are exposed in the dataset as `compiler_errors` (array of strings) so the model can understand what failed and propose targeted repairs.

---

## Supported Tasks and Benchmarks

- **Text Generation**
  - `REPAIR`: Given an invalid diagram and parser diagnostics (`compiler_errors`), generate a corrected diagram (or a minimal patch).
  - `TITLE`: Given a valid diagram, generate a short, human-friendly title (optionally with a one-sentence summary).
  - `GENERATE`: Given a natural language instruction and optional diagram type, generate a new valid diagram (`diagram_content`) plus optional title and summary.

### Task Categories

- `text-generation`

---

## Languages

- **English (`en`)**

All error messages, titles, and instructions are in English. Future multilingual expansions may include localized error messages.

---

## Dataset Structure

### Input Schema

```json
{
  "task": "REPAIR|TITLE|GENERATE",
  "input": {
    "diagram": "string (for REPAIR|TITLE)",
    "instruction": "string (for GENERATE)",
    "context": "optional string",
    "diagram_type": "optional string",
    "compiler_errors": ["string (for REPAIR)"]
  }
}
```

`compiler_errors` is an optional array of strings produced by the Mermaid parser (e.g., `"MISSING_ARROW at line 7"`, `"UNTERMINATED_BLOCK: 'gantt' missing 'end'"`). Include it for `REPAIR` samples; omit it for `TITLE` and `GENERATE` samples.
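The per-task input rules above can be checked mechanically. The following is a minimal sketch, not part of the dataset tooling: the function name `check_input` and the `REQUIRED_INPUT_KEYS` table are hypothetical, and treating `compiler_errors` as expected on every `REPAIR` sample is an assumption read from the schema comments.

```python
from typing import Any

# Hypothetical table of expected `input` keys per task, read from the
# schema comments ("for REPAIR|TITLE", "for GENERATE", "for REPAIR").
REQUIRED_INPUT_KEYS = {
    "REPAIR": {"diagram", "compiler_errors"},
    "TITLE": {"diagram"},
    "GENERATE": {"instruction"},
}


def check_input(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a record's task and input fields."""
    task = record.get("task")
    if task not in REQUIRED_INPUT_KEYS:
        return [f"unknown task: {task!r}"]

    problems: list[str] = []
    payload = record.get("input", {})

    # Every key the task expects must be present.
    for key in sorted(REQUIRED_INPUT_KEYS[task]):
        if key not in payload:
            problems.append(f"{task} sample is missing input.{key}")

    # TITLE and GENERATE samples are expected to omit parser diagnostics.
    if task != "REPAIR" and "compiler_errors" in payload:
        problems.append(f"{task} sample should omit input.compiler_errors")

    # When present, diagnostics must be an array of strings.
    errors = payload.get("compiler_errors", [])
    if not isinstance(errors, list) or not all(isinstance(e, str) for e in errors):
        problems.append("input.compiler_errors must be an array of strings")

    return problems
```

An empty list means the record matches the input rules as interpreted here; a stricter check could also reject unknown keys inside `input`.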

### Output Schema

```json
{
  "result": {
    "compiler_errors": ["string"],           // optional echo of parser diagnostics
    "patch": [                               // optional for REPAIR tasks
      {
        "op": "replace|insert|delete",
        "range": {"startLine": 1, "startCol": 5, "endLine": 1, "endCol": 10},
        "text": "new content"
      }
    ],
    "repaired_diagram": "string or null",    // for REPAIR
    "diagram_content": "string or null",     // for GENERATE
    "title": "string or null",               // for TITLE and GENERATE
    "summary": "string or null"              // optional one-sentence description
  }
}
```

- `compiler_errors`: optional echo of parser diagnostics to provide context for the model.
- `patch`: optional list of minimal edit operations for REPAIR tasks.
- `repaired_diagram`: the corrected diagram (full text), used in REPAIR tasks.
- `diagram_content`: the newly generated diagram, used in GENERATE tasks.
- `title`: a short, human-friendly title, used in TITLE and GENERATE tasks.
- `summary`: an optional one-sentence description or summary, used in TITLE and GENERATE tasks.

### Examples

#### Example REPAIR

```json
{
  "task": "REPAIR",
  "input": {
    "diagram": "flowchart TD\nA -> B",
    "compiler_errors": ["MISSING_ARROW at line 2"]
  },
  "result": {
    "compiler_errors": ["MISSING_ARROW at line 2"],
    "patch": [
      {
        "op": "replace",
        "range": {"startLine": 2, "startCol": 3, "endLine": 2, "endCol": 4},
        "text": "--"
      }
    ],
    "repaired_diagram": "flowchart TD\nA --> B",
    "title": null,
    "summary": null
  }
}
```

#### Example TITLE

```json
{
  "task": "TITLE",
  "input": {
    "diagram": "sequenceDiagram\nAlice->>Bob: Hello Bob!"
  },
  "result": {
    "compiler_errors": [],
    "patch": [],
    "repaired_diagram": null,
    "title": "Alice greets Bob",
    "summary": "A simple sequence diagram showing Alice sending a greeting message to Bob."
  }
}
```

#### Example GENERATE

```json
{
  "task": "GENERATE",
  "input": {
    "instruction": "Create a flowchart for the checkout process",
    "diagram_type": "flowchart"
  },
  "result": {
    "compiler_errors": [],
    "patch": [],
    "diagram_content": "flowchart TD\nStart --> Cart\nCart --> Payment\nPayment --> Confirmation",
    "title": "Checkout Flow",
    "summary": "A flowchart showing the steps from start to order confirmation in an e-commerce checkout process."
  }
}
```

## Sample Data

An example `sample.jsonl` record is included for each task type. Each line of the file is a JSON object following the schema above.

### REPAIR Sample

```jsonl
{"task": "REPAIR", "input": {"diagram": "flowchart TD\nA -> B", "diagram_type": "flowchart", "compiler_errors": ["MISSING_ARROW at line 2: use '-->' instead of '->'"]}, "result": {"compiler_errors": ["MISSING_ARROW at line 2: use '-->' instead of '->'"], "patch": [{"op": "replace", "range": {"startLine": 2, "startCol": 3, "endLine": 2, "endCol": 4}, "text": "--"}], "repaired_diagram": "flowchart TD\nA --> B", "diagram_content": null, "title": null, "summary": null}}
```

### TITLE Sample

```jsonl
{"task": "TITLE", "input": {"diagram": "sequenceDiagram\nAlice->>Bob: Hello Bob!", "diagram_type": "sequence"}, "result": {"compiler_errors": [], "patch": [], "repaired_diagram": null, "diagram_content": null, "title": "Alice greets Bob", "summary": "A simple sequence diagram showing Alice sending a greeting message to Bob."}}
```

### GENERATE Sample

```jsonl
{"task": "GENERATE", "input": {"instruction": "Create a flowchart for the checkout process", "diagram_type": "flowchart"}, "result": {"compiler_errors": [], "patch": [], "repaired_diagram": null, "diagram_content": "flowchart TD\nStart --> Cart\nCart --> Payment\nPayment --> Confirmation", "title": "Checkout Flow", "summary": "A flowchart showing the steps from start to order confirmation in an e-commerce checkout process."}}
```

Additional syntax-focused training samples have been generated from the Mermaid documentation and are available as JSONL files:

- `data/syntax_repair_samples.jsonl` – contains REPAIR task samples with broken diagrams and their fixes.
- `data/syntax_title_samples.jsonl` – contains TITLE task samples with valid diagrams, titles, and summaries.
- `data/syntax_generate_samples.jsonl` – contains GENERATE task samples with instructions and generated diagrams.
- `data/syntax_all_samples.jsonl` – combined file with all tasks.

These files can be used to train models specifically on Mermaid syntax understanding, repair, and generation.
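
When reading these files, it can be useful to confirm that a REPAIR record's `patch` actually reproduces its `repaired_diagram`. The sketch below is illustrative only. The helper names `iter_records` and `apply_patch` are hypothetical, and the range semantics (1-based lines and columns, exclusive `endCol`, single-line ranges) are an assumption inferred from the REPAIR sample, not something this card specifies. The record is built in memory so that the block runs without the data files.

```python
import json
from typing import Any, Iterator


def iter_records(jsonl_text: str) -> Iterator[dict[str, Any]]:
    """Yield one JSON object per non-empty line of JSONL text."""
    for line in jsonl_text.split("\n"):
        if line.strip():
            yield json.loads(line)


def apply_patch(diagram: str, patch: list[dict[str, Any]]) -> str:
    """Apply single-line edit operations to a diagram.

    Assumed semantics: 1-based lines and columns, exclusive endCol.
    """
    lines = diagram.split("\n")
    # Apply later edits first so earlier column offsets stay valid.
    ordered = sorted(
        patch,
        key=lambda op: (op["range"]["startLine"], op["range"]["startCol"]),
        reverse=True,
    )
    for op in ordered:
        rng = op["range"]
        if rng["startLine"] != rng["endLine"]:
            raise ValueError("multi-line ranges are not handled in this sketch")
        row = rng["startLine"] - 1
        start, end = rng["startCol"] - 1, rng["endCol"] - 1
        line = lines[row]
        if op["op"] == "replace":
            lines[row] = line[:start] + op["text"] + line[end:]
        elif op["op"] == "insert":
            lines[row] = line[:start] + op["text"] + line[start:]
        elif op["op"] == "delete":
            lines[row] = line[:start] + line[end:]
        else:
            raise ValueError(f"unknown op: {op['op']!r}")
    return "\n".join(lines)


# The REPAIR sample from this card, serialized as one JSONL line.
record = {
    "task": "REPAIR",
    "input": {"diagram": "flowchart TD\nA -> B"},
    "result": {
        "patch": [{
            "op": "replace",
            "range": {"startLine": 2, "startCol": 3, "endLine": 2, "endCol": 4},
            "text": "--",
        }],
        "repaired_diagram": "flowchart TD\nA --> B",
    },
}
jsonl_text = json.dumps(record) + "\n"

for rec in iter_records(jsonl_text):
    if rec["task"] == "REPAIR":
        rebuilt = apply_patch(rec["input"]["diagram"], rec["result"]["patch"])
        print(rebuilt == rec["result"]["repaired_diagram"])  # prints True
```

To run the same check against a real file, replace `jsonl_text` with the contents of one of the files listed above; records whose patches use other range conventions would need the helper adjusted.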