LiquidAI/LFM2-1.2B-Extract fine-tuned on Japanese content to extract PII as JSON. The model outputs a single JSON object with exactly four fields:

{
    "full_name": "name of the person",
    "company_name": "name of the company",
    "address": "address of the plance",
    "phone_number": "phone number"
}
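
A minimal inference sketch with Hugging Face transformers is shown below; the chat-template prompt, the example Japanese text, and the generation settings are illustrative assumptions, not the exact setup used during fine-tuning.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kainoj/LiquidAI-LFM2-1.2B-Extract-ja-pii-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Japanese text containing PII (illustrative example).
text = "契約者：山田太郎、株式会社サンプル、東京都港区1-2-3、電話番号 03-1234-5678"

# Assumes the model's chat template; the plain user message is an assumption.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": text}],
    add_generation_prompt=True,
    return_tensors="pt",
)
output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens: the single JSON object with four fields.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))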

Coded during Liquid AI hackathon in Tokyo.

Evaluations

Evaluation on the test split of stockmark/ner-wikipedia-dataset:

  • Test accuracy of the raw model on the wiki dataset: 0.9100 --> 1.0 after fine-tuning.

That dataset is fairly simple, so we additionally generated 64 samples of long contracts containing PII in Japanese. We used this set only to evaluate the final performance of the models:

  • Test accuracy of the raw model on our dataset: 0.5781 --> 0.9688 after fine-tuning.

Evaluation methodology

We use exact match on the generated JSON. The output of the SLM must be valid JSON with exactly the four required fields, no more, no less.
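
A minimal sketch of this check (the function name and the gold-label format are illustrative assumptions):

import json

REQUIRED_FIELDS = {"full_name", "company_name", "address", "phone_number"}

def exact_match(generated: str, gold: dict) -> bool:
    # The generated text must parse as JSON, contain exactly the four
    # required fields, and every value must equal the gold label.
    try:
        pred = json.loads(generated)
    except json.JSONDecodeError:
        return False
    if not isinstance(pred, dict) or set(pred) != REQUIRED_FIELDS:
        return False
    return all(pred[k] == gold[k] for k in REQUIRED_FIELDS)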

Model Details

Model size: 1B params
Tensor type: BF16
Format: Safetensors

Model tree for kainoj/LiquidAI-LFM2-1.2B-Extract-ja-pii-finetuned

Base model: LiquidAI/LFM2-1.2B --> fine-tuned --> this model
