
Model Card for nitajadav/Legal_Pegasus_FineTuned

This model is a fine-tuned abstractive summarization model for the Indian judicial (legal) domain.

Model Details

Model Description

  • Developed by: Nita Jadav
  • Funded by [optional]: SVNIT, Surat
  • Shared by [optional]: [More Information Needed]
  • Model type: Pegasus sequence-to-sequence (abstractive summarization)
  • Language(s) (NLP): English
  • License: [More Information Needed]
  • Finetuned from model [optional]: nsi319/legal-pegasus

Model Sources [optional]

Uses

The model can be used directly to generate summaries of Indian legal documents.

Out-of-Scope Use

The model may not work for languages other than English, or for legal domains outside India.

Limitations

The model may not perform well on long documents. Long documents need to be chunked using the passage-retrieval methods described in the GitHub repository.
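The repository's passage-retrieval chunking is not reproduced here; as a rough sketch, a hypothetical `chunk_words` helper using a simple sliding word window (a stand-in for the actual retrieval method, with 700 words as a conservative proxy for Pegasus's 1024-token input limit) might look like:

```python
def chunk_words(text, max_words=700, overlap=50):
    """Split text into overlapping word windows small enough for the model.

    Overlap between consecutive windows helps avoid cutting a legal
    argument in half at a chunk boundary.
    """
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk can then be summarized independently and the partial summaries concatenated (or summarized again).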

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

How to Get Started with the Model

Use the code below to get started with the model.

#!/usr/bin/env python3
"""Summarize legal documents with a fine-tuned legal-Pegasus model."""

import argparse
import json
import pandas as pd
from tqdm import tqdm
from transformers import pipeline


def load_jsonl(path):
    rows = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            rows.append(json.loads(line))
    return pd.DataFrame(rows)


def save_jsonl(records, path):
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Pegasus summarization using transformers.pipeline")
    parser.add_argument("--input", required=True, help="Input .jsonl or .csv")
    parser.add_argument("--output", required=True, help="Output JSONL file")
    parser.add_argument("--model", required=True,
                        help="HF repo id or local model folder containing fine tuned legal-Pegasus")
    parser.add_argument("--use_auth_token", default=None,
                        help="HF token for private models")

    args = parser.parse_args()

    print("Loading summarization pipeline...")
    summarizer = pipeline(
        "summarization",
        model=args.model,
        tokenizer=args.model,           # tokenizer usually stored in same repo
        use_auth_token=args.use_auth_token,
        device=-1,                      # CPU; change to 0 for GPU
        framework="pt"
    )

    print("Loading dataset...")
    if args.input.endswith(".jsonl"):
        df = load_jsonl(args.input)
    else:
        df = pd.read_csv(args.input)

    if "ID" not in df.columns or "para_text" not in df.columns:
        raise ValueError("Input must contain columns: ID, para_text")

    results = []

    print("Running summarization...")
    for _, row in tqdm(df.iterrows(), total=len(df)):
        text = row["para_text"]

        summary = summarizer(
            text,
            truncation=True,   # Pegasus input is capped at 1024 tokens
            max_length=256,
            min_length=20,
            do_sample=False,
            num_beams=5
        )[0]["summary_text"]

        results.append({
            "ID": str(row["ID"]),
            "Summary": summary
        })

    save_jsonl(results, args.output)
    print(f"Summaries saved to {args.output}")

Preprocessing [optional]

Long documents require chunking using passage-retrieval methods, and the raw data should be normalized.
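The exact normalization used is not specified in the card; a minimal stdlib-only sketch of typical raw-text cleanup (Unicode normalization plus whitespace collapsing) might look like:

```python
import re
import unicodedata


def normalize_text(text):
    # Canonicalize Unicode (e.g. non-breaking spaces, ligatures) via NFKC.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace (newlines, tabs, repeated spaces).
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```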

Metrics

ROUGE score
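ROUGE-2 measures word-bigram overlap between a generated summary and a reference. A simplified, stdlib-only ROUGE-2 F1 (no stemming or tokenizer refinements, unlike the official `rouge-score` package) can be sketched as:

```python
from collections import Counter


def rouge2_f(prediction, reference):
    """Simplified ROUGE-2 F1: overlap of lowercase word bigrams."""
    def bigrams(text):
        words = text.lower().split()
        return Counter(zip(words, words[1:]))

    pred, ref = bigrams(prediction), bigrams(reference)
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```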

Results

21.30 (average of ROUGE-2, ROUGE-L, and BLEU)

Compute Infrastructure

The model was trained on a single NVIDIA H100 GPU with 94 GB of memory.

Model size: 0.6B parameters (F32, Safetensors)

Model repository: nitajadav/Legal_Pegasus_FineTuned