# Model Card for Legal_Pegasus_FineTuned

This model card describes a legal-Pegasus model fine-tuned for summarization in the Indian judicial domain.
## Model Details

### Model Description
- Developed by: Nita Jadav
- Funded by [optional]: SVNIT, Surat
- Shared by [optional]: [More Information Needed]
- Model type: Abstractive summarization (sequence-to-sequence)
- Language(s) (NLP): English
- License: [More Information Needed]
- Finetuned from model [optional]: nsi319/legal-pegasus
### Model Sources [optional]
- Repository: https://github.com/nitajadav8/Legal_TextSumm
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
## Uses

The model can be used directly to generate summaries of legal documents.
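A minimal direct-use sketch (the repo id below is this model's; the input passage is an illustrative placeholder):

```python
from transformers import pipeline

# Load the fine-tuned summarizer from the Hub (repo id of this model card).
summarizer = pipeline("summarization", model="nitajadav/Legal_Pegasus_FineTuned")

# Illustrative placeholder text; substitute a real judgment paragraph.
document = "The appellant challenged the order of the High Court on the ground that ..."
print(summarizer(document, max_length=256, min_length=20, do_sample=False)[0]["summary_text"])
```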
### Out-of-Scope Use

The model may not work for languages other than English or for legal domains outside the Indian judiciary.
## Limitations

The model may not work well for long documents. Long documents need to be chunked using the passage-retrieval methods described in the GitHub repository; a simpler fallback is sketched below.
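A minimal chunking sketch, assuming a 1024-token input window (Pegasus's usual limit) and naive sentence splitting rather than the repository's passage-retrieval approach:

```python
from transformers import AutoTokenizer, pipeline

MODEL_ID = "nitajadav/Legal_Pegasus_FineTuned"  # this model's repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
summarizer = pipeline("summarization", model=MODEL_ID, tokenizer=tokenizer)

def chunk_text(text, max_tokens=1024):
    """Greedily pack sentences into chunks of at most max_tokens tokens."""
    chunks, current = [], []
    for sentence in text.split(". "):  # naive sentence splitting
        candidate = ". ".join(current + [sentence])
        if current and len(tokenizer.encode(candidate)) > max_tokens:
            chunks.append(". ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(". ".join(current))
    return chunks

def summarize_long(text):
    """Summarize each chunk and concatenate the partial summaries."""
    parts = [
        summarizer(chunk, max_length=256, min_length=20, do_sample=False,
                   truncation=True)[0]["summary_text"]  # truncate oversized chunks
        for chunk in chunk_text(text)
    ]
    return " ".join(parts)
```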
### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
## How to Get Started with the Model

Use the code below to get started with the model.
```python
#!/usr/bin/env python3
"""Summarize legal documents with a fine-tuned legal-Pegasus model."""
import argparse
import json

import pandas as pd
from tqdm import tqdm
from transformers import pipeline


def load_jsonl(path):
    """Read a JSON Lines file into a DataFrame."""
    rows = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            rows.append(json.loads(line))
    return pd.DataFrame(rows)


def save_jsonl(records, path):
    """Write a list of dicts to a JSON Lines file."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Pegasus summarization using transformers.pipeline")
    parser.add_argument("--input", required=True, help="Input .jsonl or .csv")
    parser.add_argument("--output", required=True, help="Output JSONL file")
    parser.add_argument("--model", required=True,
                        help="HF repo id or local model folder containing the fine-tuned legal-Pegasus")
    parser.add_argument("--use_auth_token", default=None,
                        help="HF token for private models")
    args = parser.parse_args()

    print("Loading summarization pipeline...")
    summarizer = pipeline(
        "summarization",
        model=args.model,
        tokenizer=args.model,  # tokenizer is usually stored in the same repo
        use_auth_token=args.use_auth_token,
        device=-1,  # CPU; change to 0 for GPU
        framework="pt",
    )

    print("Loading dataset...")
    if args.input.endswith(".jsonl"):
        df = load_jsonl(args.input)
    else:
        df = pd.read_csv(args.input)

    if "ID" not in df.columns or "para_text" not in df.columns:
        raise ValueError("Input must contain columns: ID, para_text")

    results = []
    print("Running summarization...")
    for _, row in tqdm(df.iterrows(), total=len(df)):
        text = row["para_text"]
        summary = summarizer(
            text,
            max_length=256,
            min_length=20,
            do_sample=False,
            num_beams=5,
        )[0]["summary_text"]
        results.append({
            "ID": str(row["ID"]),
            "Summary": summary,
        })

    save_jsonl(results, args.output)
    print(f"Summaries saved to {args.output}")
```
## Preprocessing [optional]

Long documents require chunking using passage-retrieval methods, and the raw text should be normalized.
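A minimal normalization sketch (the specific cleanup rules are assumptions, not necessarily the repository's exact pipeline):

```python
import re
import unicodedata

def normalize(text):
    """Basic cleanup for raw legal text before chunking and summarization."""
    text = unicodedata.normalize("NFKC", text)  # unify Unicode representations
    text = text.replace("\u2018", "'").replace("\u2019", "'")  # straighten curly quotes
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    text = re.sub(r"\s+", " ", text)  # collapse whitespace and newlines
    return text.strip()
```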
## Metrics

ROUGE (R-2, R-L) and BLEU.

## Results

21.30 (average of ROUGE-2, ROUGE-L, and BLEU scores)
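The model card does not specify the evaluation tooling; one way to reproduce such an aggregate is with the Hugging Face `evaluate` library (the predictions and references below are placeholders):

```python
import evaluate

# Placeholder data; substitute generated and gold-standard summaries.
predictions = ["the appeal is dismissed"]
references = ["the court dismissed the appeal"]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

r = rouge.compute(predictions=predictions, references=references)
b = bleu.compute(predictions=predictions, references=[[ref] for ref in references])

# Average of ROUGE-2, ROUGE-L, and BLEU, scaled to 0-100 to match the reported score.
print(100 * (r["rouge2"] + r["rougeL"] + b["bleu"]) / 3)
```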
## Compute Infrastructure

The model was trained on a single NVIDIA H100 GPU with 94 GB of memory.