# GPT-2 Hinglish Fine-tuned

This is a GPT-2 model (~124M parameters, F32 weights) fine-tuned on a Hinglish conversational dataset. It is trained to generate Hinglish text and to predict the next word in code-mixed (Hindi–English) contexts.

## Usage (Custom Inference)

The helper below generates a short continuation of the input context and returns the first complete word of that continuation.

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
import re

MODEL_ID = "SoorajK1/gpt2-hinglish-finetuned"

# Load once at module level rather than on every call
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_ID)
model = GPT2LMHeadModel.from_pretrained(MODEL_ID)
model.eval()

def clean_generated_text(text):
    # Strip any leading/trailing punctuation around the word
    text = re.sub(r'^[^\w]+|[^\w]+$', '', text)
    return text.strip()

def predict_first_complete_word(input_text, max_new_tokens=10):
    input_ids = tokenizer.encode(input_text, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad warning
        )
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    # Keep only the text generated after the prompt
    continuation = generated_text[len(input_text):].lstrip()
    next_match = re.match(r'^(\w+)', continuation)
    return clean_generated_text(next_match.group(1) if next_match else "")
```
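The post-processing step can be checked on its own, without downloading the model. The sketch below (with made-up continuation strings) shows how the regex extracts the first complete word from a generated continuation and how punctuation-only output falls back to an empty string:

```python
import re

def clean_generated_text(text):
    # Strip any leading/trailing punctuation around the word
    return re.sub(r'^[^\w]+|[^\w]+$', '', text).strip()

def first_word(continuation):
    # Same extraction logic as predict_first_complete_word,
    # applied to an already-generated continuation string
    match = re.match(r'^(\w+)', continuation.lstrip())
    return clean_generated_text(match.group(1) if match else "")

print(first_word(" hai, aur kya chal raha hai"))  # -> "hai"
print(first_word("?!"))                           # -> "" (no word found)
```

Note that `generated_text[len(input_text):]` assumes the decoded output begins with the prompt verbatim; with GPT-2's byte-level BPE this normally holds, but unusual whitespace or Unicode in the prompt can shift the boundary.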