# GPT-2 Hinglish Fine-tuned
This is a GPT-2 model fine-tuned on a Hinglish (code-mixed Hindi-English) conversational dataset. It generates Hinglish text and predicts the next word in code-mixed contexts.
## Usage (Custom Inference)

The helper below predicts the first complete word that follows a given input context.
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
import re

# Load the tokenizer and model once, rather than on every call
tokenizer = GPT2Tokenizer.from_pretrained("SoorajK1/gpt2-hinglish-finetuned")
model = GPT2LMHeadModel.from_pretrained("SoorajK1/gpt2-hinglish-finetuned")
model.eval()

def clean_generated_text(text):
    # Strip leading/trailing non-word characters (punctuation, symbols)
    return re.sub(r'^[^\w]+|[^\w]+$', '', text).strip()

def predict_first_complete_word(input_text, max_new_tokens=10):
    input_ids = tokenizer.encode(input_text, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    # Keep only the continuation after the prompt and take its first word
    continuation = generated_text[len(input_text):].lstrip()
    next_match = re.match(r'^(\w+)', continuation)
    return clean_generated_text(next_match.group(1) if next_match else "")
```
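The word-extraction step can be checked in isolation, without downloading the model. This sketch replays the same regex logic on a hypothetical continuation string (the Hinglish text here is illustrative, not actual model output):

```python
import re

def clean_generated_text(text):
    # Same cleanup as above: strip leading/trailing non-word characters
    return re.sub(r'^[^\w]+|[^\w]+$', '', text).strip()

# Pretend the model continued the prompt with this text
continuation = "yaar, kal milte hain".lstrip()

# Grab the first run of word characters, then clean it
next_match = re.match(r'^(\w+)', continuation)
first_word = clean_generated_text(next_match.group(1) if next_match else "")
print(first_word)  # → yaar
```

Note that if the continuation starts with punctuation (e.g. `", haan"`), `re.match` finds no leading word and the function returns an empty string; callers should handle that case.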