---
base_model: Kwaipilot/KAT-Dev-72B-Exp
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Fine-Tuned
- Causal-LM
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HyperSwitch-Repo-CPT-Dataset
---

# Kwaipilot-KAT-Dev-CPT-LoRA-Adapter-HyperSwitch

A LoRA adapter fine-tuned from **Kwaipilot/KAT-Dev-72B-Exp** and specialized for the [Hyperswitch](https://github.com/juspay/hyperswitch) Rust codebase. The adapter is tuned for payment-processing patterns, Hyperswitch architecture, and idiomatic Rust development practices.

## 🎯 Model Description

This LoRA adapter was trained on **16,731 samples** extracted from the Hyperswitch codebase to enhance code understanding, explanation, and generation within the payment-processing domain.

- **Base Model**: Kwaipilot/KAT-Dev-72B-Exp
- **Training Type**: Causal Language Modeling (CLM) with LoRA
- **Domain**: Payment Processing, Rust Development
- **Specialization**: Hyperswitch codebase patterns and architecture

## 📊 Training Details

### Dataset Composition

- **Total Samples**: 16,731
- **File-level samples**: 2,120 complete files
- **Granular samples**: 14,611 extracted components
  - Functions: 4,121
  - Structs: 5,710
  - Traits: 223
  - Implementations: 4,296
  - Modules: 261

### LoRA Configuration

```yaml
r: 64             # LoRA rank
alpha: 128        # LoRA alpha (2*r)
dropout: 0.05     # LoRA dropout
target_modules:   # applied to all linear projection layers
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```

### Training Hyperparameters

- **Epochs**: 3
- **Learning Rate**: 5e-5 (cosine schedule)
- **Max Context**: 8,192 tokens
- **Hardware**: 4 x NVIDIA H200

### Training Results

```json
{
  "final_train_loss": 0.2641,
  "final_eval_loss": 0.37574875354766846,
  "final_train_perplexity": 1.3022584156313823,
  "final_eval_perplexity": 1.4560812525608204,
  "final_token_accuracy": 0.9259863365441561,
  "initial_loss": 1.6648,
  "initial_perplexity": 5.284616220817229,
  "initial_accuracy": 0.6015806214883923
}
```

## 🚀 Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Kwaipilot/KAT-Dev-72B-Exp",
    dtype=torch.bfloat16,
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Kwaipilot/KAT-Dev-72B-Exp")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch")

# Generate code
prompt = """// Hyperswitch payment processing
pub fn validate_payment_method("""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.2,  # Lower temperature for code generation
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Recommended Settings

- **Temperature**: 0.2-0.3 for code generation
- **Temperature**: 0.5-0.7 for explanations and documentation
- **Max tokens**: 1024 for most tasks (one way to apply these is sketched below)
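Continuing from the Quick Start snippet, the recommended settings can be packaged with transformers' `GenerationConfig`; the preset names below are illustrative, not part of the released adapter:

```python
from transformers import GenerationConfig

# Illustrative presets reflecting the recommended settings above;
# reuses `model`, `tokenizer`, and `inputs` from the Quick Start snippet.
code_preset = GenerationConfig(
    max_new_tokens=1024,
    temperature=0.2,  # 0.2-0.3 recommended for code generation
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
docs_preset = GenerationConfig(
    max_new_tokens=1024,
    temperature=0.6,  # 0.5-0.7 recommended for explanations/documentation
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=code_preset)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```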
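### Merging the Adapter (Optional)

If you prefer a standalone checkpoint over loading the base model and adapter separately, PEFT's `merge_and_unload()` folds the LoRA weights into the base weights. A minimal sketch, continuing from the Quick Start snippet; the output directory name is illustrative:

```python
# Fold the LoRA weights into the base model and drop the PEFT wrappers.
merged_model = model.merge_and_unload()

# Save a standalone checkpoint (illustrative path).
merged_model.save_pretrained("KAT-Dev-72B-Exp-HyperSwitch-merged")
tokenizer.save_pretrained("KAT-Dev-72B-Exp-HyperSwitch-merged")
```

Merging removes the PEFT dependency at inference time, at the cost of storing a full-size copy of the weights.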
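### Reproducing the Training Setup

The CPT dataset is published as [AdityaNarayan/HyperSwitch-Repo-CPT-Dataset](https://huggingface.co/datasets/AdityaNarayan/HyperSwitch-Repo-CPT-Dataset) (see the metadata above). A minimal sketch for inspecting it; the split and column names are whatever the dataset repo defines, so check before assuming a schema:

```python
from datasets import load_dataset

# Pull the CPT dataset referenced in this card's metadata.
ds = load_dataset("AdityaNarayan/HyperSwitch-Repo-CPT-Dataset")

# Inspect available splits and columns before assuming a schema.
print(ds)
```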
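To train a comparable adapter, the LoRA Configuration section above maps onto `peft.LoraConfig` roughly as follows; note that `task_type` is an assumption based on the causal-LM objective and is not stated elsewhere in this card:

```python
from peft import LoraConfig

# Mirrors the LoRA Configuration section above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumption: causal-LM training objective
)
```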
## 🛠️ Technical Specifications

- **Context Window**: 8,192 tokens
- **Precision**: bfloat16
- **Memory Usage**: ~78 GB VRAM
- **Inference Speed**: Optimized with Flash Attention 2

## 🙏 Acknowledgments

- **Kwaipilot Team** for the excellent KAT-Dev base model
- **Hyperswitch Team** for the open-source payment processing platform
- **Hugging Face** for the transformers and PEFT libraries

## 📞 Citation

```bibtex
@misc{hyperswitch-kat-dev-lora-2024,
  title={KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch},
  author={Aditya Narayan},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch}
}
```