AdityaNarayan committed
Commit 557ca57 · verified · 1 Parent(s): da6e25d

added README.md

Files changed (1):
1. README.md +131 -0

README.md ADDED
---
base_model: Kwaipilot/KAT-Dev-72B-Exp
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Fine-Tuned
- Causal-LM
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HyperSwitch-Repo-CPT-Dataset
---
# Kwaipilot-KAT-Dev-CPT-LoRA-Adapter-HyperSwitch

A LoRA fine-tuned adapter for **Kwaipilot/KAT-Dev-72B-Exp**, specialized for the [Hyperswitch](https://github.com/juspay/hyperswitch) Rust codebase. The model is tuned for understanding payment processing patterns, Hyperswitch architecture, and Rust development practices.

## 🎯 Model Description

This LoRA adapter was trained on **16,731 samples** extracted from the Hyperswitch codebase to enhance code understanding, explanation, and generation within the payment processing domain.

- **Base Model**: Kwaipilot/KAT-Dev-72B-Exp
- **Training Type**: Causal Language Modeling (CLM) with LoRA
- **Domain**: Payment Processing, Rust Development
- **Specialization**: Hyperswitch codebase patterns and architecture

## 📊 Training Details

### Dataset Composition
- **Total Samples**: 16,731
- **File-level samples**: 2,120 complete files
- **Granular samples**: 14,611 extracted components
  - Functions: 4,121
  - Structs: 5,710
  - Traits: 223
  - Implementations: 4,296
  - Modules: 261

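The composition above can be inspected directly from the published dataset. A minimal sketch with the 🤗 `datasets` library; the split name and the `sample_type` column are assumptions, not guaranteed fields of the dataset, so check the schema first.

```python
from datasets import load_dataset
from collections import Counter

# Load the CPT dataset referenced in the card metadata
ds = load_dataset("AdityaNarayan/HyperSwitch-Repo-CPT-Dataset", split="train")

print(len(ds))          # expected to be on the order of 16,731 samples
print(ds.column_names)  # check the actual schema before relying on field names

# Hypothetical: if the dataset exposes a sample-type column, count each kind
if "sample_type" in ds.column_names:
    print(Counter(ds["sample_type"]))
```
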
### LoRA Configuration
```yaml
r: 64             # LoRA rank
alpha: 128        # LoRA alpha (2*r)
dropout: 0.05     # LoRA dropout
target_modules:   # Applied to all linear layers
  - q_proj, k_proj, v_proj, o_proj
  - gate_proj, up_proj, down_proj
```

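As a rough guide, the configuration above corresponds to a `peft.LoraConfig` along these lines. This is a sketch, not the exact training script; the task type and the exploded module list follow standard PEFT conventions.

```python
from peft import LoraConfig

# Sketch of the adapter configuration above (assumed to mirror the training setup)
lora_config = LoraConfig(
    r=64,                # LoRA rank
    lora_alpha=128,      # alpha = 2 * r
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```
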
### Training Hyperparameters
- **Epochs**: 2.3
- **Steps**: 550
- **Batch Size**: 2 per device (16 effective with gradient accumulation)
- **Learning Rate**: 5e-5 (cosine schedule)
- **Max Context**: 8,192 tokens
- **Hardware**: 2x NVIDIA H200 (80GB each)
- **Training Time**: ~4 hours (2,355 steps)

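For orientation, the hyperparameters above map roughly onto a `transformers.TrainingArguments` block like the following. This is a hedged sketch, not the original trainer configuration; the gradient-accumulation factor of 4 is inferred from 2 per device x 2 GPUs x 4 = 16 effective, and the output directory is just an example.

```python
from transformers import TrainingArguments

# Sketch only: values taken from the bullet list above
training_args = TrainingArguments(
    output_dir="kat-dev-hyperswitch-cpt-lora",  # example path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # inferred from the 16 effective batch size
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2.3,
    bf16=True,
    logging_steps=10,
)
```
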
### Training Results
```json
{
  "final_train_loss": 0.2793,
  "final_eval_loss": 0.3765236437320709,
  "final_train_perplexity": 1.322203945559979,
  "final_eval_perplexity": 1.457209992899547,
  "final_token_accuracy": 0.9227368004620076,
  "initial_loss": 1.6654,
  "initial_perplexity": 5.2877879419709135,
  "initial_accuracy": 0.6416946474462748
}
```

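The perplexity figures above are simply the exponential of the corresponding cross-entropy losses, which makes for an easy consistency check:

```python
import math

# Perplexity for causal LM training is exp(cross-entropy loss)
print(math.exp(0.2793))              # ~1.3222 -> matches final_train_perplexity
print(math.exp(0.3765236437320709))  # ~1.4572 -> matches final_eval_perplexity
print(math.exp(1.6654))              # ~5.2878 -> matches initial_perplexity
```
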
## 🚀 Usage

### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Kwaipilot/KAT-Dev-72B-Exp",
    dtype=torch.bfloat16,
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Kwaipilot/KAT-Dev-72B-Exp")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch")

# Generate code
prompt = """// Hyperswitch payment processing
pub fn validate_payment_method("""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.2,  # Lower temperature for code generation
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

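If you prefer to serve the model without PEFT at inference time, the adapter can be folded into the base weights. A minimal sketch continuing from the Quick Start above, using the standard PEFT `merge_and_unload()` call; the output directory name is just an example.

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("kat-dev-72b-hyperswitch-merged")   # example path
tokenizer.save_pretrained("kat-dev-72b-hyperswitch-merged")
```
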
### Recommended Settings
- **Temperature**: 0.2-0.3 for code generation
- **Temperature**: 0.5-0.7 for explanations and documentation
- **Max tokens**: 1024 for most tasks

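One way to keep the presets above handy is to wrap them in `transformers.GenerationConfig` objects. A sketch; the exact values within the recommended ranges are a matter of taste.

```python
from transformers import GenerationConfig

# Presets matching the recommendations above
code_gen = GenerationConfig(do_sample=True, temperature=0.2, max_new_tokens=1024)
explain_gen = GenerationConfig(do_sample=True, temperature=0.6, max_new_tokens=1024)

# Usage: outputs = model.generate(**inputs, generation_config=code_gen)
```
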
## 🛠️ Technical Specifications

- **Context Window**: 8,192 tokens
- **Precision**: bfloat16
- **Memory Usage**: ~78GB VRAM
- **Inference Speed**: Optimized with Flash Attention 2

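To get the Flash Attention 2 path mentioned above, the base model can be loaded with the `attn_implementation` flag, assuming the `flash-attn` package is installed in your environment:

```python
import torch
from transformers import AutoModelForCausalLM

# Same load as in Quick Start, but explicitly requesting Flash Attention 2 kernels
base_model = AutoModelForCausalLM.from_pretrained(
    "Kwaipilot/KAT-Dev-72B-Exp",
    dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
)
```
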
## 🙏 Acknowledgments

- **Kwaipilot Team** for the excellent KAT-Dev base model
- **Hyperswitch Team** for the open-source payment processing platform
- **Hugging Face** for the transformers and PEFT libraries

## 📞 Citation

```bibtex
@misc{hyperswitch-kat-dev-lora-2024,
  title={KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch},
  author={Aditya Narayan},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch}
}
```