---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
pipeline_tag: text-generation
tags:
- text-generation
- sql-generation
- llama
- lora
- peft
- unsloth
- transformers
license: apache-2.0
language:
- en
---

# SQL-Genie (LLaMA-3.1-8B Fine-Tuned)

## 🧠 Model Overview

**SQL-Genie** is a fine-tuned version of **LLaMA-3.1-8B**, specialized for converting **natural language questions into SQL queries**.

The model was trained with **parameter-efficient fine-tuning (LoRA)** on a structured SQL instruction dataset, giving strong SQL generation performance while remaining lightweight and affordable to train on limited compute (Google Colab).

- **Developed by:** dhashu
- **Base model:** `unsloth/meta-llama-3.1-8b-bnb-4bit`
- **License:** Apache-2.0
- **Training stack:** Unsloth + Hugging Face TRL

---

## ⚙️ Training Methodology

This model was trained with **LoRA (Low-Rank Adaptation)** via the **PEFT** framework; a configuration sketch appears at the end of this card.

### Key Details

- Base model loaded in **4-bit quantization** for memory efficiency
- **Base weights frozen** throughout training
- **LoRA adapters** applied to:
  - Attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`)
  - Feed-forward projections (`gate_proj`, `up_proj`, `down_proj`)
- Fine-tuned with **Supervised Fine-Tuning (SFT)**

This approach specializes the model efficiently without retraining the full set of weights.

---

## 📊 Dataset

The model was trained on a subset of the **`b-mc2/sql-create-context`** dataset, in which each example pairs:

- A natural language question
- Database schema / context (`CREATE TABLE` statements)
- The corresponding SQL query

Each sample was formatted as an **instruction-style prompt** to encourage structured, schema-grounded output; see the formatting sketch at the end of this card.

---

## 🚀 Performance & Efficiency

- 🚀 **2× faster fine-tuning** using Unsloth
- 💾 **Low VRAM usage** via 4-bit quantization
- 🧠 Improved SQL syntax and schema understanding
- ⚡ Suitable for real-time inference and lightweight deployments

---

## 🧩 Model Variants

This repository contains a **merged model**:

### 🔹 Merged 4-bit Model

- LoRA adapters merged into the base weights
- No PEFT required at inference time
- Ready-to-use single checkpoint
- Optimized for easy deployment

---

## ▶️ How to Use (Inference)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "dhashu/sql-genie-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
)

prompt = """Below is an input question, context is given to help. Generate a SQL response.

### Input:
List all employees hired after 2020

### Context:
CREATE TABLE employees(id, name, hire_date)

### SQL Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,  # required for temperature to take effect
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
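For deterministic SQL output, drop `do_sample=True` and `temperature` so the model decodes greedily. The prompt should follow the same `### Input:` / `### Context:` / `### SQL Response:` template used during fine-tuning; deviating from it tends to degrade generation quality.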
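---

## 🧪 Reproducing the Fine-Tuning Setup (Sketch)

The snippet below sketches the LoRA setup described under Training Methodology, using Unsloth's `FastLanguageModel` API. The hyperparameters shown (`max_seq_length`, `r`, `lora_alpha`, `lora_dropout`) are illustrative assumptions, not values recorded on this card.

```python
from unsloth import FastLanguageModel

# Load the 4-bit quantized base model; its weights stay frozen during LoRA training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-bnb-4bit",
    max_seq_length=2048,  # assumed; not stated on this card
    load_in_4bit=True,
)

# Attach trainable LoRA adapters to the attention and feed-forward
# projections listed above. r and lora_alpha are illustrative values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # feed-forward
    ],
)
```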
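Turning dataset rows into training text takes a formatting step like the one below. This is a minimal sketch assuming the dataset's `question` / `context` / `answer` fields and the same prompt template as the inference example; `format_example` is a hypothetical helper, and in practice an EOS token is usually appended so the model learns to stop after the query.

```python
from datasets import load_dataset

# Same template as the inference example, with the gold SQL appended as the target.
PROMPT_TEMPLATE = """Below is an input question, context is given to help. Generate a SQL response.

### Input:
{question}

### Context:
{context}

### SQL Response:
{answer}"""

def format_example(example):
    # Hypothetical helper: b-mc2/sql-create-context rows carry
    # "question", "context", and "answer" fields.
    return {"text": PROMPT_TEMPLATE.format(**example)}

dataset = load_dataset("b-mc2/sql-create-context", split="train")
dataset = dataset.map(format_example)
```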
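Training then runs through TRL's `SFTTrainer`, per the training stack listed above. Note that the `SFTTrainer` signature has changed across TRL releases; this sketch follows the older `dataset_text_field` style common in Unsloth notebooks, and every hyperparameter shown is an assumption rather than a recorded value. The final line uses Unsloth's `save_pretrained_merged` helper to produce the merged single-checkpoint variant shipped in this repository.

```python
from transformers import TrainingArguments
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # column produced by format_example above
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # illustrative values throughout
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=60,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the trained adapters into the base weights to produce the
# single-checkpoint "merged" variant described under Model Variants.
model.save_pretrained_merged("sql-genie-full", tokenizer, save_method="merged_4bit")
```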