# Deployment Checklist - ZeroGPU Integration
## ✅ Pre-Deployment Verification
### Code Status
- ✅ All code changes committed and pushed
- ✅ FAISS-GPU implementation complete
- ✅ Lazy-loaded local model fallback implemented
- ✅ ZeroGPU API integration complete
- ✅ Dockerfile configured correctly
- ✅ `requirements.txt` updated with faiss-gpu
### Files Ready
- ✅ `Dockerfile` - Configured for HF Spaces
- ✅ `main.py` - Entry point for HF Spaces
- ✅ `requirements.txt` - All dependencies including faiss-gpu
- ✅ `README.md` - Contains HF Spaces configuration
---
## 🚀 Deployment Steps
### 1. Verify Repository Status
```bash
git status # Should show clean or only documentation changes
git log --oneline -5 # Verify recent commits are pushed
```
### 2. Hugging Face Spaces Configuration
#### Space Settings
1. Go to: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
2. Navigate to **Settings** → **Repository secrets**
#### Required Environment Variables
**Basic Configuration:**
```bash
HF_TOKEN=your_huggingface_token_here
```
**ZeroGPU API Configuration (Optional - for Runpod integration):**
**Option A: Service Account Mode**
```bash
USE_ZERO_GPU=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
ZERO_GPU_EMAIL=service-account@example.com
ZERO_GPU_PASSWORD=your-password
```
**Option B: Per-User Mode (Multi-tenant)**
```bash
USE_ZERO_GPU=true
ZERO_GPU_PER_USER_MODE=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
ZERO_GPU_ADMIN_EMAIL=admin@example.com
ZERO_GPU_ADMIN_PASSWORD=admin-password
```
**Note:** Runpod proxy URLs follow the format: `https://<pod-id>-8000.proxy.runpod.net`
**Additional Optional Variables:**
```bash
DB_PATH=sessions.db
LOG_LEVEL=INFO
MAX_WORKERS=4
```
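If you want to confirm how these variables are consumed, the sketch below shows a typical `os.getenv`-based configuration block (the defaults and structure are illustrative assumptions, not the app's actual code):
```python
import os

# Illustrative config loading; the variable names match the checklist above,
# but the defaults and structure are assumptions, not the app's actual code.
USE_ZERO_GPU = os.getenv("USE_ZERO_GPU", "false").lower() == "true"
ZERO_GPU_API_URL = os.getenv("ZERO_GPU_API_URL", "")
DB_PATH = os.getenv("DB_PATH", "sessions.db")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "4"))
```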
### 3. Hardware Selection
In HF Spaces Settings:
- **GPU**: NVIDIA T4 Medium (recommended)
  - 16GB VRAM (sufficient for local model fallback)
  - 30GB RAM
  - 8 vCPU
**Note:** With ZeroGPU API enabled, GPU is only needed for:
- FAISS-GPU vector search (automatic CPU fallback if GPU unavailable)
- Local model fallback (only loads if ZeroGPU fails)
### 4. Deployment Process
**Automatic Deployment:**
1. Code is already pushed to `main` branch
2. HF Spaces will automatically:
- Detect `sdk: docker` in README.md
- Build Docker image from Dockerfile
- Install dependencies from requirements.txt
- Start application using `main.py`
**Manual Trigger (if needed):**
- Go to Space → Settings → Restart this Space
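For reference, a Docker-based Space entry point often looks like the sketch below: a FastAPI app exposing `/health` with the Gradio UI mounted on the same port. This is an illustration assuming a Gradio front end, not the project's actual `main.py`.
```python
import gradio as gr
import uvicorn
from fastapi import FastAPI

# Hypothetical chat handler; the real app routes this through ZeroGPU / fallbacks.
def respond(message, history):
    return "..."

demo = gr.ChatInterface(respond)

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "healthy"}

# Mount the Gradio UI at the root path; /health stays available on the same port.
app = gr.mount_gradio_app(app, demo, path="/")

if __name__ == "__main__":
    # 7860 is the default port Hugging Face Spaces expects from Docker apps.
    uvicorn.run(app, host="0.0.0.0", port=7860)
```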
### 5. Monitor Deployment
**Check Build Logs:**
- Navigate to Space → Logs
- Watch for:
  - ✅ Docker build success
  - ✅ Dependencies installed (including faiss-gpu)
  - ✅ Application startup
  - ✅ ZeroGPU client initialization (if configured)
  - ✅ Local model loader initialized (as fallback)
**Expected Startup Messages:**
```
✅ Local model loader initialized (models will load on-demand as fallback)
✅ ZeroGPU API client initialized (service account mode)
✅ FAISS GPU resources initialized
✅ Application ready for launch
```
### 6. Verify Deployment
**Health Check:**
- Application should be accessible at: `https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant`
- Health endpoint: `/health` should return `{"status": "healthy"}`
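The health check can be scripted with a short snippet like this (the `*.hf.space` hostname is the Space's direct app URL; substitute your own subdomain):
```python
import requests

# Replace with your Space's direct app URL (the *.hf.space hostname, not the Space page).
SPACE_URL = "https://your-space-subdomain.hf.space"

resp = requests.get(f"{SPACE_URL}/health", timeout=10)
resp.raise_for_status()
assert resp.json().get("status") == "healthy", resp.text
print("Health check passed:", resp.json())
```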
**Test ZeroGPU Integration:**
1. Send a test message through the UI
2. Check logs for: `"Inference complete for {task_type} (ZeroGPU API)"`
3. Verify no local models are loaded (if ZeroGPU working)
**Test Fallback:**
1. Temporarily disable ZeroGPU (set `USE_ZERO_GPU=false`)
2. Send a test message
3. Check logs for: `"Lazy loading local model {model_id} as fallback"`
4. Verify local model loads and works
---
## 🔍 Post-Deployment Verification
### 1. Check Application Status
- [ ] Application loads without errors
- [ ] UI is accessible
- [ ] Health check endpoint responds
### 2. Verify ZeroGPU Integration
- [ ] ZeroGPU client initializes (if configured)
- [ ] API calls succeed
- [ ] No local models loaded (if ZeroGPU working)
- [ ] Usage statistics accessible (if per-user mode)
### 3. Verify FAISS-GPU
- [ ] FAISS GPU resources initialize
- [ ] Vector search works
- [ ] Falls back to CPU if GPU unavailable
### 4. Verify Fallback Chain
- [ ] ZeroGPU API tried first
- [ ] Local models load only if ZeroGPU fails
- [ ] HF Inference API used as final fallback
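Conceptually, this is a try-in-order chain; the sketch below illustrates the pattern with hypothetical backend callables (the names are placeholders, not the app's real functions):
```python
from typing import Callable, Sequence, Tuple

def run_inference(prompt: str, backends: Sequence[Tuple[str, Callable[[str], str]]]) -> str:
    """Try each backend in priority order and return the first successful result."""
    errors = []
    for name, backend in backends:
        try:
            return backend(prompt)
        except Exception as exc:
            # A failure here simply falls through to the next backend in the chain.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All inference backends failed: " + "; ".join(errors))

# Usage (backend callables are illustrative placeholders):
# run_inference(prompt, [("ZeroGPU API", call_zero_gpu_api),
#                        ("local model", run_local_model),
#                        ("HF Inference API", call_hf_inference)])
```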
### 5. Monitor Resource Usage
- [ ] GPU memory usage is low (if ZeroGPU working)
- [ ] CPU usage is reasonable
- [ ] No memory leaks
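To spot-check these numbers from inside the container (or a debug endpoint), a snippet along these lines works, assuming `torch` and `psutil` are available in the image:
```python
import psutil
import torch

# GPU memory currently allocated by this process (should stay near zero
# while ZeroGPU handles inference and no local models are loaded).
if torch.cuda.is_available():
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")
else:
    print("No CUDA device visible; running on CPU")

# Process RAM and overall CPU utilization.
proc = psutil.Process()
print(f"RSS: {proc.memory_info().rss / 1024**2:.0f} MB")
print(f"CPU: {psutil.cpu_percent(interval=1.0):.1f}%")
```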
---
## 🔧 Troubleshooting
### Issue: Build Fails
**Check:**
- Dockerfile syntax is correct
- `requirements.txt` has all dependencies
- Python 3.10 is available
**Solution:**
- Review build logs in HF Spaces
- Test Docker build locally: `docker build -t test .`
### Issue: ZeroGPU Not Working
**Check:**
- Environment variables are set correctly
- ZeroGPU API is accessible from HF Spaces
- Network connectivity to Runpod
**Solution:**
- Verify API URL is correct
- Check credentials are valid
- Review ZeroGPU API logs
### Issue: FAISS-GPU Not Available
**Check:**
- GPU is available in HF Spaces
- faiss-gpu package installed correctly
**Solution:**
- System will automatically fall back to CPU
- Check logs for: `"FAISS GPU not available, using CPU"`
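The GPU-with-CPU-fallback pattern typically looks like the minimal sketch below (a flat L2 index with an illustrative embedding dimension; the app's actual index setup may differ):
```python
import numpy as np
import faiss

dim = 384  # illustrative embedding dimension
cpu_index = faiss.IndexFlatL2(dim)

# Move the index to GPU when GPU resources are available; otherwise stay on CPU.
try:
    if faiss.get_num_gpus() > 0:
        res = faiss.StandardGpuResources()
        index = faiss.index_cpu_to_gpu(res, 0, cpu_index)
        print("FAISS GPU resources initialized")
    else:
        raise RuntimeError("no GPU detected")
except (AttributeError, RuntimeError):
    # faiss-cpu builds lack the GPU symbols entirely, hence the AttributeError branch.
    index = cpu_index
    print("FAISS GPU not available, using CPU")

index.add(np.random.rand(1000, dim).astype("float32"))
distances, ids = index.search(np.random.rand(5, dim).astype("float32"), 4)
```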
### Issue: Local Models Not Loading
**Check:**
- `use_local_models=True` in code
- Transformers/torch available
- GPU memory sufficient
**Solution:**
- Check logs for initialization errors
- Verify GPU availability
- Models will only load if ZeroGPU fails
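Lazy loading in this style usually just means caching the model object and calling `from_pretrained` only on first use; a minimal sketch (the helper name is illustrative, and it assumes `transformers` and `torch` are installed):
```python
from functools import lru_cache

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@lru_cache(maxsize=1)
def get_local_model(model_id: str):
    """Load the fallback model on the first call only; later calls reuse the cached objects."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)
    return tokenizer, model
```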
---
## 📊 Expected Resource Usage
### With ZeroGPU API Enabled (Optimal)
- **GPU Memory**: ~0-500MB (FAISS-GPU only, no local models)
- **CPU**: Low (API calls only)
- **RAM**: ~2-4GB (application + caching)
### With ZeroGPU Failing (Fallback Active)
- **GPU Memory**: ~15GB (local models loaded)
- **CPU**: Medium (model inference)
- **RAM**: ~4-6GB (models + application)
### FAISS-GPU Usage
- **GPU Memory**: ~100-500MB (depending on index size)
- **CPU Fallback**: Automatic if GPU unavailable
---
## ✅ Deployment Complete
Once all checks pass:
- ✅ Application is live
- ✅ ZeroGPU integration working
- ✅ FAISS-GPU accelerated
- ✅ Fallback chain operational
- ✅ Monitoring in place
**Next Steps:**
- Monitor usage statistics
- Review ZeroGPU API logs
- Optimize based on usage patterns
- Scale as needed
---
**Last Updated:** 2025-01-07
**Deployment Status:** Ready
**Version:** With ZeroGPU Integration + FAISS-GPU + Lazy Loading