# Deployment Checklist - ZeroGPU Integration
## Pre-Deployment Verification
### Code Status
- ✅ All code changes committed and pushed
- ✅ FAISS-GPU implementation complete
- ✅ Lazy-loaded local model fallback implemented
- ✅ ZeroGPU API integration complete
- ✅ Dockerfile configured correctly
- ✅ `requirements.txt` updated with `faiss-gpu`
### Files Ready
- ✅ `Dockerfile` - Configured for HF Spaces
- ✅ `main.py` - Entry point for HF Spaces
- ✅ `requirements.txt` - All dependencies, including `faiss-gpu`
- ✅ `README.md` - Contains the HF Spaces configuration
---
## Deployment Steps
### 1. Verify Repository Status
```bash
git status            # Should show a clean tree, or only documentation changes
git log --oneline -5  # Verify recent commits are pushed
```
### 2. Hugging Face Spaces Configuration
#### Space Settings
1. Go to: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
2. Navigate to **Settings** → **Repository secrets**
#### Required Environment Variables
**Basic Configuration:**
```bash
HF_TOKEN=your_huggingface_token_here
```
**ZeroGPU API Configuration (Optional - for Runpod integration):**
**Option A: Service Account Mode**
```bash
USE_ZERO_GPU=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
[email protected]
ZERO_GPU_PASSWORD=your-password
```
**Option B: Per-User Mode (Multi-tenant)**
```bash
USE_ZERO_GPU=true
ZERO_GPU_PER_USER_MODE=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
[email protected]
ZERO_GPU_ADMIN_PASSWORD=admin-password
```
**Note:** Runpod proxy URLs follow the format `https://<pod-id>-<port>.proxy.runpod.net`; this deployment uses port 8000. A quick reachability check is shown below.
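Before wiring credentials into the Space, it is worth confirming that the ZeroGPU API is reachable at all. The check below assumes the API exposes a `/health` route; substitute whatever status endpoint your deployment actually serves.
```bash
# Reachability check for the ZeroGPU API (assumes a /health route exists).
# Replace the URL with your pod's proxy address.
curl -sf "https://bm9njt1ypzvuqw-8000.proxy.runpod.net/health" \
  && echo "ZeroGPU API reachable" \
  || echo "ZeroGPU API unreachable - check the pod ID and that the pod is running"
```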
**Additional Optional Variables:**
```bash
DB_PATH=sessions.db
LOG_LEVEL=INFO
MAX_WORKERS=4
```
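For a local smoke test outside HF Spaces, the same variables can be exported in a shell before launching the app. This is a sketch: the variable names match the tables above, and `main.py` is assumed to read its configuration from the environment.
```bash
# Local smoke test: export the same configuration, then launch the app.
export HF_TOKEN=your_huggingface_token_here
export USE_ZERO_GPU=true
export ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
export [email protected]
export ZERO_GPU_PASSWORD=your-password
export LOG_LEVEL=INFO
python main.py
```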
### 3. Hardware Selection
In HF Spaces Settings:
- **GPU**: NVIDIA T4 Medium (recommended)
  - 16GB VRAM (sufficient for the local model fallback)
  - 30GB RAM
  - 8 vCPU
**Note:** With the ZeroGPU API enabled, the GPU is only needed for:
- FAISS-GPU vector search (automatic CPU fallback if the GPU is unavailable; see the sketch below)
- Local model fallback (only loads if ZeroGPU fails)
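The FAISS GPU-with-CPU-fallback pattern is roughly the following. This is a minimal sketch under stated assumptions (`faiss-gpu` and NumPy installed, embedding dimension illustrative), not the application's actual code.
```bash
# Minimal sketch of the FAISS GPU-with-CPU-fallback pattern (not the app's actual code).
python - <<'PY'
import faiss
import numpy as np

d = 384                                   # embedding dimension (illustrative)
index = faiss.IndexFlatL2(d)              # CPU index is always available

# GPU symbols are absent on faiss-cpu builds, so guard with hasattr first.
if hasattr(faiss, "StandardGpuResources") and faiss.get_num_gpus() > 0:
    res = faiss.StandardGpuResources()    # allocate GPU scratch memory
    index = faiss.index_cpu_to_gpu(res, 0, index)
    print("FAISS GPU resources initialized")
else:
    print("FAISS GPU not available, using CPU")

index.add(np.random.rand(1000, d).astype("float32"))
_, ids = index.search(np.random.rand(1, d).astype("float32"), 5)
print("top-5 neighbours:", ids[0])
PY
```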
### 4. Deployment Process
**Automatic Deployment:**
1. Code is already pushed to the `main` branch
2. HF Spaces will automatically:
   - Detect `sdk: docker` in README.md
   - Build the Docker image from the Dockerfile
   - Install dependencies from requirements.txt
   - Start the application using `main.py`
**Manual Trigger (if needed):**
- Go to Space → Settings → Restart this Space
### 5. Monitor Deployment
**Check Build Logs:**
- Navigate to Space → Logs
- Watch for:
  - ✅ Docker build success
  - ✅ Dependencies installed (including `faiss-gpu`)
  - ✅ Application startup
  - ✅ ZeroGPU client initialization (if configured)
  - ✅ Local model loader initialized (as fallback)
**Expected Startup Messages:**
```
✅ Local model loader initialized (models will load on-demand as fallback)
✅ ZeroGPU API client initialized (service account mode)
✅ FAISS GPU resources initialized
✅ Application ready for launch
```
### 6. Verify Deployment
**Health Check:**
- The application should be accessible at: `https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant`
- The `/health` endpoint should return `{"status": "healthy"}`
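A command-line check can confirm this. Note that the direct app host differs from the Space page: HF Spaces typically serves Docker apps at `https://<owner>-<space-name>.hf.space` (lowercased, underscores replaced with hyphens), so the URL below is an assumption derived from that convention; confirm the real host via the Space's "Embed this Space" option.
```bash
# Health check against the direct app host (URL assumed from the
# HF Spaces subdomain convention).
curl -s https://jatinautonomouslabs-research-ai-assistant.hf.space/health
# Expected: {"status": "healthy"}
```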
**Test ZeroGPU Integration:**
1. Send a test message through the UI
2. Check the logs for: `"Inference complete for {task_type} (ZeroGPU API)"`
3. Verify no local models are loaded (if ZeroGPU is working)
**Test Fallback:**
1. Temporarily disable ZeroGPU (set `USE_ZERO_GPU=false`)
2. Send a test message
3. Check the logs for: `"Lazy loading local model {model_id} as fallback"`
4. Verify the local model loads and works
---
## Post-Deployment Verification
### 1. Check Application Status
- [ ] Application loads without errors
- [ ] UI is accessible
- [ ] Health check endpoint responds
### 2. Verify ZeroGPU Integration
- [ ] ZeroGPU client initializes (if configured)
- [ ] API calls succeed
- [ ] No local models loaded (if ZeroGPU is working)
- [ ] Usage statistics accessible (if per-user mode)
### 3. Verify FAISS-GPU
- [ ] FAISS GPU resources initialize (quick check below)
- [ ] Vector search works
- [ ] Falls back to CPU if the GPU is unavailable
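A one-liner from a shell inside the running container (or any environment built from the same `requirements.txt`) confirms whether FAISS sees a GPU:
```bash
# Prints the number of GPUs FAISS can see; 0 (or an AttributeError on
# faiss-cpu builds) means the CPU fallback is in use.
python -c "import faiss; print('FAISS GPUs:', faiss.get_num_gpus())"
```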
### 4. Verify Fallback Chain
- [ ] ZeroGPU API is tried first
- [ ] Local models load only if ZeroGPU fails
- [ ] HF Inference API is used as the final fallback
### 5. Monitor Resource Usage
- [ ] GPU memory usage is low (if ZeroGPU is working; see the snippet below)
- [ ] CPU usage is reasonable
- [ ] No memory leaks
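If the Space's container exposes `nvidia-smi`, GPU memory can be sampled directly; with ZeroGPU working, used memory should stay in the FAISS-only range listed under Expected Resource Usage below.
```bash
# Sample GPU memory and utilization every 5 seconds.
nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu \
  --format=csv -l 5
```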
---
## Troubleshooting
### Issue: Build Fails
**Check:**
- Dockerfile syntax is correct
- requirements.txt has all dependencies
- Python 3.10 is available
**Solution:**
- Review the build logs in HF Spaces
- Test the Docker build locally (see the example below)
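A fuller local repro looks like the following. The port mapping assumes the Dockerfile exposes 7860, the HF Spaces default for Docker apps; adjust it to whatever port the Dockerfile actually declares.
```bash
# Build and run the image locally (7860 is assumed to be the exposed port).
docker build -t research-ai-test .
docker run --rm -p 7860:7860 \
  -e HF_TOKEN=your_huggingface_token_here \
  research-ai-test
```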
### Issue: ZeroGPU Not Working
**Check:**
- Environment variables are set correctly
- The ZeroGPU API is accessible from HF Spaces
- Network connectivity to Runpod
**Solution:**
- Verify the API URL is correct (see the reachability check under Deployment Steps)
- Check that the credentials are valid
- Review the ZeroGPU API logs
### Issue: FAISS-GPU Not Available
**Check:**
- A GPU is available in HF Spaces
- The `faiss-gpu` package installed correctly
**Solution:**
- The system will automatically fall back to CPU
- Check the logs for: `"FAISS GPU not available, using CPU"`
### Issue: Local Models Not Loading
**Check:**
- `use_local_models=True` in code
- Transformers/torch are available
- GPU memory is sufficient
**Solution:**
- Check the logs for initialization errors
- Verify GPU availability (see the check below)
- Local models only load if ZeroGPU fails
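To rule out missing GPU support, check that PyTorch can see CUDA from the same environment the app runs in:
```bash
# Verify that torch is installed with CUDA support and a GPU is visible.
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```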
---
## Expected Resource Usage
### With ZeroGPU API Enabled (Optimal)
- **GPU Memory**: ~0-500MB (FAISS-GPU only, no local models)
- **CPU**: Low (API calls only)
- **RAM**: ~2-4GB (application + caching)
### With ZeroGPU Failing (Fallback Active)
- **GPU Memory**: ~15GB (local models loaded)
- **CPU**: Medium (model inference)
- **RAM**: ~4-6GB (models + application)
### FAISS-GPU Usage
- **GPU Memory**: ~100-500MB (depending on index size)
- **CPU Fallback**: Automatic if the GPU is unavailable
---
## Deployment Complete
Once all checks pass:
- ✅ Application is live
- ✅ ZeroGPU integration working
- ✅ FAISS-GPU acceleration active
- ✅ Fallback chain operational
- ✅ Monitoring in place
**Next Steps:**
- Monitor usage statistics
- Review the ZeroGPU API logs
- Optimize based on usage patterns
- Scale as needed
---
**Last Updated:** 2025-01-07
**Deployment Status:** Ready
**Version:** ZeroGPU Integration + FAISS-GPU + Lazy Loading