# Deployment Checklist - ZeroGPU Integration
## Pre-Deployment Verification
### Code Status
- ✅ All code changes committed and pushed
- ✅ FAISS-GPU implementation complete
- ✅ Lazy-loaded local model fallback implemented
- ✅ ZeroGPU API integration complete
- ✅ Dockerfile configured correctly
- ✅ `requirements.txt` updated with `faiss-gpu`
### Files Ready
- ✅ `Dockerfile` - Configured for HF Spaces
- ✅ `main.py` - Entry point for HF Spaces
- ✅ `requirements.txt` - All dependencies, including `faiss-gpu`
- ✅ `README.md` - Contains the HF Spaces configuration
---
## Deployment Steps
### 1. Verify Repository Status
```bash
git status            # Should show a clean tree, or only documentation changes
git log --oneline -5  # Verify recent commits are pushed
```
### 2. Hugging Face Spaces Configuration
#### Space Settings
1. Go to: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
2. Navigate to **Settings** → **Repository secrets**
#### Required Environment Variables
**Basic Configuration:**
```bash
HF_TOKEN=your_huggingface_token_here
```
**ZeroGPU API Configuration (Optional - for Runpod integration):**
**Option A: Service Account Mode**
```bash
USE_ZERO_GPU=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
[email protected]
ZERO_GPU_PASSWORD=your-password
```
**Option B: Per-User Mode (Multi-tenant)**
```bash
USE_ZERO_GPU=true
ZERO_GPU_PER_USER_MODE=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
[email protected]
ZERO_GPU_ADMIN_PASSWORD=admin-password
```
**Note:** Runpod proxy URLs follow the format `https://<pod-id>-<port>.proxy.runpod.net`; this deployment uses port 8000. A quick reachability check is shown below.
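Before wiring credentials into the Space, it is worth confirming that the ZeroGPU API is reachable at all. The check below assumes the API exposes a `/health` route; substitute whatever status endpoint your deployment actually serves.
```bash
# Reachability check for the ZeroGPU API (assumes a /health route exists).
# Replace the URL with your pod's proxy address.
curl -sf "https://bm9njt1ypzvuqw-8000.proxy.runpod.net/health" \
  && echo "ZeroGPU API reachable" \
  || echo "ZeroGPU API unreachable - check the pod ID and that the pod is running"
```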
**Additional Optional Variables:**
```bash
DB_PATH=sessions.db
LOG_LEVEL=INFO
MAX_WORKERS=4
```
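For a local smoke test outside HF Spaces, the same variables can be exported in a shell before launching the app. This is a sketch: the variable names match the tables above, and `main.py` is assumed to read its configuration from the environment.
```bash
# Local smoke test: export the same configuration, then launch the app.
export HF_TOKEN=your_huggingface_token_here
export USE_ZERO_GPU=true
export ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
export [email protected]
export ZERO_GPU_PASSWORD=your-password
export LOG_LEVEL=INFO
python main.py
```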
### 3. Hardware Selection
In HF Spaces Settings:
- **GPU**: NVIDIA T4 Medium (recommended)
  - 16GB VRAM (sufficient for the local model fallback)
  - 30GB RAM
  - 8 vCPU
**Note:** With the ZeroGPU API enabled, the GPU is only needed for:
- FAISS-GPU vector search (automatic CPU fallback if the GPU is unavailable; see the sketch below)
- Local model fallback (only loads if ZeroGPU fails)
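The FAISS GPU-with-CPU-fallback pattern is roughly the following. This is a minimal sketch under stated assumptions (`faiss-gpu` and NumPy installed, embedding dimension illustrative), not the application's actual code.
```bash
# Minimal sketch of the FAISS GPU-with-CPU-fallback pattern (not the app's actual code).
python - <<'PY'
import faiss
import numpy as np

d = 384                                   # embedding dimension (illustrative)
index = faiss.IndexFlatL2(d)              # CPU index is always available

# GPU symbols are absent on faiss-cpu builds, so guard with hasattr first.
if hasattr(faiss, "StandardGpuResources") and faiss.get_num_gpus() > 0:
    res = faiss.StandardGpuResources()    # allocate GPU scratch memory
    index = faiss.index_cpu_to_gpu(res, 0, index)
    print("FAISS GPU resources initialized")
else:
    print("FAISS GPU not available, using CPU")

index.add(np.random.rand(1000, d).astype("float32"))
_, ids = index.search(np.random.rand(1, d).astype("float32"), 5)
print("top-5 neighbours:", ids[0])
PY
```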
### 4. Deployment Process
**Automatic Deployment:**
1. Code is already pushed to the `main` branch
2. HF Spaces will automatically:
   - Detect `sdk: docker` in README.md
   - Build the Docker image from the Dockerfile
   - Install dependencies from requirements.txt
   - Start the application using `main.py`
**Manual Trigger (if needed):**
- Go to Space → Settings → Restart this Space
### 5. Monitor Deployment
**Check Build Logs:**
- Navigate to Space → Logs
- Watch for:
  - ✅ Docker build success
  - ✅ Dependencies installed (including `faiss-gpu`)
  - ✅ Application startup
  - ✅ ZeroGPU client initialization (if configured)
  - ✅ Local model loader initialized (as fallback)
**Expected Startup Messages:**
```
✅ Local model loader initialized (models will load on-demand as fallback)
✅ ZeroGPU API client initialized (service account mode)
✅ FAISS GPU resources initialized
✅ Application ready for launch
```
### 6. Verify Deployment
**Health Check:**
- The application should be accessible at: `https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant`
- The `/health` endpoint should return `{"status": "healthy"}`
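A command-line check can confirm this. Note that the direct app host differs from the Space page: HF Spaces typically serves Docker apps at `https://<owner>-<space-name>.hf.space` (lowercased, underscores replaced with hyphens), so the URL below is an assumption derived from that convention; confirm the real host via the Space's "Embed this Space" option.
```bash
# Health check against the direct app host (URL assumed from the
# HF Spaces subdomain convention).
curl -s https://jatinautonomouslabs-research-ai-assistant.hf.space/health
# Expected: {"status": "healthy"}
```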
**Test ZeroGPU Integration:**
1. Send a test message through the UI
2. Check the logs for: `"Inference complete for {task_type} (ZeroGPU API)"`
3. Verify no local models are loaded (if ZeroGPU is working)
**Test Fallback:**
1. Temporarily disable ZeroGPU (set `USE_ZERO_GPU=false`)
2. Send a test message
3. Check the logs for: `"Lazy loading local model {model_id} as fallback"`
4. Verify the local model loads and works
---
## Post-Deployment Verification
### 1. Check Application Status
- [ ] Application loads without errors
- [ ] UI is accessible
- [ ] Health check endpoint responds
### 2. Verify ZeroGPU Integration
- [ ] ZeroGPU client initializes (if configured)
- [ ] API calls succeed
- [ ] No local models loaded (if ZeroGPU is working)
- [ ] Usage statistics accessible (if per-user mode)
### 3. Verify FAISS-GPU
- [ ] FAISS GPU resources initialize (quick check below)
- [ ] Vector search works
- [ ] Falls back to CPU if the GPU is unavailable
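A one-liner from a shell inside the running container (or any environment built from the same `requirements.txt`) confirms whether FAISS sees a GPU:
```bash
# Prints the number of GPUs FAISS can see; 0 (or an AttributeError on
# faiss-cpu builds) means the CPU fallback is in use.
python -c "import faiss; print('FAISS GPUs:', faiss.get_num_gpus())"
```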
### 4. Verify Fallback Chain
- [ ] ZeroGPU API is tried first
- [ ] Local models load only if ZeroGPU fails
- [ ] HF Inference API is used as the final fallback
### 5. Monitor Resource Usage
- [ ] GPU memory usage is low (if ZeroGPU is working; see the snippet below)
- [ ] CPU usage is reasonable
- [ ] No memory leaks
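If the Space's container exposes `nvidia-smi`, GPU memory can be sampled directly; with ZeroGPU working, used memory should stay in the FAISS-only range listed under Expected Resource Usage below.
```bash
# Sample GPU memory and utilization every 5 seconds.
nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu \
  --format=csv -l 5
```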
---
## Troubleshooting
### Issue: Build Fails
**Check:**
- Dockerfile syntax is correct
- requirements.txt has all dependencies
- Python 3.10 is available
**Solution:**
- Review the build logs in HF Spaces
- Test the Docker build locally (see the example below)
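A fuller local repro looks like the following. The port mapping assumes the Dockerfile exposes 7860, the HF Spaces default for Docker apps; adjust it to whatever port the Dockerfile actually declares.
```bash
# Build and run the image locally (7860 is assumed to be the exposed port).
docker build -t research-ai-test .
docker run --rm -p 7860:7860 \
  -e HF_TOKEN=your_huggingface_token_here \
  research-ai-test
```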
### Issue: ZeroGPU Not Working
**Check:**
- Environment variables are set correctly
- The ZeroGPU API is accessible from HF Spaces
- Network connectivity to Runpod
**Solution:**
- Verify the API URL is correct (see the reachability check under Deployment Steps)
- Check that the credentials are valid
- Review the ZeroGPU API logs
### Issue: FAISS-GPU Not Available
**Check:**
- A GPU is available in HF Spaces
- The `faiss-gpu` package installed correctly
**Solution:**
- The system will automatically fall back to CPU
- Check the logs for: `"FAISS GPU not available, using CPU"`
### Issue: Local Models Not Loading
**Check:**
- `use_local_models=True` in code
- Transformers/torch are available
- GPU memory is sufficient
**Solution:**
- Check the logs for initialization errors
- Verify GPU availability (see the check below)
- Local models only load if ZeroGPU fails
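To rule out missing GPU support, check that PyTorch can see CUDA from the same environment the app runs in:
```bash
# Verify that torch is installed with CUDA support and a GPU is visible.
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```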
---
## Expected Resource Usage
### With ZeroGPU API Enabled (Optimal)
- **GPU Memory**: ~0-500MB (FAISS-GPU only, no local models)
- **CPU**: Low (API calls only)
- **RAM**: ~2-4GB (application + caching)
### With ZeroGPU Failing (Fallback Active)
- **GPU Memory**: ~15GB (local models loaded)
- **CPU**: Medium (model inference)
- **RAM**: ~4-6GB (models + application)
### FAISS-GPU Usage
- **GPU Memory**: ~100-500MB (depending on index size)
- **CPU Fallback**: Automatic if the GPU is unavailable
---
## Deployment Complete
Once all checks pass:
- ✅ Application is live
- ✅ ZeroGPU integration working
- ✅ FAISS-GPU acceleration active
- ✅ Fallback chain operational
- ✅ Monitoring in place
**Next Steps:**
- Monitor usage statistics
- Review the ZeroGPU API logs
- Optimize based on usage patterns
- Scale as needed
---
**Last Updated:** 2025-01-07
**Deployment Status:** Ready
**Version:** ZeroGPU Integration + FAISS-GPU + Lazy Loading