# Deployment Notes

## Hugging Face Spaces Deployment

### ZeroGPU Configuration

This MVP is optimized for ZeroGPU deployment on Hugging Face Spaces.

#### Key Settings

- GPU: None (CPU-only)
- Storage: Limited (~20GB)
- Memory: 32GB RAM
- Network: Shared infrastructure

### Environment Variables

Required environment variables for deployment:

```bash
HF_TOKEN=your_huggingface_token_here
HF_HOME=/tmp/huggingface
MAX_WORKERS=2
CACHE_TTL=3600
DB_PATH=sessions.db
FAISS_INDEX_PATH=embeddings.faiss
SESSION_TIMEOUT=3600
MAX_SESSION_SIZE_MB=10
MOBILE_MAX_TOKENS=800
MOBILE_TIMEOUT=15000
GRADIO_PORT=7860
GRADIO_HOST=0.0.0.0
LOG_LEVEL=INFO
```
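
For reference, these variables can be read into a typed settings object at startup. This is an illustrative sketch only; the `config.py` module name and `Settings` class are not part of the repository:

```python
# config.py - illustrative helper for reading the variables listed above
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    hf_token: str = os.getenv("HF_TOKEN", "")
    hf_home: str = os.getenv("HF_HOME", "/tmp/huggingface")
    max_workers: int = int(os.getenv("MAX_WORKERS", "2"))
    cache_ttl: int = int(os.getenv("CACHE_TTL", "3600"))
    db_path: str = os.getenv("DB_PATH", "sessions.db")
    faiss_index_path: str = os.getenv("FAISS_INDEX_PATH", "embeddings.faiss")
    session_timeout: int = int(os.getenv("SESSION_TIMEOUT", "3600"))
    max_session_size_mb: int = int(os.getenv("MAX_SESSION_SIZE_MB", "10"))
    mobile_max_tokens: int = int(os.getenv("MOBILE_MAX_TOKENS", "800"))
    mobile_timeout_ms: int = int(os.getenv("MOBILE_TIMEOUT", "15000"))
    gradio_port: int = int(os.getenv("GRADIO_PORT", "7860"))
    gradio_host: str = os.getenv("GRADIO_HOST", "0.0.0.0")
    log_level: str = os.getenv("LOG_LEVEL", "INFO")

settings = Settings()
```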

### Space Configuration

Create a README.md in the HF Space with:

```yaml
---
title: AI Research Assistant MVP
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---
```

## Deployment Steps

1. **Clone/set up the repository**

   ```bash
   git clone your-repo
   cd Research_Assistant
   ```

2. **Install dependencies**

   ```bash
   bash install.sh
   # or
   pip install -r requirements.txt
   ```

3. **Test the installation**

   ```bash
   python test_setup.py
   # or
   bash quick_test.sh
   ```

4. **Run locally** (a simplified launch sketch follows this list)

   ```bash
   python app.py
   ```

5. **Deploy to HF Spaces**

   - Push to GitHub
   - Connect the repository to HF Spaces
   - Select ZeroGPU hardware
   - Deploy
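
The entry point used in step 4 is `app.py`; its real contents are not reproduced here, but a minimal Gradio launch that honors `GRADIO_HOST` and `GRADIO_PORT` could look like the sketch below (the handler is a placeholder, not the actual research pipeline):

```python
# app.py (simplified sketch; the real entry point may differ)
import os
import gradio as gr

def answer(question: str) -> str:
    # Placeholder handler; the real app routes this through the research pipeline.
    return f"You asked: {question}"

demo = gr.Interface(fn=answer, inputs="text", outputs="text",
                    title="AI Research Assistant MVP")

if __name__ == "__main__":
    demo.queue(max_size=16)  # request queuing keeps memory use bounded
    demo.launch(server_name=os.getenv("GRADIO_HOST", "0.0.0.0"),
                server_port=int(os.getenv("GRADIO_PORT", "7860")))
```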

## Resource Management

### Memory Limits

- Base Python: ~100MB
- Gradio: ~50MB
- Models (loaded): ~200-500MB
- Cache: ~100MB max
- Buffer: ~100MB

**Total budget:** ~512MB (within HF Spaces limits)

### Strategies

- Lazy model loading (see the sketch after this list)
- Model offloading when not in use
- Aggressive cache eviction
- Stream responses to reduce memory
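
A minimal sketch of lazy model loading, assuming sentence-transformers is used for embeddings (the model name is an example, not necessarily the one the app loads):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_embedder():
    # Import and load only on first use so startup stays lightweight.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2")  # example model, roughly 80MB

def embed(texts):
    return get_embedder().encode(texts, show_progress_bar=False)
```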

## Performance Optimization

### For ZeroGPU

1. Use the HF Inference API for LLM calls, not local models (see the sketch after this list)
2. Use sentence-transformers for embeddings (lightweight)
3. Implement request queuing
4. Use faiss-cpu, not the GPU build
5. Implement response streaming
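
Items 1 and 5 can be combined with `huggingface_hub.InferenceClient`. The sketch below is illustrative; the model ID is an arbitrary example rather than the one this MVP actually calls:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example model, not prescribed by this repo
    token=os.getenv("HF_TOKEN"),
)

def generate_stream(prompt: str, max_tokens: int = 800):
    # stream=True yields text chunks as they are produced, keeping memory flat.
    for chunk in client.text_generation(prompt, max_new_tokens=max_tokens, stream=True):
        yield chunk
```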

### Mobile Optimizations

- Reduce max tokens to 800 (MOBILE_MAX_TOKENS)
- Shorten the timeout to 15s (MOBILE_TIMEOUT)
- Implement progressive loading
- Use a touch-optimized UI

## Monitoring

### Health Checks

- Application health endpoint: `/health` (see the sketch after this list)
- Database connectivity check
- Cache hit rate monitoring
- Response time tracking
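
One way to expose a `/health` endpoint alongside the Gradio UI is to mount the interface on a FastAPI app that also serves the health check. This is a sketch of that pattern, not the repository's actual wiring:

```python
import sqlite3
import gradio as gr
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    # Basic liveness plus a database connectivity check.
    try:
        sqlite3.connect("sessions.db").execute("SELECT 1")
        db_ok = True
    except sqlite3.Error:
        db_ok = False
    return {"status": "ok" if db_ok else "degraded", "database": db_ok}

demo = gr.Interface(fn=lambda q: q, inputs="text", outputs="text")
app = gr.mount_gradio_app(app, demo, path="/")
# Run with: uvicorn main:app --host 0.0.0.0 --port 7860  (module name is an example)
```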

### Logging

- Use structured logging (structlog; see the sketch after this list)
- Log levels: DEBUG (dev), INFO (prod)
- Monitor error rates
- Track performance metrics
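
A possible structlog configuration matching the level convention above; the choice of processors and the fields in the example log call are illustrative:

```python
import logging
import os
import structlog

structlog.configure(
    wrapper_class=structlog.make_filtering_bound_logger(
        getattr(logging, os.getenv("LOG_LEVEL", "INFO"))
    ),
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),
    ],
)

log = structlog.get_logger()
log.info("request_completed", session_id="abc123", latency_ms=245, cache_hit=True)
```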

## Troubleshooting

### Common Issues

**Issue:** Out of memory errors

- **Solution:** Reduce MAX_WORKERS and implement request queuing

**Issue:** Slow responses

- **Solution:** Enable aggressive caching and use response streaming

**Issue:** Model loading failures

- **Solution:** Use the HF Inference API instead of local models

**Issue:** Session data loss

- **Solution:** Implement proper persistence with SQLite backups (see the sketch after this list)
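
For the session-loss case, Python's built-in `sqlite3` backup API can copy the live database on a schedule; a minimal sketch (paths and the function name are examples):

```python
import sqlite3

def backup_sessions(src_path: str = "sessions.db",
                    dst_path: str = "sessions.backup.db") -> None:
    # The sqlite3 backup API copies the database safely even if other
    # connections are writing to it.
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        src.close()
        dst.close()
```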

## Scaling Considerations

### For Production

1. **Horizontal scaling:** Deploy multiple instances
2. **Caching layer:** Add Redis for shared session data (see the sketch after this list)
3. **Load balancing:** Use the HF Spaces built-in load balancer
4. **CDN:** Serve static assets via a CDN
5. **Database:** Consider PostgreSQL for production
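
A rough sketch of the Redis caching layer for shared session data, assuming the redis-py client; the host variable, key scheme, and helper names are examples, not part of the current codebase:

```python
import json
import os
import redis

r = redis.Redis(host=os.getenv("REDIS_HOST", "localhost"), port=6379,
                decode_responses=True)

def save_session(session_id: str, data: dict, ttl: int = 3600) -> None:
    # Expire entries after SESSION_TIMEOUT so stale sessions are evicted automatically.
    r.setex(f"session:{session_id}", ttl, json.dumps(data))

def load_session(session_id: str) -> dict | None:
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```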

### Migration Path

- Phase 1: MVP on ZeroGPU (current)
- Phase 2: Upgrade to GPU hardware for local models
- Phase 3: Scale to multiple workers
- Phase 4: Enterprise deployment with managed infrastructure