# Deployment Notes

## Hugging Face Spaces Deployment

### ZeroGPU Configuration
This MVP is designed for **ZeroGPU** deployment on Hugging Face Spaces. The app itself runs CPU-only and never holds GPU memory: LLM calls are offloaded to the HF Inference API and embeddings use a lightweight local model (see Performance Optimization below).

#### Key Settings
- **GPU**: None (CPU-only)
- **Storage**: Limited (~20GB)
- **Memory**: 32GB RAM
- **Network**: Shared infrastructure

### Environment Variables
Required environment variables for deployment:

```bash
HF_TOKEN=your_huggingface_token_here
HF_HOME=/tmp/huggingface
MAX_WORKERS=2
CACHE_TTL=3600
DB_PATH=sessions.db
FAISS_INDEX_PATH=embeddings.faiss
SESSION_TIMEOUT=3600
MAX_SESSION_SIZE_MB=10
MOBILE_MAX_TOKENS=800
MOBILE_TIMEOUT=15000
GRADIO_PORT=7860
GRADIO_HOST=0.0.0.0
LOG_LEVEL=INFO
```
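
A minimal sketch of how the application might read these variables with `os.getenv`, using the values above as fallbacks. The variable names mirror the block above; the helper name and module layout are assumptions, not part of the project.

```python
import os

def load_config() -> dict:
    """Read deployment settings from the environment (names match the block above)."""
    return {
        "hf_token": os.getenv("HF_TOKEN"),                       # required, no default
        "hf_home": os.getenv("HF_HOME", "/tmp/huggingface"),
        "max_workers": int(os.getenv("MAX_WORKERS", "2")),
        "cache_ttl": int(os.getenv("CACHE_TTL", "3600")),
        "db_path": os.getenv("DB_PATH", "sessions.db"),
        "faiss_index_path": os.getenv("FAISS_INDEX_PATH", "embeddings.faiss"),
        "session_timeout": int(os.getenv("SESSION_TIMEOUT", "3600")),
        "max_session_size_mb": int(os.getenv("MAX_SESSION_SIZE_MB", "10")),
        "mobile_max_tokens": int(os.getenv("MOBILE_MAX_TOKENS", "800")),
        "mobile_timeout_ms": int(os.getenv("MOBILE_TIMEOUT", "15000")),
        "gradio_port": int(os.getenv("GRADIO_PORT", "7860")),
        "gradio_host": os.getenv("GRADIO_HOST", "0.0.0.0"),
        "log_level": os.getenv("LOG_LEVEL", "INFO"),
    }

config = load_config()
if not config["hf_token"]:
    raise RuntimeError("HF_TOKEN must be set before deployment")
```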

### Space Configuration
Create a `README.md` at the root of the HF Space with the following front matter:

```yaml
---
title: AI Research Assistant MVP
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---
```

### Deployment Steps

1. **Clone/Setup Repository**
   ```bash
   git clone your-repo
   cd Research_Assistant
   ```

2. **Install Dependencies**
   ```bash
   bash install.sh
   # or
   pip install -r requirements.txt
   ```

3. **Test Installation**
   ```bash
   python test_setup.py
   # or
   bash quick_test.sh
   ```

4. **Run Locally**
   ```bash
   python app.py
   ```

5. **Deploy to HF Spaces**
   - Push to GitHub
   - Connect to HF Spaces
   - Select ZeroGPU hardware
   - Deploy
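
If you prefer pushing from a script instead of connecting a GitHub repository, the `huggingface_hub` client can upload the working directory to a Space directly. The `repo_id` below is a placeholder; substitute your own username and Space name.

```python
from huggingface_hub import HfApi

api = HfApi()  # reads HF_TOKEN from the environment

# Upload the current directory to the Space; repo_id is a placeholder.
api.upload_folder(
    folder_path=".",
    repo_id="your-username/ai-research-assistant-mvp",
    repo_type="space",
    ignore_patterns=["*.db", "*.faiss", "__pycache__/*"],
)
```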

### Resource Management

#### Memory Limits
- **Base Python**: ~100MB
- **Gradio**: ~50MB
- **Models (loaded)**: ~200-500MB
- **Cache**: ~100MB max
- **Buffer**: ~100MB

**Total Budget**: roughly 550–850MB depending on which models are loaded (well within HF Spaces limits)

#### Strategies
- Lazy model loading
- Model offloading when not in use
- Aggressive cache eviction
- Stream responses to reduce memory
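
A sketch of the lazy-loading and offloading idea, assuming a hypothetical `get_embedder()` helper and an arbitrary idle window: the embedding model is loaded on first use and dropped after a period of inactivity, so the budget above is only consumed while requests are being served.

```python
import gc
import time

_embedder = None
_last_used = 0.0
IDLE_SECONDS = 300  # assumed idle window before offloading

def get_embedder():
    """Lazily load the sentence-transformers model on first use."""
    global _embedder, _last_used
    if _embedder is None:
        from sentence_transformers import SentenceTransformer
        _embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    _last_used = time.time()
    return _embedder

def maybe_offload():
    """Drop the model after a period of inactivity to free memory."""
    global _embedder
    if _embedder is not None and time.time() - _last_used > IDLE_SECONDS:
        _embedder = None
        gc.collect()
```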

### Performance Optimization

#### For ZeroGPU
1. Use HF Inference API for LLM calls (not local models)
2. Use `sentence-transformers` for embeddings (lightweight)
3. Implement request queuing
4. Use FAISS-CPU (not GPU version)
5. Implement response streaming
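
A minimal sketch combining points 1, 2, and 4: the LLM is called through the HF Inference API rather than loaded locally, embeddings come from a small `sentence-transformers` model, and the index is plain `faiss-cpu`. The model IDs are illustrative assumptions, not fixed choices of this project.

```python
import faiss
import numpy as np
from huggingface_hub import InferenceClient
from sentence_transformers import SentenceTransformer

# Remote LLM: no generation weights live in the Space's memory (token read from HF_TOKEN).
llm = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.2")  # illustrative model ID

# Lightweight local embedder for retrieval.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model ID

def build_index(passages: list[str]) -> faiss.IndexFlatIP:
    vecs = embedder.encode(passages, normalize_embeddings=True)
    index = faiss.IndexFlatIP(int(vecs.shape[1]))
    index.add(np.asarray(vecs, dtype="float32"))
    return index

def answer(question: str, passages: list[str], index: faiss.IndexFlatIP) -> str:
    q = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), 3)
    context = "\n".join(passages[i] for i in ids[0])
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm.text_generation(prompt, max_new_tokens=800)
```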

#### Mobile Optimizations
- Reduce max tokens to 800
- Shorten timeout to 15s
- Implement progressive loading
- Use touch-optimized UI
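
One way the mobile limits could be wired up is to inspect the request's User-Agent inside the Gradio handler and swap in the smaller token budget; the detection heuristic, desktop limit, and handler name below are assumptions.

```python
import os
import gradio as gr

MOBILE_MAX_TOKENS = int(os.getenv("MOBILE_MAX_TOKENS", "800"))
MOBILE_TIMEOUT_S = int(os.getenv("MOBILE_TIMEOUT", "15000")) / 1000

def is_mobile(request: gr.Request) -> bool:
    # Crude heuristic: look for common mobile markers in the User-Agent.
    ua = (request.headers.get("user-agent") or "").lower()
    return any(marker in ua for marker in ("mobile", "android", "iphone"))

def respond(message: str, request: gr.Request) -> str:
    max_tokens = MOBILE_MAX_TOKENS if is_mobile(request) else 2048  # desktop limit is an assumption
    # The LLM call is elided; max_tokens and MOBILE_TIMEOUT_S would be
    # passed through to the inference client here.
    return f"(would answer with max_tokens={max_tokens})"
```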

### Monitoring

#### Health Checks
- Application health endpoint: `/health`
- Database connectivity check
- Cache hit rate monitoring
- Response time tracking
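
Gradio does not expose a `/health` route on its own, so one option is to mount the Gradio app on a FastAPI application and add the health check there. This is a sketch under that assumption; `check_db()` is a hypothetical helper.

```python
import sqlite3

import gradio as gr
from fastapi import FastAPI

app = FastAPI()

def check_db(path: str = "sessions.db") -> bool:
    """Hypothetical connectivity check against the SQLite session store."""
    try:
        with sqlite3.connect(path) as conn:
            conn.execute("SELECT 1")
        return True
    except sqlite3.Error:
        return False

@app.get("/health")
def health():
    return {"status": "ok" if check_db() else "degraded"}

demo = gr.Interface(fn=lambda x: x, inputs="text", outputs="text")  # stand-in UI
app = gr.mount_gradio_app(app, demo, path="/")
# Run locally with: uvicorn app:app --host 0.0.0.0 --port 7860
```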

#### Logging
- Use structured logging (structlog)
- Log levels: DEBUG (dev), INFO (prod)
- Monitor error rates
- Track performance metrics
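
A minimal structlog setup matching the conventions above (JSON output, level taken from `LOG_LEVEL`); the log call at the end is purely illustrative.

```python
import logging
import os

import structlog

level = getattr(logging, os.getenv("LOG_LEVEL", "INFO").upper(), logging.INFO)
logging.basicConfig(level=level)

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(level),
)

log = structlog.get_logger()
log.info("request_served", route="/query", duration_ms=412, cache_hit=True)  # illustrative values
```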

### Troubleshooting

#### Common Issues

**Issue**: Out of memory errors
- **Solution**: Reduce `MAX_WORKERS`, implement request queuing (see the sketch after this list)

**Issue**: Slow responses
- **Solution**: Enable aggressive caching, use streaming

**Issue**: Model loading failures
- **Solution**: Use HF Inference API instead of local models

**Issue**: Session data loss
- **Solution**: Implement proper persistence with SQLite backup
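
The request-queuing fix for the out-of-memory issue can be as simple as a semaphore sized by `MAX_WORKERS`, so at most that many requests hold model memory at once. A sketch; `run_pipeline` is a stand-in for the real retrieval and generation code.

```python
import os
import threading
import time

MAX_WORKERS = int(os.getenv("MAX_WORKERS", "2"))
_slots = threading.BoundedSemaphore(MAX_WORKERS)

def run_pipeline(question: str) -> str:
    """Stand-in for the real retrieval + generation pipeline."""
    time.sleep(0.1)
    return f"answer to: {question}"

def handle_query(question: str) -> str:
    # Only MAX_WORKERS requests enter the pipeline at once; the rest
    # block here, keeping peak memory bounded instead of OOMing.
    with _slots:
        return run_pipeline(question)
```

Gradio's built-in `.queue()` can serve a similar purpose at the UI layer if you prefer not to manage the semaphore yourself.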

### Scaling Considerations

#### For Production
1. **Horizontal Scaling**: Deploy multiple instances
2. **Caching Layer**: Add Redis for shared session data
3. **Load Balancing**: Use HF Spaces built-in load balancer
4. **CDN**: Static assets via CDN
5. **Database**: Consider PostgreSQL for production

#### Migration Path
- **Phase 1**: MVP on ZeroGPU (current)
- **Phase 2**: Upgrade to GPU for local models
- **Phase 3**: Scale to multiple workers
- **Phase 4**: Enterprise deployment with managed infrastructure