rdune71 committed on
Commit fb8e0ac · 1 Parent(s): 5470458

Update README with comprehensive endpoint information and enhance Hugging Face fallback UX

Files changed (2):
  1. README.md +99 -29
  2. app.py +7 -2
README.md CHANGED
@@ -20,9 +20,7 @@ Your personal AI-powered life coaching assistant.
  - Multiple LLM provider support (Ollama, Hugging Face, OpenAI)
  - Dynamic model selection
  - Remote Ollama integration via ngrok
- - Session management
- - Weather information integration
- - Text-to-speech capabilities

  ## How to Use

@@ -33,42 +31,114 @@ Your personal AI-powered life coaching assistant.

  ## Requirements

- All requirements are specified in . The app automatically handles:
  - Streamlit UI
  - FastAPI backend (for future expansion)
  - Redis connection for persistent memory
  - Multiple LLM integrations
- - Weather service integration
- - Text-to-speech capabilities

  ## Environment Variables

- Configure these in your Hugging Face Space secrets or local file:

- - : Your Ollama server URL (default: ngrok URL)
- - : Default model name (default: mistral)
- - : Hugging Face API token (for Hugging Face models and TTS)
- - : Hugging Face inference API endpoint
- - : Whether to use fallback providers (true/false)
- - : Redis server hostname (default: localhost)
- - : Redis server port (default: 6379)
- - : Redis username (optional)
- - : Redis password (optional)
- - : OpenWeather API key (for weather features)
- - : Tavily API key (for web search)

- ## Architecture

  This application consists of:
- - Streamlit frontend ()
- - Core LLM abstraction ()
- - Memory management ()
- - Session management ()
- - Configuration management ()
- - API endpoints (in directory for future expansion)
- - Services ( directory):
- - Weather service ()
- - Text-to-speech service ()
- - Ollama monitor ()

  Built with Python, Streamlit, FastAPI, and Redis.

  - Multiple LLM provider support (Ollama, Hugging Face, OpenAI)
  - Dynamic model selection
  - Remote Ollama integration via ngrok
+ - Automatic fallback between providers

  ## How to Use

  ## Requirements

+ All requirements are specified in `requirements.txt`. The app automatically handles:
  - Streamlit UI
  - FastAPI backend (for future expansion)
  - Redis connection for persistent memory
  - Multiple LLM integrations

  ## Environment Variables

+ Configure these in your Hugging Face Space secrets or local `.env` file:

+ - `OLLAMA_HOST`: Your Ollama server URL (default: ngrok URL)
+ - `LOCAL_MODEL_NAME`: Default model name (default: mistral)
+ - `HF_TOKEN`: Hugging Face API token (for Hugging Face models)
+ - `HF_API_ENDPOINT_URL`: Hugging Face inference API endpoint
+ - `USE_FALLBACK`: Whether to use fallback providers (true/false)
+ - `REDIS_HOST`: Redis server hostname (default: localhost)
+ - `REDIS_PORT`: Redis server port (default: 6379)
+ - `REDIS_USERNAME`: Redis username (optional)
+ - `REDIS_PASSWORD`: Redis password (optional)
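
For local development these typically live in a `.env` file or exported shell variables. As a rough illustration of how the app might read them (the real parsing lives in `utils/config.py` and may differ; the defaults shown here are assumptions):

```python
# Illustrative sketch only - mirrors the variables documented above.
import os

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")  # assumed default
LOCAL_MODEL_NAME = os.getenv("LOCAL_MODEL_NAME", "mistral")
HF_TOKEN = os.getenv("HF_TOKEN")                  # needed for the HF fallback
HF_API_ENDPOINT_URL = os.getenv("HF_API_ENDPOINT_URL")
USE_FALLBACK = os.getenv("USE_FALLBACK", "false").lower() == "true"
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))
REDIS_USERNAME = os.getenv("REDIS_USERNAME")      # optional
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD")      # optional
```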
 
 
+ ## Provider Details

+ ### Ollama (Primary Local Provider)
+
+ **Setup:**
+ 1. Install Ollama: https://ollama.com/download
+ 2. Pull a model: `ollama pull mistral`
+ 3. Start server: `ollama serve`
+ 4. Configure ngrok: `ngrok http 11434`
+ 5. Set `OLLAMA_HOST` to your ngrok URL
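
Once the tunnel is up, a quick connectivity check can confirm the app will be able to reach it. The sketch below is not the repo's `test_ollama_connection.py`; it only assumes the standard Ollama REST API, where `GET /api/tags` lists locally pulled models.

```python
# Minimal reachability check for an Ollama server exposed via ngrok (sketch).
import os
import requests

host = os.getenv("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
resp = requests.get(f"{host}/api/tags", timeout=10)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print(f"Ollama reachable at {host}. Models: {models or 'none pulled yet'}")
```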
64
+
65
+ **Advantages:**
66
+ - No cost for inference
67
+ - Full control over models
68
+ - Fast response times
69
+ - Privacy - all processing local
70
+
71
+ ### Hugging Face Inference API (Fallback)
72
+
73
+ **Current Endpoint:** `https://zxzbfrlg3ssrk7d9.us-east-1.aws.endpoints.huggingface.cloud`
74
+
75
+ **Important Scaling Behavior:**
76
+ - ⚠️ **Scale-to-Zero**: Endpoint automatically scales to zero after 15 minutes of inactivity
77
+ - ⏱️ **Cold Start**: Takes approximately 4 minutes to initialize when first requested
78
+ - 🔄 **Automatic Wake-up**: Sending any request will automatically start the endpoint
79
+ - 💰 **Cost**: \$0.536/hour while running (not billed when scaled to zero)
80
+ - 📍 **Location**: AWS us-east-1 (Intel Sapphire Rapids, 16vCPUs, 32GB RAM)
81
+
82
+ **Handling 503 Errors:**
83
+ When using the Hugging Face fallback, you may encounter 503 errors initially. This indicates the endpoint is initializing. Simply retry your request after 30-60 seconds, or wait for the initialization to complete (typically 4 minutes).
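
As an illustration of that retry pattern (this is not the app's client, which lives in `core/llm.py`), the sketch below posts a prompt and backs off while the endpoint returns 503 during its cold start. The `{"inputs": ...}` payload is the generic text-generation format and is an assumption about this endpoint.

```python
# Sketch: wake a scale-to-zero HF endpoint and retry through the cold start.
import os
import time
import requests

url = os.environ["HF_API_ENDPOINT_URL"]                 # endpoint URL shown above
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
payload = {"inputs": "Hello"}                           # assumed request format

for attempt in range(10):                               # ~5 minutes worst case
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    if resp.status_code != 503:                         # 503 = still initializing
        resp.raise_for_status()
        print(resp.json())
        break
    time.sleep(30)
else:
    raise RuntimeError("Endpoint did not become ready in time")
```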
+
+ **Model:** OpenAI GPT OSS 20B (Uncensored variant)
+
+ ### OpenAI (Alternative Fallback)
+
+ Configure with the `OPENAI_API_KEY` environment variable.
+
+ ## Switching Between Providers
+
+ ### For Local Development (Windows/Ollama):
+
+ 1. **Install Ollama:**
+    ```bash
+    # Download from https://ollama.com/download/OllamaSetup.exe
+    ```
+
+ 2. **Pull and run models:**
+    ```bash
+    ollama pull mistral
+    ollama pull llama3
+    ollama serve
+    ```
+
+ 3. **Start ngrok tunnel:**
+    ```bash
+    ngrok http 11434
+    ```
+
+ 4. **Update environment variables:**
+    ```bash
+    OLLAMA_HOST=https://your-ngrok-url.ngrok-free.app
+    LOCAL_MODEL_NAME=mistral
+    USE_FALLBACK=false
+    ```
+
+ ### For Production Deployment:
+
+ The application automatically handles provider fallback:
+
+ 1. Primary: Ollama (via ngrok)
+ 2. Secondary: Hugging Face Inference API
+ 3. Tertiary: OpenAI (if configured)
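
To make that ordering concrete, here is a minimal sketch of a provider chain. It is illustrative only; the actual fallback logic lives in `core/llm.py`, and the `ask_ollama` / `ask_huggingface` / `ask_openai` callables are hypothetical placeholders.

```python
# Illustrative fallback chain; the real implementation in core/llm.py may differ.
from typing import Callable, List, Optional

def ask_with_fallback(prompt: str, providers: List[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful reply."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:   # e.g. connection refused, or 503 during cold start
            last_error = err
    raise RuntimeError("All providers failed") from last_error

# Order mirrors the list above: Ollama first, then Hugging Face, then OpenAI.
# reply = ask_with_fallback("Hello", [ask_ollama, ask_huggingface, ask_openai])
```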
+
+ ## Architecture
  This application consists of:

+ - Streamlit frontend (`app.py`)
+ - Core LLM abstraction (`core/llm.py`)
+ - Memory management (`core/memory.py`)
+ - Configuration management (`utils/config.py`)
+ - API endpoints (in `api/` directory for future expansion)

  Built with Python, Streamlit, FastAPI, and Redis.
+
+ ## Troubleshooting
+
+ **Common Issues:**
+
+ - **503 Errors with Hugging Face Fallback:**
+   - Wait 4 minutes for cold start initialization
+   - Retry request after endpoint warms up
+ - **Ollama Connection Issues:**
+   - Verify `ollama serve` is running locally
+   - Check ngrok tunnel status
+   - Confirm ngrok URL matches `OLLAMA_HOST`
+   - Test with `test_ollama_connection.py`
+ - **Redis Connection Problems:**
+   - Set `USE_FALLBACK=true` to disable Redis requirement
+   - Or configure proper Redis credentials
+ - **Model Not Found:**
+   - Pull required model: `ollama pull <model-name>`
+   - Check available models: `ollama list`
+
+ **Diagnostic Scripts:**
+ - Run `python test_ollama_connection.py` to verify Ollama connectivity.
+ - Run `python diagnose_ollama.py` for detailed connection diagnostics.
app.py CHANGED
@@ -1,4 +1,4 @@
- # Force redeploy trigger - version 2.0
  import streamlit as st
  from utils.config import config
  import requests
@@ -200,9 +200,14 @@ ollama_status = ollama_status or {}
  # Determine if we should use fallback
  use_fallback = not ollama_status.get("running", False) or config.use_fallback

- # Display Ollama status
  if use_fallback:
      st.sidebar.warning("🌐 Using Hugging Face fallback (Ollama not available)")
      if "error" in ollama_status:
          st.sidebar.caption(f"Error: {ollama_status['error'][:50]}...")
  else:
 
+ # Force redeploy trigger - version 2.1
  import streamlit as st
  from utils.config import config
  import requests

  # Determine if we should use fallback
  use_fallback = not ollama_status.get("running", False) or config.use_fallback

+ # Display Ollama status - Enhanced section with Hugging Face scaling behavior info
  if use_fallback:
      st.sidebar.warning("🌐 Using Hugging Face fallback (Ollama not available)")
+
+     # Add special note for Hugging Face scaling behavior
+     if config.hf_api_url and "endpoints.huggingface.cloud" in config.hf_api_url:
+         st.sidebar.info("ℹ️ HF Endpoint may be initializing (up to 4 min)")
+
      if "error" in ollama_status:
          st.sidebar.caption(f"Error: {ollama_status['error'][:50]}...")
  else: