AI-Life-Coach-Streamlit2

rdune71 committed on Sep 7

Commit fb8e0ac

1 Parent(s): 5470458

Update README with comprehensive endpoint information and enhance Hugging Face fallback UX

Files changed (2)

README.md +99 -29
app.py +7 -2

README.md CHANGED

@@ -20,9 +20,7 @@ Your personal AI-powered life coaching assistant.
 - Multiple LLM provider support (Ollama, Hugging Face, OpenAI)
 - Dynamic model selection
 - Remote Ollama integration via ngrok
-- Session management
-- Weather information integration
-- Text-to-speech capabilities
 ## How to Use
@@ -33,42 +31,114 @@ Your personal AI-powered life coaching assistant.
 ## Requirements
-All requirements are specified in . The app automatically handles:
 - Streamlit UI
 - FastAPI backend (for future expansion)
 - Redis connection for persistent memory
 - Multiple LLM integrations
-- Weather service integration
-- Text-to-speech capabilities
 ## Environment Variables
-Configure these in your Hugging Face Space secrets or local  file:
-- : Your Ollama server URL (default: ngrok URL)
-- : Default model name (default: mistral)
-- : Hugging Face API token (for Hugging Face models and TTS)
-- : Hugging Face inference API endpoint
-- : Whether to use fallback providers (true/false)
-- : Redis server hostname (default: localhost)
-- : Redis server port (default: 6379)
-- : Redis username (optional)
-- : Redis password (optional)
-- : OpenWeather API key (for weather features)
-- : Tavily API key (for web search)
-## Architecture
 This application consists of:
-- Streamlit frontend ()
-- Core LLM abstraction ()
-- Memory management ()
-- Session management ()
-- Configuration management ()
-- API endpoints (in  directory for future expansion)
-- Services ( directory):
-  - Weather service ()
-  - Text-to-speech service ()
-  - Ollama monitor ()
 Built with Python, Streamlit, FastAPI, and Redis.

 - Multiple LLM provider support (Ollama, Hugging Face, OpenAI)
 - Dynamic model selection
 - Remote Ollama integration via ngrok
+- Automatic fallback between providers
 ## How to Use
 ## Requirements
+All requirements are specified in `requirements.txt`. The app automatically handles:
 - Streamlit UI
 - FastAPI backend (for future expansion)
 - Redis connection for persistent memory
 - Multiple LLM integrations
 ## Environment Variables
+Configure these in your Hugging Face Space secrets or local `.env` file:
+- `OLLAMA_HOST`: Your Ollama server URL (default: ngrok URL)
+- `LOCAL_MODEL_NAME`: Default model name (default: mistral)
+- `HF_TOKEN`: Hugging Face API token (for Hugging Face models)
+- `HF_API_ENDPOINT_URL`: Hugging Face inference API endpoint
+- `USE_FALLBACK`: Whether to use fallback providers (true/false)
+- `REDIS_HOST`: Redis server hostname (default: localhost)
+- `REDIS_PORT`: Redis server port (default: 6379)
+- `REDIS_USERNAME`: Redis username (optional)
+- `REDIS_PASSWORD`: Redis password (optional)
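A minimal sketch of how the environment variables above could be read into one config object, applying the defaults the README lists (`mistral`, `localhost`, `6379`). This is not the project's `utils/config.py`; `AppConfig` and `load_config` are hypothetical names, no default is assumed for `OLLAMA_HOST` (the README's default is an unspecified ngrok URL), and `USE_FALLBACK` is assumed to default to false.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class AppConfig:
    ollama_host: Optional[str]
    local_model_name: str
    hf_token: Optional[str]
    hf_api_url: Optional[str]
    use_fallback: bool
    redis_host: str
    redis_port: int
    redis_username: Optional[str]
    redis_password: Optional[str]


def _parse_bool(value: Optional[str], default: bool = False) -> bool:
    """Interpret 'true'/'false' style strings; an unset variable yields `default`."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Hypothetical loader: read the README's variables with its documented defaults."""
    return AppConfig(
        ollama_host=env.get("OLLAMA_HOST"),  # assumption: no built-in ngrok default here
        local_model_name=env.get("LOCAL_MODEL_NAME", "mistral"),
        hf_token=env.get("HF_TOKEN"),
        hf_api_url=env.get("HF_API_ENDPOINT_URL"),
        use_fallback=_parse_bool(env.get("USE_FALLBACK")),  # assumption: defaults to false
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_username=env.get("REDIS_USERNAME") or None,  # optional
        redis_password=env.get("REDIS_PASSWORD") or None,  # optional
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.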
+## Provider Details
+### Ollama (Primary Local Provider)
+**Setup:**
+1. Install Ollama: https://ollama.com/download
+2. Pull a model: `ollama pull mistral`
+3. Start server: `ollama serve`
+4. Configure ngrok: `ngrok http 11434`
+5. Set `OLLAMA_HOST` to your ngrok URL
+**Advantages:**
+- No cost for inference
+- Full control over models
+- Fast response times
+- Privacy: all processing stays local
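After step 5, one way to confirm that the server behind `OLLAMA_HOST` is reachable through the ngrok tunnel is to query Ollama's `GET /api/tags` endpoint, which lists locally pulled models. A minimal standard-library sketch; `check_ollama` is a hypothetical helper, and the `{"running": ..., "error": ...}` shape is an assumption chosen to match how `app.py` reads `ollama_status`.

```python
import json
import urllib.request
from typing import Any, Dict, List


def tags_url(ollama_host: str) -> str:
    """Build the model-listing URL, tolerating a trailing slash on OLLAMA_HOST."""
    return ollama_host.rstrip("/") + "/api/tags"


def model_names(payload: Dict[str, Any]) -> List[str]:
    """Extract model names (e.g. 'mistral:latest') from an /api/tags response."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def check_ollama(ollama_host: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Hypothetical health check returning a status dict shaped like app.py expects."""
    try:
        with urllib.request.urlopen(tags_url(ollama_host), timeout=timeout) as resp:
            payload = json.load(resp)
        return {"running": True, "models": model_names(payload)}
    except (OSError, ValueError) as exc:
        # URLError/HTTPError and timeouts are OSError subclasses; bad JSON is ValueError.
        return {"running": False, "error": str(exc)}
```

Usage: `check_ollama(os.environ["OLLAMA_HOST"])` returns `running: False` with the error text if `ollama serve` or the tunnel is down.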
+### Hugging Face Inference API (Fallback)
+**Current Endpoint:** `https://zxzbfrlg3ssrk7d9.us-east-1.aws.endpoints.huggingface.cloud`
+**Important Scaling Behavior:**
+- ⚠️ **Scale-to-Zero**: Endpoint automatically scales to zero after 15 minutes of inactivity
+- ⏱️ **Cold Start**: Takes approximately 4 minutes to initialize when first requested
+- 🔄 **Automatic Wake-up**: Sending any request will automatically start the endpoint
+- 💰 **Cost**: \$0.536/hour while running (not billed when scaled to zero)
+- 📍 **Location**: AWS us-east-1 (Intel Sapphire Rapids, 16vCPUs, 32GB RAM)
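The figures above put a floor on the cost of each wake-up: once started, the endpoint keeps running (and billing) for at least the 15-minute idle window after the last request. A minimal sketch of that arithmetic; `wake_cycle_cost` is a hypothetical helper, and whether the cold-start minutes are billed is an assumption the README does not settle, so it is a parameter.

```python
HOURLY_RATE_USD = 0.536  # from the README
IDLE_TIMEOUT_MIN = 15    # scale-to-zero after 15 minutes of inactivity
COLD_START_MIN = 4       # approximate initialization time


def wake_cycle_cost(active_minutes: float, include_cold_start: bool = True) -> float:
    """Estimated USD for one wake-up: optional cold start + active use + idle tail."""
    minutes = active_minutes + IDLE_TIMEOUT_MIN
    if include_cold_start:  # assumption: initialization time is billed
        minutes += COLD_START_MIN
    return round(HOURLY_RATE_USD * minutes / 60, 4)
```

For a single short exchange, `wake_cycle_cost(0)` gives about $0.17 (19 billed minutes), or $0.134 with the cold start excluded.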
+**Handling 503 Errors:**
+When using the Hugging Face fallback, you may encounter 503 errors at first. A 503 means the endpoint is still initializing: retry your request every 30-60 seconds until initialization completes (typically about 4 minutes).
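The retry advice above can be sketched as a small loop that treats HTTP 503 as "still warming up" and gives up after roughly the cold-start window. This is a hypothetical helper, not code from the app; the 45-second interval and 300-second cap are assumed values picked from the 30-60 second and ~4 minute figures. `send` performs one request and `status_of` extracts its status code.

```python
import time
from typing import Callable, TypeVar

R = TypeVar("R")


def call_with_warmup_retry(
    send: Callable[[], R],
    status_of: Callable[[R], int],
    max_wait_s: float = 300.0,      # assumption: a bit over the ~4 min cold start
    retry_interval_s: float = 45.0,  # assumption: middle of the 30-60 s advice
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Repeat `send` while it returns 503, up to `max_wait_s` of waiting."""
    waited = 0.0
    result = send()
    while status_of(result) == 503 and waited < max_wait_s:
        sleep(retry_interval_s)
        waited += retry_interval_s
        result = send()
    return result  # last response, which may still be a 503 if the cap was hit
```

With `requests`, usage would look like `call_with_warmup_retry(lambda: requests.post(url, json=body, headers=headers), lambda r: r.status_code)`, where `url`, `body` and `headers` stand for the app's own request details. Injecting `sleep` keeps the loop testable without real waiting.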
+**Model:** OpenAI GPT OSS 20B (Uncensored variant)
+### OpenAI (Alternative Fallback)
+Configure with `OPENAI_API_KEY` environment variable.
+## Switching Between Providers
+### For Local Development (Windows/Ollama):
+1. **Install Ollama:**
+   ```bash
+   # Download from https://ollama.com/download/OllamaSetup.exe
+   ```
+2. **Pull and run models:**
+   ```bash
+   ollama pull mistral
+   ollama pull llama3
+   ollama serve
+   ```
+3. **Start ngrok tunnel:**
+   ```bash
+   ngrok http 11434
+   ```
+4. **Update environment variables:**
+   ```bash
+   OLLAMA_HOST=https://your-ngrok-url.ngrok-free.app
+   LOCAL_MODEL_NAME=mistral
+   USE_FALLBACK=false
+   ```
+### For Production Deployment:
+The application automatically handles provider fallback:
+1. **Primary:** Ollama (via ngrok)
+2. **Secondary:** Hugging Face Inference API
+3. **Tertiary:** OpenAI (if configured)
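The primary/secondary/tertiary order above amounts to trying each configured provider in turn and returning the first successful reply. A minimal sketch, assuming each provider is wrapped as a `prompt -> reply` callable; `generate_with_fallback` and `AllProvidersFailed` are hypothetical names, not the API of `core/llm.py`, and an unconfigured provider (such as OpenAI without `OPENAI_API_KEY`) is simply left out of the list.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (display name, generate function)


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an error."""


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order; return (provider_name, reply) from the first success."""
    errors: List[str] = []
    for name, generate in providers:
        try:
            return name, generate(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the reply lets the UI say which backend actually answered, in the spirit of the sidebar warning in `app.py`.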
+## Architecture
 This application consists of:
+- Streamlit frontend (`app.py`)
+- Core LLM abstraction (`core/llm.py`)
+- Memory management (`core/memory.py`)
+- Configuration management (`utils/config.py`)
+- API endpoints (in `api/` directory for future expansion)
 Built with Python, Streamlit, FastAPI, and Redis.
+## Troubleshooting
+### Common Issues:
+**503 Errors with Hugging Face Fallback:**
+- Wait 4 minutes for cold start initialization
+- Retry request after endpoint warms up
+**Ollama Connection Issues:**
+- Verify `ollama serve` is running locally
+- Check ngrok tunnel status
+- Confirm ngrok URL matches `OLLAMA_HOST`
+- Test with `test_ollama_connection.py`
+**Redis Connection Problems:**
+- Set `USE_FALLBACK=true` to disable Redis requirement
+- Or configure proper Redis credentials
+**Model Not Found:**
+- Pull required model: `ollama pull <model-name>`
+- Check available models: `ollama list`
+**Diagnostic Scripts:**
+- Run `python test_ollama_connection.py` to verify Ollama connectivity.
+- Run `python diagnose_ollama.py` for detailed connection diagnostics.
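For the "Model Not Found" case, note that Ollama lists untagged models with a `:latest` suffix (`mistral` appears as `mistral:latest`), so a naive string comparison against `LOCAL_MODEL_NAME` can report a false miss. A minimal sketch of the check; both helpers are hypothetical and not taken from `test_ollama_connection.py` or `diagnose_ollama.py`.

```python
from typing import Iterable


def normalize_model_name(name: str) -> str:
    """Treat an untagged name as ':latest', looking only at the last path segment
    so a registry host with a port is not mistaken for a tag."""
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def missing_model_hint(wanted: str, available: Iterable[str]) -> str:
    """Return '' if `wanted` is installed, else the pull command the README suggests."""
    installed = {normalize_model_name(n) for n in available}
    if normalize_model_name(wanted) in installed:
        return ""
    return f"ollama pull {wanted}"
```

Here `available` would be the names reported by `ollama list` for the machine behind `OLLAMA_HOST`.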

app.py CHANGED

@@ -1,4 +1,4 @@
-# Force redeploy trigger - version 2.0
 import streamlit as st
 from utils.config import config
 import requests
@@ -200,9 +200,14 @@ ollama_status = ollama_status or {}
 # Determine if we should use fallback
 use_fallback = not ollama_status.get("running", False) or config.use_fallback
-# Display Ollama status
 if use_fallback:
     st.sidebar.warning("🌐 Using Hugging Face fallback (Ollama not available)")
     if "error" in ollama_status:
         st.sidebar.caption(f"Error: {ollama_status['error'][:50]}...")
 else:

+# Force redeploy trigger - version 2.1
 import streamlit as st
 from utils.config import config
 import requests
 # Determine if we should use fallback
 use_fallback = not ollama_status.get("running", False) or config.use_fallback
+# Display Ollama status - Enhanced section with Hugging Face scaling behavior info
 if use_fallback:
     st.sidebar.warning("🌐 Using Hugging Face fallback (Ollama not available)")
+    # Add special note for Hugging Face scaling behavior
+    if config.hf_api_url and "endpoints.huggingface.cloud" in config.hf_api_url:
+        st.sidebar.info("ℹ️ HF Endpoint may be initializing (up to 4 min)")
     if "error" in ollama_status:
         st.sidebar.caption(f"Error: {ollama_status['error'][:50]}...")
 else:
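The sidebar logic in this hunk can be pulled into a pure function so the fallback decision and the new endpoint notice are testable without a running Streamlit app. A minimal sketch: `fallback_sidebar_messages` is a hypothetical refactor, not the committed code, and because the `else:` branch is not shown in the diff it simply returns no messages when Ollama is in use.

```python
from typing import Any, List, Mapping, Optional, Tuple

Message = Tuple[str, str]  # (st.sidebar method name, text)


def fallback_sidebar_messages(
    ollama_status: Mapping[str, Any],
    force_fallback: bool,
    hf_api_url: Optional[str],
) -> List[Message]:
    """Mirror the diff's branch: same condition, same messages, in the same order."""
    use_fallback = not ollama_status.get("running", False) or force_fallback
    if not use_fallback:
        return []  # the non-fallback branch is elided in the diff
    messages: List[Message] = [("warning", "🌐 Using Hugging Face fallback (Ollama not available)")]
    # Dedicated Inference Endpoints can scale to zero, so warn about the cold start.
    if hf_api_url and "endpoints.huggingface.cloud" in hf_api_url:
        messages.append(("info", "ℹ️ HF Endpoint may be initializing (up to 4 min)"))
    if "error" in ollama_status:
        messages.append(("caption", f"Error: {ollama_status['error'][:50]}..."))
    return messages
```

Rendering stays a one-liner: `for kind, text in fallback_sidebar_messages(ollama_status, config.use_fallback, config.hf_api_url): getattr(st.sidebar, kind)(text)`.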