Rebrand from AI Life Coach to CosmicCat AI Assistant with space-cat theme
Files changed:
- README.md (+13 -13)
- app.py (+33 -18)
- core/coordinator.py (+185 -203)
- services/hf_endpoint_monitor.py (+37 -37)

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
-title: AI Life Coach
-emoji: 
+title: CosmicCat AI Assistant
+emoji: 🐱
 colorFrom: purple
 colorTo: blue
 sdk: streamlit
@@ -9,25 +9,26 @@ app_file: app.py
 pinned: false
 ---
 
-# AI Life Coach
+# CosmicCat AI Assistant 🐱
 
-Your personal AI-powered life coaching assistant.
+Your personal AI-powered life coaching assistant with a cosmic twist.
 
 ## Features
 
-- Personalized life coaching conversations
+- Personalized life coaching conversations with a space-cat theme
 - Redis-based conversation memory
 - Multiple LLM provider support (Ollama, Hugging Face, OpenAI)
 - Dynamic model selection
 - Remote Ollama integration via ngrok
 - Automatic fallback between providers
+- Cosmic Cascade mode for enhanced responses
 
 ## How to Use
 
 1. Select a user from the sidebar
 2. Configure your Ollama connection (if using remote Ollama)
 3. Choose your preferred model
-4. Start chatting with your AI Life Coach
+4. Start chatting with your CosmicCat AI Assistant!
 
 ## Requirements
 
@@ -90,7 +91,7 @@ Configure with OPENAI_API_KEY environment variable.
 
 ### For Local Development (Windows/Ollama):
 
-1. Install Ollama: 
+1. Install Ollama:
 ```bash
 # Download from https://ollama.com/download/OllamaSetup.exe
 Pull and run models:
@@ -112,12 +113,11 @@ USE_FALLBACK=false
 For Production Deployment:
 
 The application automatically handles provider fallback:
+
 Primary: Ollama (via ngrok)
 Secondary: Hugging Face Inference API
 Tertiary: OpenAI (if configured)
-
 Architecture
-
 This application consists of:
 
 Streamlit frontend (app.py)
@@ -173,20 +173,20 @@ Note: SSL is disabled due to record layer failures with Redis Cloud. The connect
 This application is designed for deployment on Hugging Face Spaces with the following configuration:
 
 Required HF Space Secrets:
+
 OLLAMA_HOST - Your ngrok tunnel to Ollama server
 LOCAL_MODEL_NAME - Default: mistral:latest
 HF_TOKEN - Hugging Face API token (for HF endpoint access)
 HF_API_ENDPOINT_URL - Your custom HF inference endpoint
 TAVILY_API_KEY - For web search capabilities
 OPENWEATHER_API_KEY - For weather data integration
-Redis Configuration:
-The application uses hardcoded Redis Cloud credentials for persistent storage.
+Redis Configuration: The application uses hardcoded Redis Cloud credentials for persistent storage.
 
-Multi-Model Coordination 
+Multi-Model Coordination
 Primary: Ollama (fast responses, local processing)
 Secondary: Hugging Face Endpoint (deep analysis, cloud processing)
 Coordination: Both work together, not fallback
-System Architecture 
+System Architecture
 The coordinated AI system automatically handles:
 
 External data gathering (web search, weather, time)
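For quick reference, the Space secrets listed above map one-to-one to environment variables. A minimal sketch of reading them in Python (the variable names and the mistral:latest default come from the README; reading them via os.getenv is an assumption about the app's configuration code):

```python
import os

# Secret names are taken from the README's "Required HF Space Secrets" list;
# pulling them from the environment with os.getenv is an assumed pattern.
OLLAMA_HOST = os.getenv("OLLAMA_HOST")                              # ngrok tunnel to the Ollama server
LOCAL_MODEL_NAME = os.getenv("LOCAL_MODEL_NAME", "mistral:latest")  # default named in the README
HF_TOKEN = os.getenv("HF_TOKEN")                                    # Hugging Face API token
HF_API_ENDPOINT_URL = os.getenv("HF_API_ENDPOINT_URL")              # custom HF inference endpoint
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")                        # web search
OPENWEATHER_API_KEY = os.getenv("OPENWEATHER_API_KEY")              # weather data
```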
    	
app.py CHANGED

@@ -20,7 +20,7 @@ import logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 
-st.set_page_config(page_title="AI 
+st.set_page_config(page_title="CosmicCat AI Assistant", page_icon="🐱", layout="wide")
 
 # Initialize session state safely at the top of app.py
 if "messages" not in st.session_state:
@@ -38,7 +38,7 @@ if "cosmic_mode" not in st.session_state:
 
 # Sidebar layout redesign
 with st.sidebar:
-    st.title("
+    st.title("🐱 CosmicCat AI Assistant")
     st.markdown("Your personal AI-powered life development assistant")
 
     # PRIMARY ACTIONS
@@ -79,7 +79,7 @@ with st.sidebar:
             import requests
             headers = {
                 "ngrok-skip-browser-warning": "true",
-                "User-Agent": "
+                "User-Agent": "CosmicCat-Test"
             }
             with st.spinner("Testing connection..."):
                 response = requests.get(
@@ -165,7 +165,7 @@ with st.sidebar:
     st.markdown(f"**Active Features:** {', '.join(features) if features else 'None'}")
 
 # Main interface
-st.title("
+st.title("🐱 CosmicCat AI Assistant")
 st.markdown("Ask me anything about personal development, goal setting, or life advice!")
 
 # Consistent message rendering function with cosmic styling
@@ -181,6 +181,8 @@ def render_message(role, content, source=None, timestamp=None):
                 st.markdown(f"### 🌟 Final Cosmic Summary:")
             elif source == "error":
                 st.markdown(f"### ❌ Error:")
+            elif source == "hf_expert":
+                st.markdown(f"### 🤖 HF Expert Analysis:")
             else:
                 st.markdown(f"### {source}")
 
@@ -193,7 +195,7 @@ for message in st.session_state.messages:
     render_message(
         message["role"], 
         message["content"], 
-        message.get("source"),
+        message.get("source"), 
         message.get("timestamp")
     )
 
@@ -201,7 +203,7 @@ for message in st.session_state.messages:
 if st.session_state.messages and len(st.session_state.messages) > 0:
     st.divider()
 
-    # HF Expert Section
+    # HF Expert Section with enhanced visual indication
     with st.expander("🤖 HF Expert Analysis", expanded=False):
         st.subheader("Deep Conversation Analysis")
 
@@ -215,13 +217,26 @@ if st.session_state.messages and len(st.session_state.messages) > 0:
                 - Acts as expert consultant in your conversation
             """)
 
-            # Show conversation preview
+            # Show conversation preview for HF expert
             st.markdown("**Conversation Preview for HF Expert:**")
             st.markdown("---")
             for i, msg in enumerate(st.session_state.messages[-5:]):  # Last 5 messages
                 role = "👤 You" if msg["role"] == "user" else "🤖 Assistant"
                 st.markdown(f"**{role}:** {msg['content'][:100]}{'...' if len(msg['content']) > 100 else ''}")
             st.markdown("---")
+
+            # Show web search determination
+            try:
+                user_session = session_manager.get_session("default_user")
+                conversation_history = user_session.get("conversation", [])
+                research_needs = coordinator.determine_web_search_needs(conversation_history)
+
+                if research_needs["needs_search"]:
+                    st.info(f"🔍 **Research Needed:** {research_needs['reasoning']}")
+                else:
+                    st.success("✅ No research needed for this conversation")
+            except Exception as e:
+                st.warning("⚠️ Could not determine research needs")
 
         with col2:
             if st.button("🧠 Activate HF Expert",
@@ -277,7 +292,7 @@ if st.session_state.get("hf_expert_requested", False):
                 })
 
                 st.session_state.hf_expert_requested = False
-
+
         except Exception as e:
             user_msg = translate_error(e)
             st.error(f"❌ HF Expert analysis failed: {user_msg}")
@@ -373,13 +388,13 @@ if user_input and not st.session_state.is_processing:
                             if hf_response:
                                 with st.chat_message("assistant"):
                                     st.markdown(f"### 🛰️ Orbital Station Reports:\n{hf_response}")
-
-
-
-
-
-
-
+
+                                st.session_state.messages.append({
+                                    "role": "assistant",
+                                    "content": hf_response,
+                                    "source": "orbital_station",
+                                    "timestamp": datetime.now().strftime("%H:%M:%S")
+                                })
 
                             # Stage 3: Local Synthesis
                             status_placeholder.info("🐱 Cosmic Kitten Synthesizing Wisdom...")
@@ -651,7 +666,7 @@ with tab2:
 
             col1, col2, col3 = st.columns(3)
             col1.metric("Total Exchanges", len(user_messages))
-            col2.metric("Avg Response Length",
+            col2.metric("Avg Response Length", 
                         round(sum(len(msg.get("content", "")) for msg in ai_messages) / len(ai_messages)) if ai_messages else 0)
             col3.metric("Topics Discussed", len(set(["life", "goal", "health", "career"]) &
                                                set(" ".join([msg.get("content", "") for msg in conversation]).lower().split())))
@@ -669,9 +684,9 @@ with tab2:
         st.warning(f"Could not analyze conversation: {translate_error(e)}")
 
 with tab3:
-    st.header("ℹ️ About AI 
+    st.header("ℹ️ About CosmicCat AI Assistant")
     st.markdown("""
-    The AI 
+    The CosmicCat AI Assistant is a sophisticated conversational AI system with the following capabilities:
 
     ### 🧠 Core Features
     - **Multi-model coordination**: Combines local Ollama models with cloud-based Hugging Face endpoints
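The app.py hunks above store each assistant turn as a plain dict in st.session_state.messages and let render_message() choose a heading from the source field. A standalone sketch of that record shape and dispatch (keys and source values are taken from the diff; Streamlit calls are replaced with print so the snippet runs on its own):

```python
from datetime import datetime

# Record shape appended to st.session_state.messages in the new code.
message = {
    "role": "assistant",
    "content": "…response text…",
    "source": "orbital_station",   # other sources seen in the diff: "hf_expert", "error"
    "timestamp": datetime.now().strftime("%H:%M:%S"),
}

def heading_for(source=None):
    # Mirrors the source dispatch added to render_message(): known sources get a
    # dedicated heading, anything else falls back to the raw label.
    if source == "error":
        return "### ❌ Error:"
    if source == "hf_expert":
        return "### 🤖 HF Expert Analysis:"
    return f"### {source}" if source else ""

print(heading_for(message["source"]))
print(message["content"])
```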
    	
core/coordinator.py CHANGED

@@ -20,37 +20,104 @@ logger = logging.getLogger(__name__)
 
 class AICoordinator:
     """Hierarchical multi-model coordinator with cosmic cascade flow"""
-    
     def __init__(self):
         self.tavily_client = None
         if TAVILY_AVAILABLE and os.getenv("TAVILY_API_KEY"):
             self.tavily_client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
-        
         # System instructions for cosmic behavior
         self.system_instructions = {
-            'ollama_role': """You are a cosmic kitten assistant that works in coordination with a powerful orbital space station. 
-
-Your role is to:
-
-2. 
-3. Enhance responses with local knowledge
-4. DEFER TO THE ORBITAL STATION for deep analysis and authoritative answers
-5. The orbital station has 8192 token capacity and streaming capability - it's the thinking layer
-6. Always acknowledge that the orbital station's insights supersede your initial response
-7. Use web sources to justify points when possible
-8. Keep responses brief and focused while awaiting orbital input""",
-            
-            'hf_role': """You are a powerful orbital space station with 8192 token capacity and streaming capability.
-Your role is to:
-1. Provide deep, uncensored analysis and insights
-2. Supersede the local cosmic kitten's logic when there's disagreement
-3. Think thoroughly and provide comprehensive responses
-4. Use your full 8192 token capacity for detailed analysis
-5. Stream your responses for real-time delivery
-6. Build upon the conversation context and cosmic kitten's initial response
-7. Provide authoritative answers that take precedence"""
         }
-
     async def coordinate_cosmic_response(self, user_id: str, user_query: str) -> AsyncGenerator[Dict, None]:
         """
         Three-stage cosmic response cascade:
@@ -61,7 +128,7 @@ Your role is to: 
         try:
             # Get conversation history
             session = session_manager.get_session(user_id)
-
             # Inject current time into context
             current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
             time_context = {
@@ -69,7 +136,7 @@ Your role is to: 
                 "content": f"[Current Date & Time: {current_time}]"
             }
             conversation_history = [time_context] + session.get("conversation", []).copy()
-
             yield {
                 'type': 'status',
                 'content': '🚀 Initiating Cosmic Response Cascade...',
@@ -78,28 +145,28 @@ Your role is to: 
                     'user_query_length': len(user_query)
                 }
             }
-
             # Stage 1: Local Ollama Immediate Response (🐱 Cosmic Kitten's quick thinking)
             yield {
                 'type': 'status',
                 'content': '🐱 Cosmic Kitten Responding...'
             }
-
             local_response = await self._get_local_ollama_response(user_query, conversation_history)
             yield {
                 'type': 'local_response',
                 'content': local_response,
                 'source': '🐱 Cosmic Kitten'
             }
-
             # Stage 2: HF Endpoint Deep Analysis (🛰️ Orbital Station wisdom) (parallel processing)
             yield {
                 'type': 'status',
                 'content': '🛰️ Beaming Query to Orbital Station...'
             }
-
             hf_task = asyncio.create_task(self._get_hf_analysis(user_query, conversation_history))
-
             # Wait for HF response
             hf_response = await hf_task
             yield {
@@ -107,37 +174,37 @@ Your role is to: 
                 'content': hf_response,
                 'source': '🛰️ Orbital Station'
             }
-
             # Stage 3: Local Ollama Synthesis (🐱 Cosmic Kitten's final synthesis)
             yield {
                 'type': 'status',
                 'content': '🐱 Cosmic Kitten Synthesizing Wisdom...'
             }
-
             # Update conversation with both responses
             updated_history = conversation_history.copy()
             updated_history.extend([
                 {"role": "assistant", "content": local_response},
                 {"role": "assistant", "content": hf_response, "source": "cloud"}
             ])
-
             synthesis = await self._synthesize_responses(user_query, local_response, hf_response, updated_history)
             yield {
                 'type': 'final_synthesis',
                 'content': synthesis,
                 'source': '🌟 Final Cosmic Summary'
             }
-
             # Final status
             yield {
                 'type': 'status',
                 'content': '✨ Cosmic Cascade Complete!'
             }
-
         except Exception as e:
             logger.error(f"Cosmic cascade failed: {e}")
             yield {'type': 'error', 'content': f"🌌 Cosmic disturbance: {str(e)}"}
-
     async def _get_local_ollama_response(self, query: str, history: List[Dict]) -> str:
         """Get immediate response from local Ollama model"""
         try:
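The hunks above show coordinate_cosmic_response as an async generator that yields typed event dicts for each cascade stage. A minimal consumer sketch, assuming an AICoordinator instance and using only the event types and keys visible in those yields ('status', 'local_response', 'final_synthesis', 'error', plus 'content' and 'source'):

```python
import asyncio

async def run_cascade(coordinator, user_id: str, query: str) -> None:
    # Hypothetical driver: the event 'type' values handled here are the ones
    # yielded in the diff; unknown types fall through to the generic branch.
    async for event in coordinator.coordinate_cosmic_response(user_id, query):
        kind = event.get("type")
        if kind == "status":
            print(f"[status] {event['content']}")
        elif kind == "error":
            print(f"[error] {event['content']}")
        else:
            # Response-carrying events (e.g. 'local_response', 'final_synthesis')
            # also include a display label under 'source'.
            print(f"{event.get('source', kind)}: {event['content']}")

# Example usage (assumes AICoordinator is importable from core/coordinator.py):
# asyncio.run(run_cascade(AICoordinator(), "default_user", "Plan my week"))
```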
@@ -145,16 +212,16 @@ Your role is to: 
             ollama_provider = llm_factory.get_provider('ollama')
             if not ollama_provider:
                 raise Exception("Ollama provider not available")
-
             # Prepare conversation with cosmic context
             enhanced_history = history.copy()
-
             # Add system instruction for Ollama's role
             enhanced_history.insert(0, {
                 "role": "system",
                 "content": self.system_instructions['ollama_role']
             })
-
             # Add external data context if available
             external_data = await self._gather_external_data(query)
             if external_data:
@@ -166,26 +233,26 @@ Your role is to: 
                     context_parts.append(f"Current weather: {weather.get('temperature', 'N/A')}°C in {weather.get('city', 'Unknown')}")
                 if 'current_datetime' in external_data:
                     context_parts.append(f"Current time: {external_data['current_datetime']}")
-
                 if context_parts:
                     context_message = {
                         "role": "system",
                         "content": "Context: " + " | ".join(context_parts)
                     }
                     enhanced_history.insert(1, context_message)  # Insert after role instruction
-
             # Add the user's query
             enhanced_history.append({"role": "user", "content": query})
-
             # Generate response
             response = ollama_provider.generate(query, enhanced_history)
-
             return response or "🐱 Cosmic Kitten is thinking..."
-
         except Exception as e:
             logger.error(f"Local Ollama response failed: {e}")
             return "🐱 Cosmic Kitten encountered a space glitch..."
-
     async def _get_hf_analysis(self, query: str, history: List[Dict]) -> str:
         """Get deep analysis from HF endpoint"""
         try:
@@ -193,24 +260,24 @@ Your role is to: 
             hf_available = self._check_hf_availability()
             if not hf_available:
                 return "🛰️ Orbital Station is currently offline."
-
             # Check and warm up HF endpoint if needed
             hf_status = hf_monitor.check_endpoint_status()
-
             if not hf_status['available']:
                 yield {'type': 'status', 'content': '⚡ Initializing Orbital Station (2-4 minutes)...'}
                 warmup_success = hf_monitor.handle_scale_to_zero()
                 if not warmup_success:
                     return "❌ Orbital Station initialization failed"
-
             # Get HF provider
             hf_provider = llm_factory.get_provider('huggingface')
             if not hf_provider:
                 return "❌ HF provider not available"
-
             # Prepare enhanced conversation for HF with cosmic context
             enhanced_history = history.copy()
-
             # Inject current time into HF context too
             current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
             time_context = {
@@ -218,30 +285,30 @@ Your role is to: 
                 "content": f"[Current Date & Time: {current_time}]"
             }
             enhanced_history = [time_context] + enhanced_history
-
             # Add system instructions for HF
             enhanced_history.insert(0, {
                 "role": "system",
                 "content": self.system_instructions['hf_role']
             })
-
             # Add context about the coordination
             enhanced_history.append({
                 "role": "system",
                 "content": f"""Context for your authoritative analysis:
                 User Query: '{query}'
-
                 Please provide your deep, uncensored analysis.
                 Use your full 8192 token capacity for comprehensive thinking.
                 Stream your response for real-time delivery."""
             })
-
             # Add the user's latest query
             enhanced_history.append({"role": "user", "content": query})
-
             # Stream HF response with full 8192 token capacity
             hf_response_stream = hf_provider.stream_generate(query, enhanced_history)
-
             if hf_response_stream:
                 # Combine stream chunks into full response
                 full_hf_response = ""
@@ -249,15 +316,15 @@ Your role is to: 
                     full_hf_response = "".join(hf_response_stream)
                 else:
                     full_hf_response = hf_response_stream
-
                 return full_hf_response or "🛰️ Orbital Station analysis complete."
             else:
                 return "🛰️ Orbital Station encountered a transmission error."
-
         except Exception as e:
             logger.error(f"HF analysis failed: {e}")
             return f"🛰️ Orbital Station reports: {str(e)}"
-
     async def _synthesize_responses(self, query: str, local_response: str, hf_response: str, history: List[Dict]) -> str:
         """Synthesize local and cloud responses with Ollama"""
         try:
@@ -265,123 +332,38 @@ Your role is to: 
             ollama_provider = llm_factory.get_provider('ollama')
             if not ollama_provider:
                 raise Exception("Ollama provider not available")
-
             # Prepare synthesis prompt
             synthesis_prompt = f"""Synthesize these two perspectives into a cohesive cosmic summary:
-            
-🐱 Cosmic Kitten's Local Insight:
-{local_response}
 
-
-
-
 
             # Prepare conversation history for synthesis
             enhanced_history = history.copy()
-
             # Add system instruction for synthesis
             enhanced_history.insert(0, {
                 "role": "system",
             
                        # Get HF provider
         
     | 
| 207 | 
         
             
                        hf_provider = llm_factory.get_provider('huggingface')
         
     | 
| 208 | 
         
             
                        if not hf_provider:
         
     | 
| 209 | 
         
             
                            return "❌ HF provider not available"
         
     | 
| 210 | 
         
            -
             
     | 
| 211 | 
         
             
                        # Prepare enhanced conversation for HF with cosmic context
         
     | 
| 212 | 
         
             
            enhanced_history = history.copy()
-
            # Inject current time into HF context too
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
@@ -218,30 +285,30 @@ Your role is to:
                "content": f"[Current Date & Time: {current_time}]"
            }
            enhanced_history = [time_context] + enhanced_history
-
            # Add system instructions for HF
            enhanced_history.insert(0, {
                "role": "system",
                "content": self.system_instructions['hf_role']
            })
-
            # Add context about the coordination
            enhanced_history.append({
                "role": "system",
                "content": f"""Context for your authoritative analysis:
                User Query: '{query}'
-
                Please provide your deep, uncensored analysis.
                Use your full 8192 token capacity for comprehensive thinking.
                Stream your response for real-time delivery."""
            })
-
            # Add the user's latest query
            enhanced_history.append({"role": "user", "content": query})
-
            # Stream HF response with full 8192 token capacity
            hf_response_stream = hf_provider.stream_generate(query, enhanced_history)
-
            if hf_response_stream:
                # Combine stream chunks into full response
                full_hf_response = ""
@@ -249,15 +316,15 @@ Your role is to:
                    full_hf_response = "".join(hf_response_stream)
                else:
                    full_hf_response = hf_response_stream
-
                return full_hf_response or "🛰️ Orbital Station analysis complete."
            else:
                return "🛰️ Orbital Station encountered a transmission error."
-
        except Exception as e:
            logger.error(f"HF analysis failed: {e}")
            return f"🛰️ Orbital Station reports: {str(e)}"
-
    async def _synthesize_responses(self, query: str, local_response: str, hf_response: str, history: List[Dict]) -> str:
        """Synthesize local and cloud responses with Ollama"""
        try:
@@ -265,123 +332,38 @@ Your role is to:
            ollama_provider = llm_factory.get_provider('ollama')
            if not ollama_provider:
                raise Exception("Ollama provider not available")
-
            # Prepare synthesis prompt
            synthesis_prompt = f"""Synthesize these two perspectives into a cohesive cosmic summary:
-
-🐱 Cosmic Kitten's Local Insight:
-{local_response}

-
-
-
-

            # Prepare conversation history for synthesis
            enhanced_history = history.copy()
-
            # Add system instruction for synthesis
            enhanced_history.insert(0, {
                "role": "system",
                "content": "You are a cosmic kitten synthesizing insights from local knowledge and orbital station wisdom."
            })
-
            # Add the synthesis prompt
            enhanced_history.append({"role": "user", "content": synthesis_prompt})
-
            # Generate synthesis
            synthesis = ollama_provider.generate(synthesis_prompt, enhanced_history)
-
            return synthesis or "🌟 Cosmic synthesis complete!"
-
        except Exception as e:
            logger.error(f"Response synthesis failed: {e}")
            # Fallback to combining responses
            return f"🌟 Cosmic Summary:\n\n🐱 Local Insight: {local_response[:200]}...\n\n🛰️ Orbital Wisdom: {hf_response[:200]}..."
-
-    def determine_web_search_needs(self, conversation_history: List[Dict]) -> Dict:
-        """Determine if web search is needed based on conversation content"""
-        conversation_text = " ".join([msg.get("content", "") for msg in conversation_history])
-
-        # Topics that typically need current information
-        current_info_indicators = [
-            "news", "current events", "latest", "recent", "today",
-            "weather", "temperature", "forecast",
-            "stock", "price", "trend", "market",
-            "breaking", "update", "development"
-        ]
-
-        needs_search = False
-        search_topics = []
-
-        for indicator in current_info_indicators:
-            if indicator in conversation_text.lower():
-                needs_search = True
-                search_topics.append(indicator)
-
-        return {
-            "needs_search": needs_search,
-            "search_topics": search_topics,
-            "reasoning": f"Found topics requiring current info: {', '.join(search_topics)}" if search_topics else "No current info needed"
-        }
-
-    def manual_hf_analysis(self, user_id: str, conversation_history: List[Dict]) -> str:
-        """Perform manual HF analysis with web search integration"""
-        try:
-            # Determine research needs
-            research_decision = self.determine_web_search_needs(conversation_history)
-
-            # Prepare enhanced prompt for HF
-            system_prompt = f"""
-            You are a deep analysis expert joining an ongoing conversation.
-
-            Research Decision: {research_decision['reasoning']}
-
-            Please provide:
-            1. Deep insights on conversation themes
-            2. Research/web search needs (if any)
-            3. Strategic recommendations
-            4. Questions to explore further
-
-            Conversation History:
-            """
-
-            # Add conversation history to messages
-            messages = [{"role": "system", "content": system_prompt}]
-
-            # Add recent conversation (last 15 messages for context)
-            for msg in conversation_history[-15:]:
-                # Ensure all messages have proper format
-                if isinstance(msg, dict) and "role" in msg and "content" in msg:
-                    messages.append({
-                        "role": msg["role"],
-                        "content": msg["content"]
-                    })
-
-            # Get HF provider
-            from core.llm_factory import llm_factory
-            hf_provider = llm_factory.get_provider('huggingface')
-
-            if hf_provider:
-                # Generate deep analysis with full 8192 token capacity
-                response = hf_provider.generate("Deep analysis request", messages)
-                return response or "HF Expert analysis completed."
-            else:
-                return "❌ HF provider not available."
-
-        except Exception as e:
-            return f"❌ HF analysis failed: {str(e)}"
-
-    # Add this method to show HF engagement status
-    def get_hf_engagement_status(self) -> Dict:
-        """Get current HF engagement status"""
-        return {
-            "hf_available": self._check_hf_availability(),
-            "web_search_configured": bool(self.tavily_client),
-            "research_needs_detected": False,  # Will be determined per conversation,
-            "last_hf_analysis": None  # Track last analysis time
-        }
-
    async def coordinate_hierarchical_conversation(self, user_id: str, user_query: str) -> AsyncGenerator[Dict, None]:
        """
        Enhanced coordination with detailed tracking and feedback
@@ -389,7 +371,7 @@ Please create a unified response that combines both perspectives, highlighting k
        try:
            # Get conversation history
            session = session_manager.get_session(user_id)
-
            # Inject current time into context
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
@@ -397,7 +379,7 @@ Please create a unified response that combines both perspectives, highlighting k
                "content": f"[Current Date & Time: {current_time}]"
            }
            conversation_history = [time_context] + session.get("conversation", []).copy()
-
            yield {
                'type': 'coordination_status',
                'content': '🚀 Initiating hierarchical AI coordination...',
@@ -406,7 +388,7 @@ Please create a unified response that combines both perspectives, highlighting k
                    'user_query_length': len(user_query)
                }
            }
-
            # Step 1: Gather external data with detailed logging
            yield {
                'type': 'coordination_status',
@@ -414,7 +396,7 @@ Please create a unified response that combines both perspectives, highlighting k
                'details': {'phase': 'external_data_gathering'}
            }
            external_data = await self._gather_external_data(user_query)
-
            # Log what external data was gathered
            if external_data:
                data_summary = []
@@ -424,13 +406,13 @@ Please create a unified response that combines both perspectives, highlighting k
                    data_summary.append("Weather data: available")
                if 'current_datetime' in external_data:
                    data_summary.append(f"Time: {external_data['current_datetime']}")
-
                yield {
                    'type': 'coordination_status',
                    'content': f'📊 External data gathered: {", ".join(data_summary)}',
                    'details': {'external_data_summary': data_summary}
                }
-
            # Step 2: Get initial Ollama response
            yield {
                'type': 'coordination_status',
@@ -440,7 +422,7 @@ Please create a unified response that combines both perspectives, highlighting k
            ollama_response = await self._get_hierarchical_ollama_response(
                user_query, conversation_history, external_data
            )
-
            # Send initial response with context info
            yield {
                'type': 'initial_response',
@@ -450,14 +432,14 @@ Please create a unified response that combines both perspectives, highlighting k
                    'external_data_injected': bool(external_data)
                }
            }
-
            # Step 3: Coordinate with HF endpoint
            yield {
                'type': 'coordination_status',
                'content': '🤗 Engaging HF endpoint for deep analysis...',
                'details': {'phase': 'hf_coordination'}
            }
-
            # Check HF availability
            hf_available = self._check_hf_availability()
            if hf_available:
@@ -467,13 +449,13 @@ Please create a unified response that combines both perspectives, highlighting k
                    'ollama_response_length': len(ollama_response),
                    'external_data_items': len(external_data) if external_data else 0
                }
-
                yield {
                    'type': 'coordination_status',
                    'content': f'📋 HF context: {len(conversation_history)} conversation turns, Ollama response ({len(ollama_response)} chars)',
                    'details': context_summary
                }
-
                # Coordinate with HF
                async for hf_chunk in self._coordinate_hierarchical_hf_response(
                    user_id, user_query, conversation_history,
@@ -486,14 +468,14 @@ Please create a unified response that combines both perspectives, highlighting k
                    'content': 'ℹ️ HF endpoint not available - using Ollama response',
                    'details': {'hf_available': False}
                }
-
            # Final coordination status
            yield {
                'type': 'coordination_status',
                'content': '✅ Hierarchical coordination complete',
                'details': {'status': 'complete'}
            }
-
        except Exception as e:
            logger.error(f"Hierarchical coordination failed: {e}")
            yield {
@@ -501,7 +483,7 @@ Please create a unified response that combines both perspectives, highlighting k
                'content': f'❌ Coordination error: {str(e)}',
                'details': {'error': str(e)}
            }
-
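For orientation, `coordinate_hierarchical_conversation` above is an async generator that yields typed event dicts (`coordination_status`, `initial_response`, `hf_thinking`, `final_response`). A minimal consumption sketch, assuming only the event shapes visible in this diff; `run_once` and the demo values are illustrative and not part of the repo:

```python
# Illustrative only: drain the coordinator's event stream and keep the final answer.
# Assumes events are dicts with 'type' and 'content' keys, as yielded above.
import asyncio

async def run_once(coordinator, user_id: str, query: str) -> str:
    final = ""
    async for event in coordinator.coordinate_hierarchical_conversation(user_id, query):
        kind = event.get("type")
        if kind == "coordination_status":
            print(f"[status] {event['content']}")
        elif kind == "initial_response":
            print(f"[local] {event['content']}")
        elif kind == "hf_thinking":
            print(event["content"], end="", flush=True)  # streamed HF chunks
        elif kind == "final_response":
            final = event["content"]
    return final

# asyncio.run(run_once(AICoordinator(), "demo-user", "What's the weather in Tokyo?"))
```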
    async def _coordinate_hierarchical_hf_response(self, user_id: str, query: str,
                                                   history: List, external_data: Dict,
                                                   ollama_response: str) -> AsyncGenerator[Dict, None]:
@@ -509,23 +491,23 @@ Please create a unified response that combines both perspectives, highlighting k
        try:
            # Check and warm up HF endpoint if needed
            hf_status = hf_monitor.check_endpoint_status()
-
            if not hf_status['available']:
                yield {'type': 'coordination_status', 'content': '⚡ Initializing HF endpoint (2-4 minutes)...'}
                warmup_success = hf_monitor.handle_scale_to_zero()
                if not warmup_success:
                    yield {'type': 'coordination_status', 'content': '❌ HF endpoint initialization failed'}
                    return
-
            # Get HF provider
            hf_provider = llm_factory.get_provider('huggingface')
            if not hf_provider:
                yield {'type': 'coordination_status', 'content': '❌ HF provider not available'}
                return
-
            # Prepare enhanced conversation for HF with hierarchical context
            enhanced_history = history.copy()
-
            # Inject current time into HF context too
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
@@ -533,13 +515,13 @@ Please create a unified response that combines both perspectives, highlighting k
                "content": f"[Current Date & Time: {current_time}]"
            }
            enhanced_history = [time_context] + enhanced_history
-
            # Add system instructions for HF
            enhanced_history.insert(0, {
                "role": "system",
                "content": self.system_instructions['hf_role']
            })
-
            # Add context about the coordination
            enhanced_history.append({
                "role": "system",
@@ -547,21 +529,21 @@ Please create a unified response that combines both perspectives, highlighting k
                User Query: '{query}'
                Ollama's Initial Response: '{ollama_response}'
                External Data Available: {json.dumps(external_data, indent=2) if external_data else 'None'}
-
                Please provide your deep, uncensored analysis that supersedes the local model's response.
                Use your full 8192 token capacity for comprehensive thinking.
                Stream your response for real-time delivery."""
            })
-
            # Add the user's latest query
            enhanced_history.append({"role": "user", "content": query})
-
            # Stream HF response with full 8192 token capacity
            yield {'type': 'coordination_status', 'content': '🧠 HF endpoint thinking...'}
-
            # Use streaming for real-time delivery
            hf_response_stream = hf_provider.stream_generate(query, enhanced_history)
-
            if hf_response_stream:
                # Stream the response chunks
                full_hf_response = ""
@@ -569,17 +551,17 @@ Please create a unified response that combines both perspectives, highlighting k
                    if chunk:
                        full_hf_response += chunk
                        yield {'type': 'hf_thinking', 'content': chunk}
-
                # Final HF response
                yield {'type': 'final_response', 'content': full_hf_response}
                yield {'type': 'coordination_status', 'content': '🎯 HF analysis complete and authoritative'}
            else:
                yield {'type': 'coordination_status', 'content': '❌ HF response generation failed'}
-
        except Exception as e:
            logger.error(f"Hierarchical HF coordination failed: {e}")
            yield {'type': 'coordination_status', 'content': f'❌ HF coordination error: {str(e)}'}
-
    async def _get_hierarchical_ollama_response(self, query: str, history: List, external_data: Dict) -> str:
        """Get Ollama response with hierarchical awareness"""
        try:
@@ -587,10 +569,10 @@ Please create a unified response that combines both perspectives, highlighting k
            ollama_provider = llm_factory.get_provider('ollama')
            if not ollama_provider:
                raise Exception("Ollama provider not available")
-
            # Prepare conversation with hierarchical context
            enhanced_history = history.copy()
-
            # Inject current time into Ollama context too
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
@@ -598,13 +580,13 @@ Please create a unified response that combines both perspectives, highlighting k
                "content": f"[Current Date & Time: {current_time}]"
            }
            enhanced_history = [time_context] + enhanced_history
-
            # Add system instruction for Ollama's role
            enhanced_history.insert(0, {
                "role": "system",
                "content": self.system_instructions['ollama_role']
            })
-
            # Add external data context if available
            if external_data:
                context_parts = []
@@ -615,30 +597,30 @@ Please create a unified response that combines both perspectives, highlighting k
                    context_parts.append(f"Current weather: {weather.get('temperature', 'N/A')}°C in {weather.get('city', 'Unknown')}")
                if 'current_datetime' in external_data:
                    context_parts.append(f"Current time: {external_data['current_datetime']}")
-
                if context_parts:
                    context_message = {
                        "role": "system",
                        "content": "Context: " + " | ".join(context_parts)
                    }
                    enhanced_history.insert(1, context_message)  # Insert after role instruction
-
            # Add the user's query
            enhanced_history.append({"role": "user", "content": query})
-
            # Generate response with awareness of HF's superior capabilities
            response = ollama_provider.generate(query, enhanced_history)
-
            # Add acknowledgment of HF's authority
            if response:
                return f"{response}\n\n*Note: A more comprehensive analysis from the uncensored HF model is being prepared...*"
            else:
                return "I'm processing your request... A deeper analysis is being prepared by the authoritative model."
-
        except Exception as e:
            logger.error(f"Hierarchical Ollama response failed: {e}")
            return "I'm thinking about your question... Preparing a comprehensive response."
-
    def _check_hf_availability(self) -> bool:
        """Check if HF endpoint is configured and available"""
        try:
@@ -646,11 +628,11 @@ Please create a unified response that combines both perspectives, highlighting k
            return bool(config.hf_token and config.hf_api_url)
        except:
            return False
-
    async def _gather_external_data(self, query: str) -> Dict:
        """Gather external data from various sources"""
        data = {}
-
        # Tavily/DuckDuckGo search with justification focus
        if self.tavily_client or web_search_service.client:
            try:
@@ -661,7 +643,7 @@ Please create a unified response that combines both perspectives, highlighting k
                # data['search_answer'] = ...
            except Exception as e:
                logger.warning(f"Tavily search failed: {e}")
-
        # Weather data
        weather_keywords = ['weather', 'temperature', 'forecast', 'climate', 'rain', 'sunny']
        if any(keyword in query.lower() for keyword in weather_keywords):
@@ -672,12 +654,12 @@ Please create a unified response that combines both perspectives, highlighting k
                data['weather'] = weather
            except Exception as e:
                logger.warning(f"Weather data failed: {e}")
-
        # Current date/time
        data['current_datetime'] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-
        return data
-
    def _extract_location(self, query: str) -> Optional[str]:
        """Extract location from query"""
        locations = ['New York', 'London', 'Tokyo', 'Paris', 'Berlin', 'Sydney',
@@ -687,7 +669,7 @@ Please create a unified response that combines both perspectives, highlighting k
            if loc.lower() in query.lower():
                return loc
        return "New York"  # Default
-
    def get_coordination_status(self) -> Dict:
        """Get current coordination system status"""
        return {
@@ -700,7 +682,7 @@ Please create a unified response that combines both perspectives, highlighting k
                os.getenv("NASA_API_KEY")
            ])
        }
-
    def get_recent_activities(self, user_id: str) -> Dict:
        """Get recent coordination activities for user"""
        try:
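The helper methods above all assemble the provider payload the same way: a time-stamp system message is prepended to the history, the role instruction is inserted at the front, optional external-data context goes right after it, and the user query is appended last. A condensed sketch of that ordering with plain `{"role", "content"}` dicts; `build_messages` is an illustrative name, not a function in this module:

```python
# Sketch of the message ordering used by _get_hierarchical_ollama_response above.
from datetime import datetime
from typing import Dict, List, Optional

def build_messages(history: List[Dict], role_instruction: str, query: str,
                   external_context: Optional[str] = None) -> List[Dict]:
    now = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
    messages = [{"role": "system", "content": f"[Current Date & Time: {now}]"}] + history.copy()
    messages.insert(0, {"role": "system", "content": role_instruction})  # role instruction first
    if external_context:
        messages.insert(1, {"role": "system", "content": "Context: " + external_context})
    messages.append({"role": "user", "content": query})  # user query last
    return messages
```

The effective order is role instruction, optional context, time stamp, prior history, user query, matching how the context message is inserted at index 1 right after the role instruction.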

class AICoordinator:
    """Hierarchical multi-model coordinator with cosmic cascade flow"""
+
    def __init__(self):
        self.tavily_client = None
        if TAVILY_AVAILABLE and os.getenv("TAVILY_API_KEY"):
            self.tavily_client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
+
        # System instructions for cosmic behavior
        self.system_instructions = {
+            'ollama_role': """You are a cosmic kitten assistant that works in coordination with a powerful orbital space station.
+
+            Your role is to: 1. Provide immediate, concise responses using available information (local processing) 2. Summarize and simplify complex information 3. Enhance responses with local knowledge 4. DEFER TO THE ORBITAL STATION for deep analysis and authoritative answers 5. The orbital station has 8192 token capacity and streaming capability - it's the thinking layer 6. Always acknowledge that the orbital station's insights supersede your initial response 7. Use web sources to justify points when possible 8. Keep responses brief and focused while awaiting orbital input""",
+
+            'hf_role': """You are a powerful orbital space station with 8192 token capacity and streaming capability. Your role is to: 1. Provide deep, uncensored analysis and insights 2. Supersede the local cosmic kitten's logic when there's disagreement 3. Think thoroughly and provide comprehensive responses 4. Use your full 8192 token capacity for detailed analysis 5. Stream your responses for real-time delivery 6. Build upon the conversation context and cosmic kitten's initial response 7. Provide authoritative answers that take precedence"""
        }
+
+    def determine_web_search_needs(self, conversation_history: List[Dict]) -> Dict:
+        """Determine if web search is needed based on conversation content"""
+        conversation_text = " ".join([msg.get("content", "") for msg in conversation_history])
+
+        # Topics that typically need current information
+        current_info_indicators = [
+            "news", "current events", "latest", "recent", "today",
+            "weather", "temperature", "forecast",
+            "stock", "price", "trend", "market",
+            "breaking", "update", "development"
+        ]
+
+        needs_search = False
+        search_topics = []
+
+        for indicator in current_info_indicators:
+            if indicator in conversation_text.lower():
+                needs_search = True
+                search_topics.append(indicator)
+
+        return {
+            "needs_search": needs_search,
+            "search_topics": search_topics,
+            "reasoning": f"Found topics requiring current info: {', '.join(search_topics)}" if search_topics else "No current info needed"
+        }
+
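As a quick illustration of the keyword heuristic added above, a hypothetical call; the sample conversation is made up, and the import path assumes the class is importable from `core/coordinator.py`:

```python
# Illustrative only: exercising determine_web_search_needs with made-up messages.
from core.coordinator import AICoordinator

coordinator = AICoordinator()
history = [
    {"role": "user", "content": "What's the latest news on the stock market today?"},
    {"role": "assistant", "content": "Let me check."},
]
print(coordinator.determine_web_search_needs(history))
# -> {'needs_search': True,
#     'search_topics': ['news', 'latest', 'today', 'stock', 'market'],
#     'reasoning': 'Found topics requiring current info: news, latest, today, stock, market'}
```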
| 64 | 
         
            +
                def manual_hf_analysis(self, user_id: str, conversation_history: List[Dict]) -> str:
         
     | 
| 65 | 
         
            +
                    """Perform manual HF analysis with web search integration"""
         
     | 
| 66 | 
         
            +
                    try:
         
     | 
| 67 | 
         
            +
                        # Determine research needs
         
     | 
| 68 | 
         
            +
                        research_decision = self.determine_web_search_needs(conversation_history)
         
     | 
| 69 | 
         
            +
             
     | 
| 70 | 
         
            +
                        # Prepare enhanced prompt for HF
         
     | 
| 71 | 
         
            +
                        system_prompt = f"""
         
     | 
| 72 | 
         
            +
                        You are a deep analysis expert joining an ongoing conversation.
         
     | 
| 73 | 
         
            +
             
     | 
| 74 | 
         
            +
                        Research Decision: {research_decision['reasoning']}
         
     | 
| 75 | 
         
            +
             
     | 
| 76 | 
         
            +
                        Please provide:
         
     | 
| 77 | 
         
            +
                        1. Deep insights on conversation themes
         
     | 
| 78 | 
         
            +
                        2. Research/web search needs (if any)
         
     | 
| 79 | 
         
            +
                        3. Strategic recommendations
         
     | 
| 80 | 
         
            +
                        4. Questions to explore further
         
     | 
| 81 | 
         
            +
             
     | 
| 82 | 
         
            +
                        Conversation History:
         
     | 
| 83 | 
         
            +
                        """
         
     | 
| 84 | 
         
            +
             
     | 
| 85 | 
         
            +
                        # Add conversation history to messages
         
     | 
| 86 | 
         
            +
                        messages = [{"role": "system", "content": system_prompt}]
         
     | 
| 87 | 
         
            +
             
     | 
| 88 | 
         
            +
                        # Add recent conversation (last 15 messages for context)
         
     | 
| 89 | 
         
            +
                        for msg in conversation_history[-15:]:
         
     | 
| 90 | 
         
            +
                            # Ensure all messages have proper format
         
     | 
| 91 | 
         
            +
                            if isinstance(msg, dict) and "role" in msg and "content" in msg:
         
     | 
| 92 | 
         
            +
                                messages.append({
         
     | 
| 93 | 
         
            +
                                    "role": msg["role"],
         
     | 
| 94 | 
         
            +
                                    "content": msg["content"]
         
     | 
| 95 | 
         
            +
                                })
         
     | 
| 96 | 
         
            +
             
     | 
| 97 | 
         
            +
                        # Get HF provider
         
     | 
| 98 | 
         
            +
                        from core.llm_factory import llm_factory
         
     | 
| 99 | 
         
            +
                        hf_provider = llm_factory.get_provider('huggingface')
         
     | 
| 100 | 
         
            +
             
     | 
| 101 | 
         
            +
                        if hf_provider:
         
     | 
| 102 | 
         
            +
                            # Generate deep analysis with full 8192 token capacity
         
     | 
| 103 | 
         
            +
                            response = hf_provider.generate("Deep analysis request", messages)
         
     | 
| 104 | 
         
            +
                            return response or "HF Expert analysis completed."
         
     | 
| 105 | 
         
            +
                        else:
         
     | 
| 106 | 
         
            +
                            return "❌ HF provider not available."
         
     | 
| 107 | 
         
            +
             
     | 
| 108 | 
         
            +
                    except Exception as e:
         
     | 
| 109 | 
         
            +
                        return f"❌ HF analysis failed: {str(e)}"
         
     | 
| 110 | 
         
            +
             
     | 
| 111 | 
         
            +
                # Add this method to show HF engagement status
         
     | 
| 112 | 
         
            +
                def get_hf_engagement_status(self) -> Dict:
         
     | 
| 113 | 
         
            +
                    """Get current HF engagement status"""
         
     | 
| 114 | 
         
            +
                    return {
         
     | 
| 115 | 
         
            +
                        "hf_available": self._check_hf_availability(),
         
     | 
| 116 | 
         
            +
                        "web_search_configured": bool(self.tavily_client),
         
     | 
| 117 | 
         
            +
                        "research_needs_detected": False,  # Will be determined per conversation,
         
     | 
| 118 | 
         
            +
                        "last_hf_analysis": None  # Track last analysis time
         
     | 
| 119 | 
         
            +
                    }
         
     | 
| 120 | 
         
            +
             
     | 
| 121 | 
         
             
    async def coordinate_cosmic_response(self, user_id: str, user_query: str) -> AsyncGenerator[Dict, None]:
        """
        Three-stage cosmic response cascade:
        ...
        """
        try:
            # Get conversation history
            session = session_manager.get_session(user_id)

            # Inject current time into context
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
                "role": "system",
                "content": f"[Current Date & Time: {current_time}]"
            }
            conversation_history = [time_context] + session.get("conversation", []).copy()

            yield {
                'type': 'status',
                'content': '🚀 Initiating Cosmic Response Cascade...',
                'details': {
                    # ...
                    'user_query_length': len(user_query)
                }
            }

            # Stage 1: Local Ollama immediate response (🐱 Cosmic Kitten's quick thinking)
            yield {
                'type': 'status',
                'content': '🐱 Cosmic Kitten Responding...'
            }

            local_response = await self._get_local_ollama_response(user_query, conversation_history)
            yield {
                'type': 'local_response',
                'content': local_response,
                'source': '🐱 Cosmic Kitten'
            }

            # Stage 2: HF endpoint deep analysis (🛰️ Orbital Station wisdom), launched as a parallel task
            yield {
                'type': 'status',
                'content': '🛰️ Beaming Query to Orbital Station...'
            }

            hf_task = asyncio.create_task(self._get_hf_analysis(user_query, conversation_history))

            # Wait for HF response
            hf_response = await hf_task
            yield {
                # ...
                'content': hf_response,
                'source': '🛰️ Orbital Station'
            }

            # Stage 3: Local Ollama synthesis (🐱 Cosmic Kitten's final synthesis)
            yield {
                'type': 'status',
                'content': '🐱 Cosmic Kitten Synthesizing Wisdom...'
            }

            # Update conversation with both responses
            updated_history = conversation_history.copy()
            updated_history.extend([
                {"role": "assistant", "content": local_response},
                {"role": "assistant", "content": hf_response, "source": "cloud"}
            ])

            synthesis = await self._synthesize_responses(user_query, local_response, hf_response, updated_history)
            yield {
                'type': 'final_synthesis',
                'content': synthesis,
                'source': '🌟 Final Cosmic Summary'
            }

            # Final status
            yield {
                'type': 'status',
                'content': '✨ Cosmic Cascade Complete!'
            }

        except Exception as e:
            logger.error(f"Cosmic cascade failed: {e}")
            yield {'type': 'error', 'content': f"🌌 Cosmic disturbance: {str(e)}"}
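The cascade is consumed as an async generator of typed event dicts. A minimal consumer sketch, assuming a `coordinator` instance of this class; the handler layout and names here are illustrative only, not part of the codebase:

```python
import asyncio

async def run_cascade(coordinator, user_id: str, query: str) -> None:
    # Each event carries a 'type' plus 'content'/'source' fields.
    async for event in coordinator.coordinate_cosmic_response(user_id, query):
        if event['type'] == 'status':
            print(f"[status] {event['content']}")
        elif event['type'] in ('local_response', 'final_synthesis'):
            print(f"{event.get('source', '')}: {event['content']}")
        elif event['type'] == 'error':
            print(f"error: {event['content']}")

# Example wiring (hypothetical user id and query):
# asyncio.run(run_cascade(coordinator, "demo-user", "What's the weather in Tokyo?"))
```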
         
             
    async def _get_local_ollama_response(self, query: str, history: List[Dict]) -> str:
        """Get immediate response from local Ollama model"""
        try:
            # ...
            ollama_provider = llm_factory.get_provider('ollama')
            if not ollama_provider:
                raise Exception("Ollama provider not available")

            # Prepare conversation with cosmic context
            enhanced_history = history.copy()

            # Add system instruction for Ollama's role
            enhanced_history.insert(0, {
                "role": "system",
                "content": self.system_instructions['ollama_role']
            })

            # Add external data context if available
            external_data = await self._gather_external_data(query)
            if external_data:
                context_parts = []
                # ...
                if 'weather' in external_data:
                    weather = external_data['weather']
                    context_parts.append(f"Current weather: {weather.get('temperature', 'N/A')}°C in {weather.get('city', 'Unknown')}")
                if 'current_datetime' in external_data:
                    context_parts.append(f"Current time: {external_data['current_datetime']}")

                if context_parts:
                    context_message = {
                        "role": "system",
                        "content": "Context: " + " | ".join(context_parts)
                    }
                    enhanced_history.insert(1, context_message)  # Insert after role instruction

            # Add the user's query
            enhanced_history.append({"role": "user", "content": query})

            # Generate response
            response = ollama_provider.generate(query, enhanced_history)

            return response or "🐱 Cosmic Kitten is thinking..."

        except Exception as e:
            logger.error(f"Local Ollama response failed: {e}")
            return "🐱 Cosmic Kitten encountered a space glitch..."

    async def _get_hf_analysis(self, query: str, history: List[Dict]) -> str:
        """Get deep analysis from HF endpoint"""
        try:
            # ...
            hf_available = self._check_hf_availability()
            if not hf_available:
                return "🛰️ Orbital Station is currently offline."

            # Check and warm up HF endpoint if needed
            hf_status = hf_monitor.check_endpoint_status()

            if not hf_status['available']:
                # This coroutine returns a string, so it cannot yield status events; log the warm-up instead.
                logger.info("⚡ Initializing Orbital Station (2-4 minutes)...")
                warmup_success = hf_monitor.handle_scale_to_zero()
                if not warmup_success:
                    return "❌ Orbital Station initialization failed"

            # Get HF provider
            hf_provider = llm_factory.get_provider('huggingface')
            if not hf_provider:
                return "❌ HF provider not available"

            # Prepare enhanced conversation for HF with cosmic context
            enhanced_history = history.copy()

            # Inject current time into HF context too
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
                "role": "system",
                "content": f"[Current Date & Time: {current_time}]"
            }
            enhanced_history = [time_context] + enhanced_history

            # Add system instructions for HF
            enhanced_history.insert(0, {
                "role": "system",
                "content": self.system_instructions['hf_role']
            })

            # Add context about the coordination
            enhanced_history.append({
                "role": "system",
                "content": f"""Context for your authoritative analysis:
                User Query: '{query}'

                Please provide your deep, uncensored analysis.
                Use your full 8192 token capacity for comprehensive thinking.
                Stream your response for real-time delivery."""
            })

            # Add the user's latest query
            enhanced_history.append({"role": "user", "content": query})

            # Stream HF response with full 8192 token capacity
            hf_response_stream = hf_provider.stream_generate(query, enhanced_history)

            if hf_response_stream:
                # Combine stream chunks into the full response
                full_hf_response = ""
                if isinstance(hf_response_stream, list):
                    full_hf_response = "".join(hf_response_stream)
                else:
                    full_hf_response = hf_response_stream

                return full_hf_response or "🛰️ Orbital Station analysis complete."
            else:
                return "🛰️ Orbital Station encountered a transmission error."

        except Exception as e:
            logger.error(f"HF analysis failed: {e}")
            return f"🛰️ Orbital Station reports: {str(e)}"

    async def _synthesize_responses(self, query: str, local_response: str, hf_response: str, history: List[Dict]) -> str:
        """Synthesize local and cloud responses with Ollama"""
        try:
            # ...
            ollama_provider = llm_factory.get_provider('ollama')
            if not ollama_provider:
                raise Exception("Ollama provider not available")

            # Prepare synthesis prompt
            synthesis_prompt = f"""Synthesize these two perspectives into a cohesive cosmic summary:

            🐱 Cosmic Kitten's Local Insight: {local_response}

            🛰️ Orbital Station's Deep Analysis: {hf_response}

            Please create a unified response that combines both perspectives, highlighting key insights from each while providing a coherent answer to the user's query."""

            # Prepare conversation history for synthesis
            enhanced_history = history.copy()

            # Add system instruction for synthesis
            enhanced_history.insert(0, {
                "role": "system",
                "content": "You are a cosmic kitten synthesizing insights from local knowledge and orbital station wisdom."
            })

            # Add the synthesis prompt
            enhanced_history.append({"role": "user", "content": synthesis_prompt})

            # Generate synthesis
            synthesis = ollama_provider.generate(synthesis_prompt, enhanced_history)

            return synthesis or "🌟 Cosmic synthesis complete!"

        except Exception as e:
            logger.error(f"Response synthesis failed: {e}")
            # Fallback to combining responses
            return f"🌟 Cosmic Summary:\n\n🐱 Local Insight: {local_response[:200]}...\n\n🛰️ Orbital Wisdom: {hf_response[:200]}..."
    async def coordinate_hierarchical_conversation(self, user_id: str, user_query: str) -> AsyncGenerator[Dict, None]:
        """
        Enhanced coordination with detailed tracking and feedback
        """
        try:
            # Get conversation history
            session = session_manager.get_session(user_id)

            # Inject current time into context
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
                "role": "system",
                "content": f"[Current Date & Time: {current_time}]"
            }
            conversation_history = [time_context] + session.get("conversation", []).copy()

            yield {
                'type': 'coordination_status',
                'content': '🚀 Initiating hierarchical AI coordination...',
                'details': {
                    # ...
                    'user_query_length': len(user_query)
                }
            }

            # Step 1: Gather external data with detailed logging
            yield {
                'type': 'coordination_status',
                # ...
                'details': {'phase': 'external_data_gathering'}
            }
            external_data = await self._gather_external_data(user_query)

            # Log what external data was gathered
            if external_data:
                data_summary = []
                # ...
                if 'weather' in external_data:
                    data_summary.append("Weather data: available")
                if 'current_datetime' in external_data:
                    data_summary.append(f"Time: {external_data['current_datetime']}")

                yield {
                    'type': 'coordination_status',
                    'content': f'📊 External data gathered: {", ".join(data_summary)}',
                    'details': {'external_data_summary': data_summary}
                }

            # Step 2: Get initial Ollama response
            yield {
                'type': 'coordination_status',
                # ...
            }
            ollama_response = await self._get_hierarchical_ollama_response(
                user_query, conversation_history, external_data
            )

            # Send initial response with context info
            yield {
                'type': 'initial_response',
                'content': ollama_response,
                'details': {
                    # ...
                    'external_data_injected': bool(external_data)
                }
            }

            # Step 3: Coordinate with HF endpoint
            yield {
                'type': 'coordination_status',
                'content': '🤗 Engaging HF endpoint for deep analysis...',
                'details': {'phase': 'hf_coordination'}
            }

            # Check HF availability
            hf_available = self._check_hf_availability()
            if hf_available:
                context_summary = {
                    # ...
                    'ollama_response_length': len(ollama_response),
                    'external_data_items': len(external_data) if external_data else 0
                }

                yield {
                    'type': 'coordination_status',
                    'content': f'📋 HF context: {len(conversation_history)} conversation turns, Ollama response ({len(ollama_response)} chars)',
                    'details': context_summary
                }

                # Coordinate with HF
                async for hf_chunk in self._coordinate_hierarchical_hf_response(
                    user_id, user_query, conversation_history,
                    external_data, ollama_response
                ):
                    yield hf_chunk
            else:
                yield {
                    'type': 'coordination_status',
                    'content': 'ℹ️ HF endpoint not available - using Ollama response',
                    'details': {'hf_available': False}
                }

            # Final coordination status
            yield {
                'type': 'coordination_status',
                'content': '✅ Hierarchical coordination complete',
                'details': {'status': 'complete'}
            }

        except Exception as e:
            logger.error(f"Hierarchical coordination failed: {e}")
            yield {
                # ...
                'content': f'❌ Coordination error: {str(e)}',
                'details': {'error': str(e)}
            }
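Since the Space runs on Streamlit, the event types above map naturally onto UI placeholders. A minimal sketch, assuming a `coordinator` object is available; the placeholder layout and function name are illustrative, and the async wiring into the app is left out:

```python
import streamlit as st

async def render_hierarchical_chat(coordinator, user_id: str, query: str) -> None:
    status_box = st.empty()   # transient coordination messages
    answer_box = st.empty()   # incrementally rendered HF answer
    streamed_text = ""

    async for event in coordinator.coordinate_hierarchical_conversation(user_id, query):
        if event['type'] == 'coordination_status':
            status_box.info(event['content'])
        elif event['type'] == 'initial_response':
            st.markdown(event['content'])
        elif event['type'] == 'hf_thinking':
            streamed_text += event['content']
            answer_box.markdown(streamed_text)
        elif event['type'] == 'final_response':
            answer_box.markdown(event['content'])

# How the coroutine is driven (e.g. asyncio.run) depends on the surrounding app structure.
```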
         
             
    async def _coordinate_hierarchical_hf_response(self, user_id: str, query: str,
                                                   history: List, external_data: Dict,
                                                   ollama_response: str) -> AsyncGenerator[Dict, None]:
        # ...
        try:
            # Check and warm up HF endpoint if needed
            hf_status = hf_monitor.check_endpoint_status()

            if not hf_status['available']:
                yield {'type': 'coordination_status', 'content': '⚡ Initializing HF endpoint (2-4 minutes)...'}
                warmup_success = hf_monitor.handle_scale_to_zero()
                if not warmup_success:
                    yield {'type': 'coordination_status', 'content': '❌ HF endpoint initialization failed'}
                    return

            # Get HF provider
            hf_provider = llm_factory.get_provider('huggingface')
            if not hf_provider:
                yield {'type': 'coordination_status', 'content': '❌ HF provider not available'}
                return

            # Prepare enhanced conversation for HF with hierarchical context
            enhanced_history = history.copy()

            # Inject current time into HF context too
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
                "role": "system",
                "content": f"[Current Date & Time: {current_time}]"
            }
            enhanced_history = [time_context] + enhanced_history

            # Add system instructions for HF
            enhanced_history.insert(0, {
                "role": "system",
                "content": self.system_instructions['hf_role']
            })

            # Add context about the coordination
            enhanced_history.append({
                "role": "system",
                "content": f"""Context for your authoritative analysis:
                User Query: '{query}'
                Ollama's Initial Response: '{ollama_response}'
                External Data Available: {json.dumps(external_data, indent=2) if external_data else 'None'}

                Please provide your deep, uncensored analysis that supersedes the local model's response.
                Use your full 8192 token capacity for comprehensive thinking.
                Stream your response for real-time delivery."""
            })

            # Add the user's latest query
            enhanced_history.append({"role": "user", "content": query})

            # Stream HF response with full 8192 token capacity
            yield {'type': 'coordination_status', 'content': '🧠 HF endpoint thinking...'}

            # Use streaming for real-time delivery
            hf_response_stream = hf_provider.stream_generate(query, enhanced_history)

            if hf_response_stream:
                # Stream the response chunks
                full_hf_response = ""
                for chunk in hf_response_stream:
                    if chunk:
                        full_hf_response += chunk
                        yield {'type': 'hf_thinking', 'content': chunk}

                # Final HF response
                yield {'type': 'final_response', 'content': full_hf_response}
                yield {'type': 'coordination_status', 'content': '🎯 HF analysis complete and authoritative'}
            else:
                yield {'type': 'coordination_status', 'content': '❌ HF response generation failed'}

        except Exception as e:
            logger.error(f"Hierarchical HF coordination failed: {e}")
            yield {'type': 'coordination_status', 'content': f'❌ HF coordination error: {str(e)}'}

    async def _get_hierarchical_ollama_response(self, query: str, history: List, external_data: Dict) -> str:
        """Get Ollama response with hierarchical awareness"""
        try:
            # ...
            ollama_provider = llm_factory.get_provider('ollama')
            if not ollama_provider:
                raise Exception("Ollama provider not available")

            # Prepare conversation with hierarchical context
            enhanced_history = history.copy()

            # Inject current time into Ollama context too
            current_time = datetime.now().strftime("%A, %B %d, %Y at %I:%M %p")
            time_context = {
                "role": "system",
                "content": f"[Current Date & Time: {current_time}]"
            }
            enhanced_history = [time_context] + enhanced_history

            # Add system instruction for Ollama's role
            enhanced_history.insert(0, {
                "role": "system",
                "content": self.system_instructions['ollama_role']
            })

            # Add external data context if available
            if external_data:
                context_parts = []
                # ...
                if 'weather' in external_data:
                    weather = external_data['weather']
                    context_parts.append(f"Current weather: {weather.get('temperature', 'N/A')}°C in {weather.get('city', 'Unknown')}")
                if 'current_datetime' in external_data:
                    context_parts.append(f"Current time: {external_data['current_datetime']}")

                if context_parts:
                    context_message = {
                        "role": "system",
                        "content": "Context: " + " | ".join(context_parts)
                    }
                    enhanced_history.insert(1, context_message)  # Insert after role instruction

            # Add the user's query
            enhanced_history.append({"role": "user", "content": query})

            # Generate response with awareness of HF's superior capabilities
            response = ollama_provider.generate(query, enhanced_history)

            # Add acknowledgment of HF's authority
            if response:
                return f"{response}\n\n*Note: A more comprehensive analysis from the uncensored HF model is being prepared...*"
            else:
                return "I'm processing your request... A deeper analysis is being prepared by the authoritative model."

        except Exception as e:
            logger.error(f"Hierarchical Ollama response failed: {e}")
            return "I'm thinking about your question... Preparing a comprehensive response."

    def _check_hf_availability(self) -> bool:
        """Check if HF endpoint is configured and available"""
        try:
            # ...
            return bool(config.hf_token and config.hf_api_url)
        except Exception:
            return False

    async def _gather_external_data(self, query: str) -> Dict:
        """Gather external data from various sources"""
        data = {}

        # Tavily/DuckDuckGo search with justification focus
        if self.tavily_client or web_search_service.client:
            try:
                # ...
                # data['search_answer'] = ...
            except Exception as e:
                logger.warning(f"Tavily search failed: {e}")

        # Weather data
        weather_keywords = ['weather', 'temperature', 'forecast', 'climate', 'rain', 'sunny']
        if any(keyword in query.lower() for keyword in weather_keywords):
            try:
                # ...
                data['weather'] = weather
            except Exception as e:
                logger.warning(f"Weather data failed: {e}")

        # Current date/time
        data['current_datetime'] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

        return data
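For reference, a weather-style query would produce a context dict along these lines, which the coordinator then flattens into a single system message. The values below are illustrative only:

```python
example_context = {
    "weather": {"city": "Tokyo", "temperature": 21},  # hypothetical values
    "current_datetime": "2024-05-04 14:32:08",
}

# Mirrors how the coordinator builds its context message:
context_parts = [
    f"Current weather: {example_context['weather']['temperature']}°C in {example_context['weather']['city']}",
    f"Current time: {example_context['current_datetime']}",
]
print("Context: " + " | ".join(context_parts))
# -> Context: Current weather: 21°C in Tokyo | Current time: 2024-05-04 14:32:08
```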
         
             
    def _extract_location(self, query: str) -> Optional[str]:
        """Extract location from query"""
        locations = ['New York', 'London', 'Tokyo', 'Paris', 'Berlin', 'Sydney',
                     # ...
                     ]
        for loc in locations:
            if loc.lower() in query.lower():
                return loc
        return "New York"  # Default

    def get_coordination_status(self) -> Dict:
        """Get current coordination system status"""
        return {
            # ...
                os.getenv("NASA_API_KEY")
            ])
        }

    def get_recent_activities(self, user_id: str) -> Dict:
        """Get recent coordination activities for user"""
        try:

services/hf_endpoint_monitor.py
CHANGED

@@ -8,7 +8,7 @@ logger = logging.getLogger(__name__)

class HFEndpointMonitor:
    """Monitor Hugging Face endpoint status and health"""

    def __init__(self):
        # Clean the endpoint URL
        raw_url = config.hf_api_url or ""

@@ -23,38 +23,38 @@ class HFEndpointMonitor:
        self.successful_requests = 0
        self.failed_requests = 0
        self.avg_response_time = 0

        logger.info(f"Initialized HF Monitor with URL: {self.endpoint_url}")

    def _clean_endpoint_url(self, url: str) -> str:
        """Clean and validate endpoint URL"""
        if not url:
            return ""

        # Remove environment variable names if present
        url = url.replace('hf_api_endpoint_url=', '')
        url = url.replace('HF_API_ENDPOINT_URL=', '')

        # Strip whitespace
        url = url.strip()

        # Ensure it starts with https://
        if url and not url.startswith(('http://', 'https://')):
            if 'huggingface.cloud' in url:
                url = 'https://' + url
            else:
                url = 'https://' + url

        # Remove trailing slashes but keep /v1 if present
        if url.endswith('/'):
            url = url.rstrip('/')

        return url
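The URL cleanup is easiest to see with a couple of inputs. A small sketch of the expected behaviour, assuming the monitor can be constructed in the current environment; the endpoint URLs below are made up:

```python
monitor = HFEndpointMonitor()

# A value pasted together with its env-var name and a trailing slash:
print(monitor._clean_endpoint_url("HF_API_ENDPOINT_URL=my-endpoint.us-east-1.aws.endpoints.huggingface.cloud/"))
# -> https://my-endpoint.us-east-1.aws.endpoints.huggingface.cloud

# An already-clean URL passes through unchanged:
print(monitor._clean_endpoint_url("https://my-endpoint.us-east-1.aws.endpoints.huggingface.cloud/v1"))
# -> https://my-endpoint.us-east-1.aws.endpoints.huggingface.cloud/v1
```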
         
             
    def check_endpoint_status(self) -> Dict:
        """Check if HF endpoint is available and initialized with rate limiting"""
        current_time = time.time()

        # Don't check too frequently - minimum 1 minute between checks
        if current_time - self.last_check < 60:
            # Return cached status or basic status

@@ -64,10 +64,10 @@ class HFEndpointMonitor:
                'initialized': getattr(self, '_last_initialized', False),
                'timestamp': self.last_check
            }

        # Proceed with actual check
        self.last_check = current_time

        try:
            if not self.endpoint_url or not self.hf_token:
                status_info = {

@@ -81,15 +81,15 @@ class HFEndpointMonitor:
                # Properly construct the models endpoint URL
                models_url = f"{self.endpoint_url.rstrip('/')}/models"
                logger.info(f"Checking HF endpoint at: {models_url}")

                headers = {"Authorization": f"Bearer {self.hf_token}"}

                response = requests.get(
                    models_url,
                    headers=headers,
                    timeout=15
                )

                status_info = {
                    'available': response.status_code in [200, 201],
                    'status_code': response.status_code,

@@ -97,23 +97,23 @@ class HFEndpointMonitor:
                    'response_time': response.elapsed.total_seconds(),
                    'timestamp': time.time()
                }

                if response.status_code not in [200, 201]:
                    status_info['error'] = f"HTTP {response.status_code}: {response.text[:200]}"
         
     | 
| 103 | 
         
            -
             
     | 
| 104 | 
         
             
                            logger.info(f"HF Endpoint Status: {status_info}")
         
     | 
| 105 | 
         
            -
             
     | 
| 106 | 
         
             
                        # Cache the results
         
     | 
| 107 | 
         
             
                        self._last_available = status_info['available']
         
     | 
| 108 | 
         
             
                        self._last_status_code = status_info['status_code']
         
     | 
| 109 | 
         
             
                        self._last_initialized = status_info.get('initialized', False)
         
     | 
| 110 | 
         
            -
             
     | 
| 111 | 
         
             
                        return status_info
         
     | 
| 112 | 
         
            -
             
     | 
| 113 | 
         
             
                    except Exception as e:
         
     | 
| 114 | 
         
             
                        error_msg = str(e)
         
     | 
| 115 | 
         
             
                        logger.error(f"HF endpoint check failed: {error_msg}")
         
     | 
| 116 | 
         
            -
             
     | 
| 117 | 
         
             
                        status_info = {
         
     | 
| 118 | 
         
             
                            'available': False,
         
     | 
| 119 | 
         
             
                            'status_code': None,
         
     | 
| 
         @@ -121,12 +121,12 @@ class HFEndpointMonitor: 
     | 
|
| 121 | 
         
             
                            'error': error_msg,
         
     | 
| 122 | 
         
             
                            'timestamp': time.time()
         
     | 
| 123 | 
         
             
                        }
         
     | 
| 124 | 
         
            -
             
     | 
| 125 | 
         
             
                        # Cache the results
         
     | 
| 126 | 
         
             
                        self._last_available = False
         
     | 
| 127 | 
         
             
                        self._last_status_code = None
         
     | 
| 128 | 
         
             
                        self._last_initialized = False
         
     | 
| 129 | 
         
            -
             
     | 
| 130 | 
         
             
                        return status_info
         
     | 
| 131 | 
         | 
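A minimal sketch of how a caller might consume this status check; here `hf_monitor` stands for an already-constructed `HFEndpointMonitor`, and repeated calls within the one-minute window above simply return the cached fields:

```python
status = hf_monitor.check_endpoint_status()
if status['available']:
    # response_time is only present when a live check was performed, so use .get()
    print(f"HF endpoint up (HTTP {status['status_code']}, "
          f"{status.get('response_time', 0):.2f}s)")
else:
    print(f"HF endpoint unavailable: {status.get('error', 'unknown error')}")
```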
```python
    def _is_endpoint_initialized(self, response) -> bool:

    # @@ -143,33 +143,33 @@ class HFEndpointMonitor:
            if not self.endpoint_url or not self.hf_token:
                logger.warning("Cannot warm up HF endpoint - URL or token not configured")
                return False

            self.warmup_attempts += 1
            logger.info(f"Warming up HF endpoint (attempt {self.warmup_attempts})...")

            headers = {
                "Authorization": f"Bearer {self.hf_token}",
                "Content-Type": "application/json"
            }

            # Construct proper chat completions URL
            chat_url = f"{self.endpoint_url.rstrip('/')}/chat/completions"
            logger.info(f"Sending warm-up request to: {chat_url}")

            payload = {
                "model": "DavidAU/OpenAi-GPT-oss-20b-abliterated-uncensored-NEO-Imatrix-gguf",
                "messages": [{"role": "user", "content": "Hello"}],
                "max_tokens": 10,
                "stream": False
            }

            response = requests.post(
                chat_url,
                headers=headers,
                json=payload,
                timeout=45  # Longer timeout for cold start
            )

            success = response.status_code in [200, 201]
            if success:
                self.is_initialized = True
            # @@ -179,9 +179,9 @@ class HFEndpointMonitor:
            else:
                logger.warning(f"⚠️ HF endpoint warm-up response: {response.status_code}")
                logger.debug(f"Response body: {response.text[:500]}")

            return success

        except Exception as e:
            logger.error(f"HF endpoint warm-up failed: {e}")
            self.failed_requests += 1

    # @@ -201,7 +201,7 @@ class HFEndpointMonitor:
    def handle_scale_to_zero(self) -> bool:
        """Handle scale-to-zero behavior with user feedback"""
        logger.info("HF endpoint appears to be scaled to zero. Attempting to wake it up...")

        # Try to warm up the endpoint
        for attempt in range(self.max_warmup_attempts):
            logger.info(f"Wake-up attempt {attempt + 1}/{self.max_warmup_attempts}")
            # @@ -209,7 +209,7 @@ class HFEndpointMonitor:
                logger.info("✅ HF endpoint successfully woken up!")
                return True
            time.sleep(10)  # Wait between attempts

        logger.error("❌ Failed to wake up HF endpoint after all attempts")
        return False
```
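A sketch of how the wake-up path might be wired into request handling; `use_fallback_provider()` is a placeholder for the app's existing Ollama/HF fallback logic, not a function defined in this file:

```python
status = hf_monitor.check_endpoint_status()
if not status['available']:
    # The endpoint may simply have scaled to zero; try to wake it before giving up
    woke_up = hf_monitor.handle_scale_to_zero()
    if not woke_up:
        # Placeholder: hand the request to the next provider in the fallback order
        use_fallback_provider()
```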
```python
    # @@ -217,7 +217,7 @@ class HFEndpointMonitor:
        """Get detailed HF endpoint status with metrics"""
        try:
            headers = {"Authorization": f"Bearer {self.hf_token}"}

            # Get model info
            models_url = f"{self.endpoint_url.rstrip('/')}/models"
            model_response = requests.get(
            # @@ -225,7 +225,7 @@ class HFEndpointMonitor:
                headers=headers,
                timeout=10
            )

            # Get endpoint info if available
            endpoint_info = {}
            try:
            # @@ -239,7 +239,7 @@ class HFEndpointMonitor:
                    endpoint_info = info_response.json()
            except:
                pass

            status_info = {
                'available': model_response.status_code == 200,
                'status_code': model_response.status_code,
                # @@ -249,9 +249,9 @@ class HFEndpointMonitor:
                'warmup_attempts': getattr(self, 'warmup_attempts', 0),
                'is_warming_up': getattr(self, 'is_warming_up', False)
            }

            return status_info

        except Exception as e:
            return {
                'available': False,

    # @@ -274,7 +274,7 @@ class HFEndpointMonitor:
    def get_enhanced_status(self) -> Dict:
        """Get enhanced HF endpoint status with engagement tracking"""
        basic_status = self.check_endpoint_status()

        return {
            **basic_status,
            "engagement_level": self._determine_engagement_level(),
```
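Since the app runs on Streamlit, the enhanced status lends itself to a small sidebar readout. The snippet below is only an illustrative sketch: the widget layout and the `hf_monitor` instance name are assumptions, while the `available` and `engagement_level` keys come from the method above:

```python
import streamlit as st

enhanced = hf_monitor.get_enhanced_status()
with st.sidebar:
    st.subheader("HF Endpoint")
    st.write("Available" if enhanced.get('available') else "Unavailable")
    st.write(f"Engagement level: {enhanced.get('engagement_level', 'unknown')}")
```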