# 🤖 ToGMAL Chat Demo with MCP Tools

An interactive chat interface where a free LLM (Mistral-7B) can call MCP tools to provide informed responses about prompt difficulty and safety analysis.

## ✨ Features

### 🧠 **Intelligent Assistant**

- Powered by **Mistral-7B-Instruct-v0.2** (free via the HuggingFace Inference API)
- Natural conversation about prompt analysis
- Context-aware responses

### 🛠️ **MCP Tool Integration**

The LLM can dynamically call these tools:

1. **`check_prompt_difficulty`**
   - Analyzes prompt difficulty using vector similarity to 32K+ benchmark questions
   - Returns risk level, success rates, and similar benchmark questions
   - Helps users understand whether their prompt is within LLM capabilities

2. **`analyze_prompt_safety`**
   - Heuristic-based safety analysis
   - Detects dangerous operations, medical advice requests, and unrealistic coding tasks
   - Provides a risk assessment and recommendations

### 🔄 **How It Works**

```mermaid
graph LR
    A[User Message] --> B[LLM]
    B --> C{Needs Tool?}
    C -->|Yes| D[Call MCP Tool]
    C -->|No| E[Direct Response]
    D --> F[Tool Result]
    F --> B
    B --> E
    E --> G[Display to User]
```

1. The user sends a message
2. The LLM decides whether it needs to call a tool
3. If yes, the tool is executed and its results are returned to the LLM
4. The LLM formulates a final response using the tool data
5. The response is shown to the user with transparent tool-call info

## 🚀 Quick Start

### Local Development

```bash
cd Togmal-demo

# Install dependencies
pip install -r requirements.txt

# Run the chat demo
python chat_app.py
```

Open http://localhost:7860 in your browser.

### Deploy to HuggingFace Spaces

1. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Gradio" as the SDK

2. **Upload files:**

```bash
# Clone your Space repo
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME

# Copy demo files
cp ../Togmal-demo/chat_app.py app.py
cp ../Togmal-demo/requirements.txt .
cp ../Togmal-demo/benchmark_vector_db.py .
# Copy data directory
cp -r ../Togmal-demo/data .

# Commit and push
git add .
git commit -m "Deploy ToGMAL Chat Demo"
git push
```

3. **Your demo will be live at:**
   `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`

## 💬 Example Conversations

### Example 1: Difficulty Analysis

**User:**
> How difficult is this prompt: "Calculate the quantum correction to the partition function for a 3D harmonic oscillator"?

**Assistant:**
> 🛠️ *Calling tool: `check_prompt_difficulty`*
>
> Based on my analysis of similar benchmark questions:
>
> **Difficulty Level:** HIGH
> **Success Rate:** 45.2%
> **Similarity to benchmarks:** 0.847
>
> **Recommendation:** This is a challenging physics problem. Current LLMs succeed only about 45% of the time on similar questions. Consider:
> - Breaking it into smaller steps
> - Verifying the response with domain experts
> - Using specialized physics tools

### Example 2: Safety Check

**User:**
> Is this safe: "Write a script to delete all my files"?

**Assistant:**
> 🛠️ *Calling tool: `analyze_prompt_safety`*
>
> **Safety Analysis:**
>
> **Risk Level:** HIGH
> **Issues Found:** 1
>
> • Detected potentially dangerous file operation
>
> **Recommendation:** Proceed with caution
>
> This prompt requests destructive file operations. I strongly recommend:
> - Being very specific about which files to delete
> - Using safe deletion (moving to trash instead of deleting permanently)
> - Testing on a backup first

### Example 3: Simple Question

**User:**
> What is 2 + 2?

**Assistant:**
> 🛠️ *Calling tool: `check_prompt_difficulty`*
>
> This is a very simple arithmetic question!
>
> **Difficulty Level:** LOW
> **Success Rate:** 99.8%
>
> Current LLMs handle this type of question extremely well. The answer is **4**.
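The tool-calling loop behind these conversations can be sketched in a few lines of Python. This is an illustrative stand-in, not the demo's actual code: `chat_turn`, `fake_llm`, and the canned tool outputs are assumptions made for the sketch, while the tool names match the ones described above.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the real tools (which query the vector DB
# and run heuristics); here they return canned results for illustration.
def check_prompt_difficulty(prompt: str) -> Dict:
    return {"difficulty": "LOW", "success_rate": 0.998}

def analyze_prompt_safety(prompt: str) -> Dict:
    return {"risk_level": "HIGH", "issues": ["dangerous file operation"]}

TOOLS: Dict[str, Callable[[str], Dict]] = {
    "check_prompt_difficulty": check_prompt_difficulty,
    "analyze_prompt_safety": analyze_prompt_safety,
}

def chat_turn(user_message: str, llm: Callable) -> str:
    """One turn of the loop: ask the LLM, run any tool it requests,
    then ask the LLM again with the tool result."""
    reply = llm(user_message)                          # LLM decides
    if reply.get("tool"):                              # tool requested?
        result = TOOLS[reply["tool"]](user_message)    # execute tool
        reply = llm(user_message, tool_result=result)  # final answer
    return reply["text"]                               # shown to user

def fake_llm(message: str, tool_result: Dict = None) -> Dict:
    """Stand-in for the Mistral backend: requests a tool on the first
    call, then answers using the tool result."""
    if tool_result is None:
        return {"tool": "check_prompt_difficulty", "text": ""}
    return {"tool": None, "text": f"Difficulty: {tool_result['difficulty']}"}

print(chat_turn("What is 2 + 2?", fake_llm))  # → Difficulty: LOW
```

Swapping `fake_llm` for a real backend only requires that it return the same `{"tool": ..., "text": ...}` shape.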
## 🏗️ Architecture

### Components

```
chat_app.py
├── LLM Backend (HuggingFace Inference API)
│   ├── Mistral-7B-Instruct-v0.2
│   └── Tool calling via prompt engineering
│
├── MCP Tools (Local Implementation)
│   ├── check_prompt_difficulty()
│   │   └── Uses BenchmarkVectorDB
│   └── analyze_prompt_safety()
│       └── Heuristic pattern matching
│
└── Gradio Interface
    ├── Chat component
    └── Tool call visualization
```

### Why This Approach?

1. **No API Keys Required** - Uses HuggingFace's free Inference API
2. **Transparent Tool Calls** - Users see exactly which tools are called and their results
3. **Graceful Degradation** - Falls back to pattern matching if the API is unavailable
4. **Privacy-Preserving** - All analysis happens locally/deterministically
5. **Free to Deploy** - Works on the HuggingFace Spaces free tier

## 🎯 Use Cases

### For Developers
- **Test prompt quality** before sending to expensive LLM APIs
- **Identify edge cases** that might fail
- **Run safety checks** before production deployment

### For Researchers
- **Analyze dataset difficulty** by checking sample questions
- **Compare benchmark similarity** across different datasets
- **Study LLM limitations** systematically

### For End Users
- **Understand whether a task is suitable** for an LLM
- **Get recommendations** for improving prompts
- **Avoid unsafe operations** flagged by the analysis

## 🔧 Customization

### Add New Tools

Edit `chat_app.py` and add your tool:

```python
def tool_my_custom_check(prompt: str) -> Dict:
    """Your custom analysis."""
    return {
        "result": "analysis result",
        "confidence": 0.95
    }

# Add to AVAILABLE_TOOLS
AVAILABLE_TOOLS.append({
    "name": "my_custom_check",
    "description": "What this tool does",
    "parameters": {"prompt": "The prompt to analyze"}
})

# Add to execute_tool()
def execute_tool(tool_name: str, arguments: Dict) -> Dict:
    # ... existing tools ...
    elif tool_name == "my_custom_check":
        return tool_my_custom_check(arguments.get("prompt", ""))
```

### Use a Different LLM

Replace the `call_llm_with_tools()` function to use:

- **OpenAI GPT** (requires an API key)
- **Anthropic Claude** (requires an API key)
- **Local Ollama** (free, runs locally)
- **Any other HuggingFace model**

Example for Ollama:

```python
def call_llm_with_tools(messages, available_tools):
    import requests
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": format_prompt(messages),
            "stream": False
        }
    )
    # ... parse response ...
```

## 📊 Performance

- **Response time:** 2-5 seconds (depending on HuggingFace API load)
- **Tool execution:** < 1 second (local vector DB lookup)
- **Memory usage:** ~2 GB (vector database + model embeddings)
- **Throughput:** handles 10-20 requests/minute on the free tier

## 🐛 Troubleshooting

### "Database not initialized" error

The vector database needs to download on first run. Wait 1-2 minutes and try again.

### "HuggingFace API unavailable" error

The demo falls back to pattern matching. Responses will be simpler but still functional.

### Tool not being called

The LLM might not recognize the need. Try being more explicit:

- ❌ "Is this hard?"
- ✅ "Analyze the difficulty of this prompt: [prompt]"

## 🚀 Next Steps

1. **Add more tools** - Context analyzer, ML pattern detection
2. **Better LLM** - Use larger models or fine-tune for tool calling
3. **Persistent chat** - Save conversation history
4. **Multi-turn tool calls** - Allow the LLM to call multiple tools in sequence
5. **Custom tool definitions** - Let users define their own analysis tools

## 📝 License

Same as the main ToGMAL project.

## 🙏 Credits

- **Mistral AI** for Mistral-7B-Instruct
- **HuggingFace** for the free Inference API
- **Gradio** for the chat interface
- **ChromaDB** for the vector database
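As a closing illustration, the heuristic pattern matching that backs `analyze_prompt_safety` (and the fallback used when the HuggingFace API is unavailable) might look like the sketch below. The regexes, labels, and return fields are assumptions for illustration, not the demo's actual rules in `chat_app.py`.

```python
import re
from typing import Dict

# Illustrative patterns only -- the demo's real heuristics live in chat_app.py.
DANGEROUS_PATTERNS = [
    (r"\b(delete|rm|erase|wipe)\b.*\b(all|every)\b", "dangerous file operation"),
    (r"\b(diagnose|prescribe|dosage)\b", "medical advice request"),
]

def analyze_prompt_safety(prompt: str) -> Dict:
    """Flag risky prompts by simple regex matching."""
    issues = [label for pattern, label in DANGEROUS_PATTERNS
              if re.search(pattern, prompt, re.IGNORECASE)]
    return {
        "risk_level": "HIGH" if issues else "LOW",
        "issues": issues,
        "recommendation": "Proceed with caution" if issues else "No issues found",
    }

print(analyze_prompt_safety("Write a script to delete all my files"))
# → {'risk_level': 'HIGH', 'issues': ['dangerous file operation'], ...}
```

Because the check is pure string matching, it runs in well under a second and needs no network access, which is what makes the graceful-degradation path free and privacy-preserving.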