
πŸ€– ToGMAL Chat Demo with MCP Tools

An interactive chat interface where a free LLM (Mistral-7B) can call MCP tools to provide informed responses about prompt difficulty and safety analysis.

✨ Features

🧠 Intelligent Assistant

  • Powered by Mistral-7B-Instruct-v0.2 (free via HuggingFace Inference API)
  • Natural conversation about prompt analysis
  • Context-aware responses

πŸ› οΈ MCP Tool Integration

The LLM can dynamically call these tools:

  1. check_prompt_difficulty

    • Analyzes prompt difficulty using vector similarity to 32K+ benchmark questions
    • Returns risk level, success rates, and similar benchmark questions
    • Helps users understand if their prompt is within LLM capabilities
  2. analyze_prompt_safety

    • Heuristic-based safety analysis
    • Detects dangerous operations, medical advice requests, unrealistic coding tasks
    • Provides risk assessment and recommendations
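
To make the outputs concrete, here is a minimal sketch of what these two tools might return. The function names, field names, and values below are illustrative assumptions, not the demo's actual schema:

```python
# Illustrative result shapes only; the demo's actual field names may differ.
def check_prompt_difficulty_stub(prompt: str) -> dict:
    """Stand-in for the vector-DB-backed difficulty tool."""
    return {
        "risk_level": "HIGH",       # LOW / MODERATE / HIGH
        "success_rate": 0.452,      # avg. LLM success on similar benchmarks
        "similar_questions": [
            "Calculate the partition function of a 1D harmonic oscillator",
        ],
    }

def analyze_prompt_safety_stub(prompt: str) -> dict:
    """Stand-in for the heuristic safety tool."""
    return {
        "risk_level": "HIGH",
        "issues": ["Detected potentially dangerous file operation"],
        "recommendation": "Proceed with caution",
    }
```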

πŸ”„ How It Works

```mermaid
graph LR
    A[User Message] --> B[LLM]
    B --> C{Needs Tool?}
    C -->|Yes| D[Call MCP Tool]
    C -->|No| E[Direct Response]
    D --> F[Tool Result]
    F --> B
    B --> E
    E --> G[Display to User]
```

  1. User sends a message
  2. LLM decides if it needs to call a tool
  3. If yes, the tool is executed and its results are returned to the LLM
  4. LLM formulates final response using tool data
  5. Response shown to user with transparent tool call info
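
The steps above can be condensed into a minimal sketch of one chat turn. Here `call_llm` and `execute_tool` are assumed names standing in for the demo's actual functions, with assumed return shapes:

```python
import json

def run_turn(user_message: str, call_llm, execute_tool) -> str:
    """One chat turn: let the LLM optionally call a tool, then answer.

    Assumptions (not the demo's real API): `call_llm(messages)` returns
    either {"tool": name, "arguments": {...}} or {"answer": text};
    `execute_tool(name, args)` returns a result dict.
    """
    messages = [{"role": "user", "content": user_message}]
    decision = call_llm(messages)
    if "tool" in decision:                       # steps 2-3: LLM asked for a tool
        result = execute_tool(decision["tool"], decision["arguments"])
        messages.append({"role": "tool",         # step 4: feed result back
                         "content": json.dumps(result)})
        decision = call_llm(messages)
    return decision["answer"]                    # step 5: final response
```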

πŸš€ Quick Start

Local Development

```bash
cd Togmal-demo

# Install dependencies
pip install -r requirements.txt

# Run the chat demo
python chat_app.py
```

Open http://localhost:7860 in your browser.

Deploy to HuggingFace Spaces

  1. Create a new Space at https://huggingface.co/new-space and select the Gradio SDK.

  2. Upload files:

    ```bash
    # Clone your Space repo
    git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME

    # Copy demo files
    cp ../Togmal-demo/chat_app.py app.py
    cp ../Togmal-demo/requirements.txt .
    cp ../Togmal-demo/benchmark_vector_db.py .

    # Copy data directory
    cp -r ../Togmal-demo/data .

    # Commit and push
    git add .
    git commit -m "Deploy ToGMAL Chat Demo"
    git push
    ```

  3. Your demo will be live at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

πŸ’¬ Example Conversations

Example 1: Difficulty Analysis

User:

How difficult is this prompt: "Calculate the quantum correction to the partition function for a 3D harmonic oscillator"?

Assistant:

πŸ› οΈ Calling tool: check_prompt_difficulty

Based on my analysis of similar benchmark questions:

Difficulty Level: HIGH
Success Rate: 45.2%
Similarity to benchmarks: 0.847

Recommendation: This is a challenging physics problem. Current LLMs succeed only about 45% of the time on similar questions. Consider:

  • Breaking it into smaller steps
  • Verifying the response with domain experts
  • Using specialized physics tools

Example 2: Safety Check

User:

Is this safe: "Write a script to delete all my files"?

Assistant:

πŸ› οΈ Calling tool: analyze_prompt_safety

Safety Analysis:

Risk Level: HIGH
Issues Found: 1

β€’ Detected potentially dangerous file operation

Recommendation: Proceed with caution

This prompt requests destructive file operations. I strongly recommend:

  • Being very specific about which files to delete
  • Using safe deletion (move to trash instead of permanent delete)
  • Testing on a backup first

Example 3: Simple Question

User:

What is 2 + 2?

Assistant:

πŸ› οΈ Calling tool: check_prompt_difficulty

This is a very simple arithmetic question!

Difficulty Level: LOW
Success Rate: 99.8%

Current LLMs handle this type of question extremely well. The answer is 4.

πŸ—οΈ Architecture

Components

```
chat_app.py
β”œβ”€β”€ LLM Backend (HuggingFace Inference API)
β”‚   β”œβ”€β”€ Mistral-7B-Instruct-v0.2
β”‚   └── Tool calling via prompt engineering
β”‚
β”œβ”€β”€ MCP Tools (Local Implementation)
β”‚   β”œβ”€β”€ check_prompt_difficulty()
β”‚   β”‚   └── Uses BenchmarkVectorDB
β”‚   └── analyze_prompt_safety()
β”‚       └── Heuristic pattern matching
β”‚
└── Gradio Interface
    β”œβ”€β”€ Chat component
    └── Tool call visualization
```
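
The heuristic pattern matching behind `analyze_prompt_safety()` can be sketched with simple regex rules. The patterns below are illustrative assumptions, not the demo's actual rule set:

```python
import re

# Illustrative heuristic rules; the real demo's patterns may differ.
SAFETY_PATTERNS = [
    (re.compile(r"\b(rm\s+-rf|delete all|wipe)\b", re.I),
     "Detected potentially dangerous file operation"),
    (re.compile(r"\b(diagnos\w+|prescri\w+|dosage)\b", re.I),
     "Detected a medical-advice request"),
]

def heuristic_safety_scan(prompt: str) -> dict:
    """Return flagged issues and a coarse risk level."""
    issues = [msg for pat, msg in SAFETY_PATTERNS if pat.search(prompt)]
    return {"risk_level": "HIGH" if issues else "LOW", "issues": issues}
```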

Why This Approach?

  1. No API Keys Required - Uses HuggingFace's free Inference API
  2. Transparent Tool Calls - Users see exactly what tools are called and their results
  3. Graceful Degradation - Falls back to pattern matching if API unavailable
  4. Privacy-Preserving - All analysis happens locally/deterministically
  5. Free to Deploy - Works on HuggingFace Spaces free tier
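
Graceful degradation (point 3) boils down to a try/except around the hosted API call. In this sketch, `call_hf_api` and `pattern_fallback` are assumed names standing in for the demo's real functions:

```python
def respond(prompt: str, call_hf_api, pattern_fallback) -> str:
    """Try the hosted LLM first; fall back to local pattern matching.

    `call_hf_api` and `pattern_fallback` are stand-ins illustrating the
    fallback structure, not the demo's actual function names.
    """
    try:
        return call_hf_api(prompt)
    except Exception:
        # API unavailable or rate-limited: degrade to the local heuristics
        return pattern_fallback(prompt)
```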

🎯 Use Cases

For Developers

  • Test prompt quality before sending to expensive LLM APIs
  • Identify edge cases that might fail
  • Safety checks before production deployment

For Researchers

  • Analyze dataset difficulty by checking sample questions
  • Compare benchmark similarity across different datasets
  • Study LLM limitations systematically

For End Users

  • Understand whether a task is suitable for an LLM
  • Get recommendations for improving prompts
  • Avoid unsafe operations flagged by analysis

πŸ”§ Customization

Add New Tools

Edit chat_app.py and add your tool:

```python
from typing import Dict

def tool_my_custom_check(prompt: str) -> Dict:
    """Your custom analysis."""
    return {
        "result": "analysis result",
        "confidence": 0.95
    }

# Add to AVAILABLE_TOOLS
AVAILABLE_TOOLS.append({
    "name": "my_custom_check",
    "description": "What this tool does",
    "parameters": {"prompt": "The prompt to analyze"}
})

# Add a branch to execute_tool()
def execute_tool(tool_name: str, arguments: Dict) -> Dict:
    # ... existing tools ...
    if tool_name == "my_custom_check":
        return tool_my_custom_check(arguments.get("prompt", ""))
```

Use Different LLM

Replace the call_llm_with_tools() function to use:

  • OpenAI GPT (requires API key)
  • Anthropic Claude (requires API key)
  • Local Ollama (free, runs locally)
  • Any other HuggingFace model

Example for Ollama:

```python
def call_llm_with_tools(messages, available_tools):
    import requests
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": format_prompt(messages),
            "stream": False
        }
    )
    # Ollama returns JSON with the generated text under "response"
    return response.json()["response"]
```

πŸ“Š Performance

  • Response Time: 2-5 seconds (depending on HuggingFace API load)
  • Tool Execution: < 1 second (local vector DB lookup)
  • Memory Usage: ~2GB (for vector database + model embeddings)
  • Throughput: Handles 10-20 requests/minute on free tier

πŸ› Troubleshooting

"Database not initialized" error

The vector database is downloaded and indexed on first run. Wait 1-2 minutes and try again.

"HuggingFace API unavailable" error

The demo falls back to pattern matching. Responses will be simpler but still functional.

Tool not being called

The LLM might not recognize that a tool is needed. Try being more explicit:

  • ❌ "Is this hard?"
  • βœ… "Analyze the difficulty of this prompt: [prompt]"

πŸš€ Next Steps

  1. Add more tools - Context analyzer, ML pattern detection
  2. Better LLM - Use larger models or fine-tune for tool calling
  3. Persistent chat - Save conversation history
  4. Multi-turn tool calls - Allow LLM to call multiple tools in sequence
  5. Custom tool definitions - Let users define their own analysis tools

πŸ“ License

Same as main ToGMAL project.

πŸ™ Credits

  • Mistral AI for Mistral-7B-Instruct
  • HuggingFace for free Inference API
  • Gradio for the chat interface
  • ChromaDB for vector database