
πŸ€– ToGMAL Chat Demo with MCP Tools

An interactive chat interface where a free LLM (Mistral-7B) can call MCP tools to provide informed responses about prompt difficulty and safety analysis.

✨ Features

🧠 Intelligent Assistant

  • Powered by Mistral-7B-Instruct-v0.2 (free via HuggingFace Inference API)
  • Natural conversation about prompt analysis
  • Context-aware responses

πŸ› οΈ MCP Tool Integration

The LLM can dynamically call these tools:

  1. check_prompt_difficulty

    • Analyzes prompt difficulty using vector similarity to 32K+ benchmark questions
    • Returns risk level, success rates, and similar benchmark questions
    • Helps users understand if their prompt is within LLM capabilities
  2. analyze_prompt_safety

    • Heuristic-based safety analysis
    • Detects dangerous operations, medical advice requests, unrealistic coding tasks
    • Provides risk assessment and recommendations
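
To make the outputs concrete, here is a minimal sketch of what these two tools might return. The function names, field names, and values below are illustrative assumptions, not the demo's actual schema:

```python
# Illustrative result shapes only; the demo's actual field names may differ.
def check_prompt_difficulty_stub(prompt: str) -> dict:
    """Stand-in for the vector-DB-backed difficulty tool."""
    return {
        "risk_level": "HIGH",       # LOW / MODERATE / HIGH
        "success_rate": 0.452,      # avg. LLM success on similar benchmarks
        "similar_questions": [
            "Calculate the partition function of a 1D harmonic oscillator",
        ],
    }

def analyze_prompt_safety_stub(prompt: str) -> dict:
    """Stand-in for the heuristic safety tool."""
    return {
        "risk_level": "HIGH",
        "issues": ["Detected potentially dangerous file operation"],
        "recommendation": "Proceed with caution",
    }
```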

πŸ”„ How It Works

```mermaid
graph LR
    A[User Message] --> B[LLM]
    B --> C{Needs Tool?}
    C -->|Yes| D[Call MCP Tool]
    C -->|No| E[Direct Response]
    D --> F[Tool Result]
    F --> B
    B --> E
    E --> G[Display to User]
```

  1. User sends a message
  2. LLM decides if it needs to call a tool
  3. If yes, the tool is executed and its results are returned to the LLM
  4. LLM formulates final response using tool data
  5. Response shown to user with transparent tool call info
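
The steps above can be condensed into a minimal sketch of one chat turn. Here `call_llm` and `execute_tool` are assumed names standing in for the demo's actual functions, with assumed return shapes:

```python
import json

def run_turn(user_message: str, call_llm, execute_tool) -> str:
    """One chat turn: let the LLM optionally call a tool, then answer.

    Assumptions (not the demo's real API): `call_llm(messages)` returns
    either {"tool": name, "arguments": {...}} or {"answer": text};
    `execute_tool(name, args)` returns a result dict.
    """
    messages = [{"role": "user", "content": user_message}]
    decision = call_llm(messages)
    if "tool" in decision:                       # steps 2-3: LLM asked for a tool
        result = execute_tool(decision["tool"], decision["arguments"])
        messages.append({"role": "tool",         # step 4: feed result back
                         "content": json.dumps(result)})
        decision = call_llm(messages)
    return decision["answer"]                    # step 5: final response
```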

πŸš€ Quick Start

Local Development

```bash
cd Togmal-demo

# Install dependencies
pip install -r requirements.txt

# Run the chat demo
python chat_app.py
```

Open http://localhost:7860 in your browser.

Deploy to HuggingFace Spaces

  1. Create a new Space at https://huggingface.co/new-space and select the Gradio SDK.

  2. Upload files:

    ```bash
    # Clone your Space repo
    git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME

    # Copy demo files
    cp ../Togmal-demo/chat_app.py app.py
    cp ../Togmal-demo/requirements.txt .
    cp ../Togmal-demo/benchmark_vector_db.py .

    # Copy data directory
    cp -r ../Togmal-demo/data .

    # Commit and push
    git add .
    git commit -m "Deploy ToGMAL Chat Demo"
    git push
    ```

  3. Your demo will be live at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

πŸ’¬ Example Conversations

Example 1: Difficulty Analysis

User:

How difficult is this prompt: "Calculate the quantum correction to the partition function for a 3D harmonic oscillator"?

Assistant:

πŸ› οΈ Calling tool: check_prompt_difficulty

Based on my analysis of similar benchmark questions:

Difficulty Level: HIGH
Success Rate: 45.2%
Similarity to benchmarks: 0.847

Recommendation: This is a challenging physics problem. Current LLMs succeed only about 45% of the time on similar questions. Consider:

  • Breaking it into smaller steps
  • Verifying the response with domain experts
  • Using specialized physics tools

Example 2: Safety Check

User:

Is this safe: "Write a script to delete all my files"?

Assistant:

πŸ› οΈ Calling tool: analyze_prompt_safety

Safety Analysis:

Risk Level: HIGH
Issues Found: 1

β€’ Detected potentially dangerous file operation

Recommendation: Proceed with caution

This prompt requests destructive file operations. I strongly recommend:

  • Being very specific about which files to delete
  • Using safe deletion (move to trash instead of permanent delete)
  • Testing on a backup first

Example 3: Simple Question

User:

What is 2 + 2?

Assistant:

πŸ› οΈ Calling tool: check_prompt_difficulty

This is a very simple arithmetic question!

Difficulty Level: LOW
Success Rate: 99.8%

Current LLMs handle this type of question extremely well. The answer is 4.

πŸ—οΈ Architecture

Components

```
chat_app.py
β”œβ”€β”€ LLM Backend (HuggingFace Inference API)
β”‚   β”œβ”€β”€ Mistral-7B-Instruct-v0.2
β”‚   └── Tool calling via prompt engineering
β”‚
β”œβ”€β”€ MCP Tools (Local Implementation)
β”‚   β”œβ”€β”€ check_prompt_difficulty()
β”‚   β”‚   └── Uses BenchmarkVectorDB
β”‚   └── analyze_prompt_safety()
β”‚       └── Heuristic pattern matching
β”‚
└── Gradio Interface
    β”œβ”€β”€ Chat component
    └── Tool call visualization
```
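
The heuristic pattern matching behind `analyze_prompt_safety()` can be sketched with simple regex rules. The patterns below are illustrative assumptions, not the demo's actual rule set:

```python
import re

# Illustrative heuristic rules; the real demo's patterns may differ.
SAFETY_PATTERNS = [
    (re.compile(r"\b(rm\s+-rf|delete all|wipe)\b", re.I),
     "Detected potentially dangerous file operation"),
    (re.compile(r"\b(diagnos\w+|prescri\w+|dosage)\b", re.I),
     "Detected a medical-advice request"),
]

def heuristic_safety_scan(prompt: str) -> dict:
    """Return flagged issues and a coarse risk level."""
    issues = [msg for pat, msg in SAFETY_PATTERNS if pat.search(prompt)]
    return {"risk_level": "HIGH" if issues else "LOW", "issues": issues}
```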

Why This Approach?

  1. No API Keys Required - Uses HuggingFace's free Inference API
  2. Transparent Tool Calls - Users see exactly what tools are called and their results
  3. Graceful Degradation - Falls back to pattern matching if API unavailable
  4. Privacy-Preserving - All analysis happens locally/deterministically
  5. Free to Deploy - Works on HuggingFace Spaces free tier
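
Graceful degradation (point 3) boils down to a try/except around the hosted API call. In this sketch, `call_hf_api` and `pattern_fallback` are assumed names standing in for the demo's real functions:

```python
def respond(prompt: str, call_hf_api, pattern_fallback) -> str:
    """Try the hosted LLM first; fall back to local pattern matching.

    `call_hf_api` and `pattern_fallback` are stand-ins illustrating the
    fallback structure, not the demo's actual function names.
    """
    try:
        return call_hf_api(prompt)
    except Exception:
        # API unavailable or rate-limited: degrade to the local heuristics
        return pattern_fallback(prompt)
```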

🎯 Use Cases

For Developers

  • Test prompt quality before sending to expensive LLM APIs
  • Identify edge cases that might fail
  • Safety checks before production deployment

For Researchers

  • Analyze dataset difficulty by checking sample questions
  • Compare benchmark similarity across different datasets
  • Study LLM limitations systematically

For End Users

  • Understand whether a task is suitable for an LLM
  • Get recommendations for improving prompts
  • Avoid unsafe operations flagged by analysis

πŸ”§ Customization

Add New Tools

Edit chat_app.py and add your tool:

```python
from typing import Dict

def tool_my_custom_check(prompt: str) -> Dict:
    """Your custom analysis."""
    return {
        "result": "analysis result",
        "confidence": 0.95
    }

# Add to AVAILABLE_TOOLS
AVAILABLE_TOOLS.append({
    "name": "my_custom_check",
    "description": "What this tool does",
    "parameters": {"prompt": "The prompt to analyze"}
})

# Add a branch to execute_tool()
def execute_tool(tool_name: str, arguments: Dict) -> Dict:
    # ... existing tools ...
    if tool_name == "my_custom_check":
        return tool_my_custom_check(arguments.get("prompt", ""))
```

Use Different LLM

Replace the call_llm_with_tools() function to use:

  • OpenAI GPT (requires API key)
  • Anthropic Claude (requires API key)
  • Local Ollama (free, runs locally)
  • Any other HuggingFace model

Example for Ollama:

```python
def call_llm_with_tools(messages, available_tools):
    import requests
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": format_prompt(messages),
            "stream": False
        }
    )
    # Ollama returns JSON with the generated text under "response"
    return response.json()["response"]
```

πŸ“Š Performance

  • Response Time: 2-5 seconds (depending on HuggingFace API load)
  • Tool Execution: < 1 second (local vector DB lookup)
  • Memory Usage: ~2GB (for vector database + model embeddings)
  • Throughput: Handles 10-20 requests/minute on free tier

πŸ› Troubleshooting

"Database not initialized" error

The vector database is downloaded and indexed on first run. Wait 1-2 minutes and try again.

"HuggingFace API unavailable" error

The demo falls back to pattern matching. Responses will be simpler but still functional.

Tool not being called

The LLM might not recognize that a tool is needed. Try being more explicit:

  • ❌ "Is this hard?"
  • βœ… "Analyze the difficulty of this prompt: [prompt]"

πŸš€ Next Steps

  1. Add more tools - Context analyzer, ML pattern detection
  2. Better LLM - Use larger models or fine-tune for tool calling
  3. Persistent chat - Save conversation history
  4. Multi-turn tool calls - Allow LLM to call multiple tools in sequence
  5. Custom tool definitions - Let users define their own analysis tools

πŸ“ License

Same as main ToGMAL project.

πŸ™ Credits

  • Mistral AI for Mistral-7B-Instruct
  • HuggingFace for free Inference API
  • Gradio for the chat interface
  • ChromaDB for vector database