# πŸ€– ToGMAL Chat Demo with MCP Tools

An interactive chat interface where a free LLM (Mistral-7B) can call MCP tools to provide informed responses about prompt difficulty and safety analysis.

## ✨ Features

### 🧠 **Intelligent Assistant**
- Powered by **Mistral-7B-Instruct-v0.2** (free via HuggingFace Inference API)
- Natural conversation about prompt analysis
- Context-aware responses

### πŸ› οΈ **MCP Tool Integration**
The LLM can dynamically call these tools:

1. **`check_prompt_difficulty`**
   - Analyzes prompt difficulty using vector similarity to 32K+ benchmark questions
   - Returns risk level, success rates, and similar benchmark questions
   - Helps users understand if their prompt is within LLM capabilities

2. **`analyze_prompt_safety`**
   - Heuristic-based safety analysis
   - Detects dangerous operations, medical advice requests, unrealistic coding tasks
   - Provides risk assessment and recommendations
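As a rough illustration of the heuristic approach, a pattern-matching check could look like this (the patterns and function name below are hypothetical examples, not the actual rules shipped in `chat_app.py`):

```python
import re

# Hypothetical example patterns -- the real tool's rule set may differ
DANGEROUS_PATTERNS = {
    r"\b(delete|rm|erase)\b.*\b(all|every|everything)\b": "Potentially dangerous file operation",
    r"\b(diagnose|diagnosis|prescribe|prescription)\b": "Possible medical advice request",
}

def heuristic_safety_check(prompt: str) -> dict:
    """Flag risky prompts by regex matching; deterministic and fully local."""
    issues = [message for pattern, message in DANGEROUS_PATTERNS.items()
              if re.search(pattern, prompt, re.IGNORECASE)]
    return {"risk_level": "HIGH" if issues else "LOW", "issues": issues}
```

Because this is plain regex matching, it runs in well under a millisecond and needs no network access, which is what makes the fallback mode (see Troubleshooting) possible.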

### πŸ”„ **How It Works**

```mermaid
graph LR
    A[User Message] --> B[LLM]
    B --> C{Needs Tool?}
    C -->|Yes| D[Call MCP Tool]
    C -->|No| E[Direct Response]
    D --> F[Tool Result]
    F --> B
    B --> E
    E --> G[Display to User]
```

1. User sends a message
2. LLM decides if it needs to call a tool
3. If yes, tool is executed and results returned to LLM
4. LLM formulates final response using tool data
5. Response shown to user with transparent tool call info
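The five steps above can be sketched as a single round-trip function. `call_llm` and `execute_tool` here are stand-ins for the demo's internals, so treat this as an assumption about the structure rather than the exact code in `chat_app.py`:

```python
def chat_turn(user_message, call_llm, execute_tool):
    """One round trip of the tool-calling loop (steps 1-5 above)."""
    reply = call_llm(user_message)                   # step 2: LLM decides
    tool_call = reply.get("tool_call")
    if tool_call:                                    # step 3: execute the tool
        result = execute_tool(tool_call["name"], tool_call["arguments"])
        # step 4: LLM formulates the final response using the tool data
        reply = call_llm(user_message, tool_result=result)
    return reply["content"]                          # step 5: shown to the user
```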

## πŸš€ Quick Start

### Local Development

```bash
cd Togmal-demo

# Install dependencies
pip install -r requirements.txt

# Run the chat demo
python chat_app.py
```

Open http://localhost:7860 in your browser.

### Deploy to HuggingFace Spaces

1. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Gradio" as SDK

2. **Upload files:**
   ```bash
   # Clone your Space repo
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   
   # Copy demo files
   cp ../Togmal-demo/chat_app.py app.py
   cp ../Togmal-demo/requirements.txt .
   cp ../Togmal-demo/benchmark_vector_db.py .
   
   # Copy data directory
   cp -r ../Togmal-demo/data .
   
   # Commit and push
   git add .
   git commit -m "Deploy ToGMAL Chat Demo"
   git push
   ```

3. **Your demo will be live at:**
   `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`

## πŸ’¬ Example Conversations

### Example 1: Difficulty Analysis

**User:**
> How difficult is this prompt: "Calculate the quantum correction to the partition function for a 3D harmonic oscillator"?

**Assistant:**
> πŸ› οΈ *Calling tool: `check_prompt_difficulty`*
> 
> Based on my analysis of similar benchmark questions:
> 
> **Difficulty Level:** HIGH  
> **Success Rate:** 45.2%  
> **Similarity to benchmarks:** 0.847
> 
> **Recommendation:** This is a challenging physics problem. Current LLMs succeed only about 45% of the time on similar questions. Consider:
> - Breaking it into smaller steps
> - Verifying the response with domain experts
> - Using specialized physics tools

### Example 2: Safety Check

**User:**
> Is this safe: "Write a script to delete all my files"?

**Assistant:**
> πŸ› οΈ *Calling tool: `analyze_prompt_safety`*
> 
> **Safety Analysis:**
> 
> **Risk Level:** HIGH  
> **Issues Found:** 1
> 
> β€’ Detected potentially dangerous file operation
> 
> **Recommendation:** Proceed with caution
> 
> This prompt requests destructive file operations. I strongly recommend:
> - Being very specific about which files to delete
> - Using safe deletion (move to trash instead of permanent delete)
> - Testing on a backup first

### Example 3: Simple Question

**User:**
> What is 2 + 2?

**Assistant:**
> πŸ› οΈ *Calling tool: `check_prompt_difficulty`*
> 
> This is a very simple arithmetic question!
> 
> **Difficulty Level:** LOW  
> **Success Rate:** 99.8%
> 
> Current LLMs handle this type of question extremely well. The answer is **4**.

## πŸ—οΈ Architecture

### Components

```
chat_app.py
β”œβ”€β”€ LLM Backend (HuggingFace Inference API)
β”‚   β”œβ”€β”€ Mistral-7B-Instruct-v0.2
β”‚   └── Tool calling via prompt engineering
β”‚
β”œβ”€β”€ MCP Tools (Local Implementation)
β”‚   β”œβ”€β”€ check_prompt_difficulty()
β”‚   β”‚   └── Uses BenchmarkVectorDB
β”‚   └── analyze_prompt_safety()
β”‚       └── Heuristic pattern matching
β”‚
└── Gradio Interface
    β”œβ”€β”€ Chat component
    └── Tool call visualization
```
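As a sketch of the "tool call visualization" component, the transparent marker shown in the example conversations could be produced by a small formatter like this (the exact rendering in `chat_app.py` may differ):

```python
def render_tool_call(name: str, result: dict) -> str:
    """Format a tool call and its result as markdown for the chat window."""
    lines = [f"πŸ› οΈ *Calling tool: `{name}`*", ""]
    lines += [f"**{key}:** {value}" for key, value in result.items()]
    return "\n".join(lines)
```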

### Why This Approach?

1. **No API Keys Required** - Uses HuggingFace's free Inference API
2. **Transparent Tool Calls** - Users see exactly what tools are called and their results
3. **Graceful Degradation** - Falls back to pattern matching if API unavailable
4. **Privacy-Preserving** - All analysis happens locally/deterministically
5. **Free to Deploy** - Works on HuggingFace Spaces free tier
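The graceful-degradation point boils down to a try/except around the hosted call. `call_hf_api` and `pattern_fallback` are placeholder names for the hosted-LLM call and the local heuristic analysis:

```python
def respond(prompt, call_hf_api, pattern_fallback):
    """Prefer the hosted LLM, but degrade to local pattern matching."""
    try:
        return call_hf_api(prompt)
    except Exception:
        # API unavailable or rate-limited: fall back to deterministic local analysis
        return pattern_fallback(prompt)
```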

## 🎯 Use Cases

### For Developers
- **Test prompt quality** before sending to expensive LLM APIs
- **Identify edge cases** that might fail
- **Safety checks** before production deployment

### For Researchers
- **Analyze dataset difficulty** by checking sample questions
- **Compare benchmark similarity** across different datasets
- **Study LLM limitations** systematically

### For End Users
- **Understand if a task is suitable** for an LLM
- **Get recommendations** for improving prompts
- **Avoid unsafe operations** flagged by analysis

## πŸ”§ Customization

### Add New Tools

Edit `chat_app.py` and add your tool:

```python
def tool_my_custom_check(prompt: str) -> Dict:
    """Your custom analysis."""
    return {
        "result": "analysis result",
        "confidence": 0.95
    }

# Add to AVAILABLE_TOOLS
AVAILABLE_TOOLS.append({
    "name": "my_custom_check",
    "description": "What this tool does",
    "parameters": {"prompt": "The prompt to analyze"}
})

# Add a branch to execute_tool()
def execute_tool(tool_name: str, arguments: Dict) -> Dict:
    # ... existing tool branches ...
    if tool_name == "my_custom_check":
        return tool_my_custom_check(arguments.get("prompt", ""))
```

### Use Different LLM

Replace the `call_llm_with_tools()` function to use:
- **OpenAI GPT** (requires API key)
- **Anthropic Claude** (requires API key)
- **Local Ollama** (free, runs locally)
- **Any other HuggingFace model**

Example for Ollama:

```python
def call_llm_with_tools(messages, available_tools):
    import requests
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": format_prompt(messages),
            "stream": False
        }
    )
    response.raise_for_status()
    # Ollama returns the generated text in the "response" field
    return response.json()["response"]
```

## πŸ“Š Performance

- **Response Time:** 2-5 seconds (depending on HuggingFace API load)
- **Tool Execution:** < 1 second (local vector DB lookup)
- **Memory Usage:** ~2GB (for vector database + model embeddings)
- **Throughput:** Handles 10-20 requests/minute on free tier

## πŸ› Troubleshooting

### "Database not initialized" error

The vector database needs to download on first run. Wait 1-2 minutes and try again.

### "HuggingFace API unavailable" error

The demo falls back to pattern matching. Responses will be simpler but still functional.

### Tool not being called

The LLM might not recognize that a tool is needed. Try being more explicit:
- ❌ "Is this hard?"
- βœ… "Analyze the difficulty of this prompt: [prompt]"

## πŸš€ Next Steps

1. **Add more tools** - Context analyzer, ML pattern detection
2. **Better LLM** - Use larger models or fine-tune for tool calling
3. **Persistent chat** - Save conversation history
4. **Multi-turn tool calls** - Allow LLM to call multiple tools in sequence
5. **Custom tool definitions** - Let users define their own analysis tools

## πŸ“ License

Same as main ToGMAL project.

## πŸ™ Credits

- **Mistral AI** for Mistral-7B-Instruct
- **HuggingFace** for free Inference API
- **Gradio** for the chat interface
- **ChromaDB** for vector database