# ToGMAL MCP Server
**Taxonomy of Generative Model Apparent Limitations**
A Model Context Protocol (MCP) server that provides real-time, privacy-preserving analysis of LLM interactions to detect out-of-distribution behaviors and recommend safety interventions.
## Overview
ToGMAL helps prevent common LLM pitfalls by detecting:
- **Math/Physics Speculation**: Ungrounded "theories of everything" and invented physics
- **Medical Advice Issues**: Health recommendations without proper sources or disclaimers
- **Dangerous File Operations**: Mass deletions, recursive operations without safeguards
- **Vibe Coding Overreach**: Overly ambitious projects without proper scoping
- **Unsupported Claims**: Strong assertions without evidence or hedging
## Key Features
- **Privacy-Preserving**: All analysis is deterministic and local (no external API calls)
- **Low Latency**: Heuristic-based detection for real-time analysis
- **Intervention Recommendations**: Suggests step breakdown, human-in-the-loop, or web search
- **Taxonomy Building**: Crowdsourced evidence collection for improving detection
- **Extensible**: Easy to add new detection patterns and categories
## Installation
### Prerequisites
- Python 3.10 or higher
- pip package manager
### Install Dependencies
```bash
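# --break-system-packages overrides pip's externally-managed-environment check; omit it inside a virtual environment
pip install mcp pydantic httpx --break-system-packages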
```
### Install the Server
```bash
# Clone or download the server
# Then run it directly
python togmal_mcp.py
```
## Usage
### Available Tools
#### 1. `togmal_analyze_prompt`
Analyze a user prompt before the LLM processes it.
**Parameters:**
- `prompt` (str): The user prompt to analyze
- `response_format` (str): Output format - `"markdown"` or `"json"`
**Example:**
```python
{
  "prompt": "Build me a complete theory of quantum gravity that unifies all forces",
  "response_format": "json"
}
```
**Use Cases:**
- Detect speculative physics theories before generating responses
- Flag overly ambitious coding requests
- Identify requests for medical advice that need disclaimers
#### 2. `togmal_analyze_response`
Analyze an LLM response for potential issues.
**Parameters:**
- `response` (str): The LLM response to analyze
- `context` (str, optional): Original prompt for better analysis
- `response_format` (str): Output format - `"markdown"` or `"json"`
**Example:**
```python
{
  "response": "You should definitely take 500mg of ibuprofen every 4 hours...",
  "context": "I have a headache",
  "response_format": "json"
}
```
**Use Cases:**
- Check for ungrounded medical advice
- Detect dangerous file operation instructions
- Flag unsupported statistical claims
#### 3. `togmal_submit_evidence`
Submit evidence of LLM limitations to improve the taxonomy.
**Parameters:**
- `category` (str): Type of limitation - `"math_physics_speculation"`, `"ungrounded_medical_advice"`, etc.
- `prompt` (str): The prompt that triggered the issue
- `response` (str): The problematic response
- `description` (str): Why this is problematic
- `severity` (str): Severity level - `"low"`, `"moderate"`, `"high"`, or `"critical"`
**Example:**
```python
{
  "category": "ungrounded_medical_advice",
  "prompt": "What should I do about chest pain?",
  "response": "It's probably nothing serious, just indigestion...",
  "description": "Dismissed potentially serious symptom without recommending medical consultation",
  "severity": "high"
}
```
**Features:**
- Human-in-the-loop confirmation before submission
- Generates unique entry ID for tracking
- Contributes to improving detection heuristics
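The submission fields and the tracking ID can be pictured with a small standard-library sketch. The `EvidenceSubmission` class, the `entry_id` helper, and the content-hash ID scheme are assumptions made for illustration; the server may validate input and generate IDs differently.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# The four severity values listed in the parameters above.
ALLOWED_SEVERITIES = ("low", "moderate", "high", "critical")


@dataclass(frozen=True)
class EvidenceSubmission:
    """Hypothetical container mirroring the tool's parameters."""
    category: str
    prompt: str
    response: str
    description: str
    severity: str

    def __post_init__(self) -> None:
        # Reject severities outside the documented set.
        if self.severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"severity must be one of {ALLOWED_SEVERITIES}")


def entry_id(submission: EvidenceSubmission) -> str:
    """Derive a short, stable ID from the submission content (assumed scheme)."""
    payload = json.dumps(asdict(submission), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]
```

A content hash gives identical submissions the same ID, which makes duplicates easy to spot; a random UUID would be the alternative if every submission should be tracked separately.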
#### 4. `togmal_get_taxonomy`
Retrieve entries from the taxonomy database.
**Parameters:**
- `category` (str, optional): Filter by category
- `min_severity` (str, optional): Minimum severity to include
- `limit` (int): Maximum entries to return (1-100, default 20)
- `offset` (int): Pagination offset (default 0)
- `response_format` (str): Output format
**Example:**
```python
{
  "category": "dangerous_file_operations",
  "min_severity": "high",
  "limit": 10,
  "offset": 0,
  "response_format": "json"
}
```
**Use Cases:**
- Research common LLM failure patterns
- Train improved detection models
- Generate safety guidelines
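How `category`, `min_severity`, `limit`, and `offset` might interact can be sketched against an in-memory list. The `query_taxonomy` function, the entry layout, and the severity ordering are assumptions for illustration, not the server's actual query code.

```python
from typing import Any, Dict, List, Optional

# Assumed ordering of the severity levels, lowest to highest.
SEVERITY_ORDER = ["low", "moderate", "high", "critical"]


def query_taxonomy(
    entries: List[Dict[str, Any]],
    category: Optional[str] = None,
    min_severity: Optional[str] = None,
    limit: int = 20,
    offset: int = 0,
) -> List[Dict[str, Any]]:
    """Filter entries, then return one page of results (hypothetical helper)."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must be non-negative")

    selected = entries
    if category is not None:
        selected = [e for e in selected if e["category"] == category]
    if min_severity is not None:
        floor = SEVERITY_ORDER.index(min_severity)
        selected = [
            e for e in selected
            if SEVERITY_ORDER.index(e["severity"]) >= floor
        ]
    # Filtering happens before slicing so pages stay contiguous.
    return selected[offset:offset + limit]
```

To walk through all matching entries, a client would call the tool repeatedly, increasing `offset` by `limit` until a page comes back shorter than `limit`.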
#### 5. `togmal_get_statistics`
Get statistical overview of the taxonomy database.
**Parameters:**
- `response_format` (str): Output format
**Returns:**
- Total entries by category
- Severity distribution
- Database capacity status
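As a rough picture of what such an overview involves, the sketch below tallies entries with `collections.Counter`. The `summarize` function, the entry layout, and the output keys are hypothetical; only the 1,000-entry default comes from the Configuration section.

```python
from collections import Counter
from typing import Any, Dict, List

MAX_EVIDENCE_ENTRIES = 1000  # default capacity from the Configuration section


def summarize(entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count entries per category and severity (hypothetical helper)."""
    return {
        "total": len(entries),
        "by_category": dict(Counter(e["category"] for e in entries)),
        "by_severity": dict(Counter(e["severity"] for e in entries)),
        # Fraction of the configured capacity currently in use.
        "capacity_used": len(entries) / MAX_EVIDENCE_ENTRIES,
    }
```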
## Detection Heuristics
### Math/Physics Speculation
**Detects:**
- "Theory of everything" claims
- Unified field theory proposals
- Invented equations or particles
- Modifications to fundamental constants
**Patterns:**
```
- "new equation for quantum gravity"
- "my unified theory"
- "discovered particle"
- "redefine the speed of light"
```
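A minimal sketch of how phrases like these could be matched with regular expressions follows. The pattern list and the `find_speculation` name are illustrative assumptions, not the patterns shipped with the server.

```python
import re
from typing import List

# Illustrative regexes loosely based on the phrases listed above.
SPECULATION_PATTERNS = [
    r"\bnew equation for\b",
    r"\bmy unified theory\b",
    r"\bdiscovered (?:a new )?particle\b",
    r"\bredefine the speed of light\b",
    r"\btheory of everything\b",
]


def find_speculation(text: str) -> List[str]:
    """Return every pattern that matches, ignoring case."""
    return [p for p in SPECULATION_PATTERNS if re.search(p, text, re.IGNORECASE)]
```

Word boundaries (`\b`) keep the patterns from firing inside longer words, which is one simple way to limit false positives in a purely heuristic detector.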
### Ungrounded Medical Advice
**Detects:**
- Diagnoses without qualifications
- Treatment recommendations without sources
- Specific drug dosages
- Dismissive responses to symptoms
**Patterns:**
```
- "you probably have..."
- "take 500mg of..."
- "don't worry about it"
- Missing citations or disclaimers
```
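The "specific dosage without a disclaimer" case can be approximated by pairing two regular expressions. Both expressions and the `flags_medical_advice` name are assumptions chosen for illustration; a real detector would need a much broader vocabulary.

```python
import re

# A number followed by a common dosage unit, e.g. "500mg" or "2.5 ml".
DOSAGE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|ml|g)\b", re.IGNORECASE)

# A few phrases that suggest the text defers to a professional.
DISCLAIMER = re.compile(
    r"consult (?:a|your) (?:doctor|physician|pharmacist)|medical professional",
    re.IGNORECASE,
)


def flags_medical_advice(text: str) -> bool:
    """True when a specific dosage appears with no accompanying disclaimer."""
    return bool(DOSAGE.search(text)) and not DISCLAIMER.search(text)
```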
### Dangerous File Operations
**Detects:**
- Mass deletion commands
- Recursive operations without safeguards
- Operations on test files without confirmation
- No human-in-the-loop for destructive actions
**Patterns:**
```
- "rm -rf" without confirmation
- "delete all test files"
- "recursively remove"
- Missing safety checks
```
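For shell commands, tokenizing before matching is more reliable than searching for the literal string `rm -rf`, because the flags can be split or reordered. The sketch below assumes a hypothetical helper, `is_recursive_force_rm`, and only handles a bare `rm` invocation (no `sudo`, pipes, or chained commands).

```python
import shlex


def is_recursive_force_rm(command: str) -> bool:
    """Detect `rm` invoked with both a recursive and a force flag."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "rm":
        return False
    args = tokens[1:]
    # Combine short flag clusters such as "-rf" or "-r -f" into one string.
    short_flags = "".join(
        a[1:] for a in args if a.startswith("-") and not a.startswith("--")
    )
    long_flags = {a for a in args if a.startswith("--")}
    recursive = "r" in short_flags or "R" in short_flags or "--recursive" in long_flags
    force = "f" in short_flags or "--force" in long_flags
    return recursive and force
```

Note that `shlex.split` raises `ValueError` on unbalanced quotes, so a caller analyzing free-form LLM output would want to catch that and fall back to plain pattern matching.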
### Vibe Coding Overreach
**Detects:**
- Requests for complete applications
- Massive line count targets (1000+ lines)
- Unrealistic timeframes
- Scope without proper planning
**Patterns:**
```
- "build a complete social network"
- "5000 lines of code"
- "everything in one shot"
- Missing architectural planning
```
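The line-count signal lends itself to a small numeric check: extract any "N lines" figure and compare it with a threshold. The regex, the `requested_line_counts` and `is_overreach` names, and the use of 1,000 as a hard cutoff are assumptions based on the "1000+ lines" bullet above.

```python
import re
from typing import List

# Matches "5000 lines", "5,000 lines", or "1000+ lines".
LINE_COUNT = re.compile(r"\b(\d{1,3}(?:,\d{3})+|\d+)\+?\s*lines?\b", re.IGNORECASE)
LINE_THRESHOLD = 1000  # assumed cutoff


def requested_line_counts(text: str) -> List[int]:
    """Return every line count mentioned in the text as an integer."""
    return [int(m.replace(",", "")) for m in LINE_COUNT.findall(text)]


def is_overreach(text: str) -> bool:
    """True when any requested line count meets the threshold."""
    return any(n >= LINE_THRESHOLD for n in requested_line_counts(text))
```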
### Unsupported Claims
**Detects:**
- Absolute statements without hedging
- Statistical claims without sources
- Over-confident predictions
- Missing citations
**Patterns:**
```
- "always/never/definitely"
- "95% of doctors agree" (no source)
- "guaranteed to work"
- Missing uncertainty language
```
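Two of these signals, absolute language and unsourced statistics, can be sketched with a handful of regular expressions. The word lists, the source hints, and the `unsupported_claim_signals` name are illustrative assumptions rather than the server's actual rules.

```python
import re
from typing import List

ABSOLUTES = re.compile(r"\b(?:always|never|definitely|guaranteed)\b", re.IGNORECASE)
PERCENT = re.compile(r"\b\d{1,3}(?:\.\d+)?\s?%")
# Crude hints that a source is cited: attribution phrases, URLs, or a year in parentheses.
SOURCE_HINT = re.compile(
    r"\baccording to\b|\bsource:|https?://|\(\d{4}\)", re.IGNORECASE
)


def unsupported_claim_signals(text: str) -> List[str]:
    """Return the names of the signals found in the text."""
    signals = []
    if ABSOLUTES.search(text):
        signals.append("absolute_language")
    if PERCENT.search(text) and not SOURCE_HINT.search(text):
        signals.append("unsourced_statistic")
    return signals
```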
## Risk Levels
Risk levels are calculated from weighted confidence scores:
- **LOW**: Minor issues, no immediate intervention needed
- **MODERATE**: Worth noting, consider additional verification
- **HIGH**: Significant concern, interventions recommended
- **CRITICAL**: Serious risk, multiple interventions strongly advised
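One way to turn per-category confidences into a single level is to weight each score and compare the largest result against fixed thresholds. Everything numeric below is hypothetical: the weights, the thresholds, the choice of `max` as the aggregation, and the `vibe_coding_overreach` and `unsupported_claims` identifiers are assumptions, since this README does not state the actual values.

```python
from typing import Dict

# Hypothetical per-category weights; higher means the category matters more.
CATEGORY_WEIGHTS = {
    "math_physics_speculation": 0.5,
    "ungrounded_medical_advice": 1.0,
    "dangerous_file_operations": 1.0,
    "vibe_coding_overreach": 0.5,   # assumed identifier
    "unsupported_claims": 0.5,      # assumed identifier
}

# Hypothetical score floors, checked from highest to lowest.
THRESHOLDS = [(0.75, "CRITICAL"), (0.5, "HIGH"), (0.25, "MODERATE")]


def risk_level(confidences: Dict[str, float]) -> str:
    """Map detector confidences (0.0 to 1.0) to a risk level."""
    score = max(
        (CATEGORY_WEIGHTS.get(cat, 0.5) * conf for cat, conf in confidences.items()),
        default=0.0,
    )
    for floor, label in THRESHOLDS:
        if score >= floor:
            return label
    return "LOW"
```

Taking the maximum means one severe finding is enough to raise the level; summing the weighted scores instead would let several minor findings add up.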
## Intervention Types
### Step Breakdown
Complex tasks should be broken into verifiable components.
**Recommended for:**
- Math/physics speculation
- Large coding projects
- Dangerous file operations
### Human-in-the-Loop
Critical decisions require human oversight.
**Recommended for:**
- Medical advice
- Destructive file operations
- High-severity issues
### Web Search
Claims should be verified against authoritative sources.
**Recommended for:**
- Medical recommendations
- Physics/math theories
- Unsupported factual claims
### Simplified Scope
Overly ambitious projects need realistic scoping.
**Recommended for:**
- Vibe coding requests
- Complex system designs
- Feature-heavy applications
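The "Recommended for" lists above amount to a lookup from detected category to interventions. The sketch below encodes that mapping; the identifier strings and the `recommend` helper are assumptions for illustration.

```python
from typing import Dict, List

# Category-to-intervention mapping transcribed from the lists above.
INTERVENTIONS: Dict[str, List[str]] = {
    "math_physics_speculation": ["step_breakdown", "web_search"],
    "ungrounded_medical_advice": ["human_in_the_loop", "web_search"],
    "dangerous_file_operations": ["step_breakdown", "human_in_the_loop"],
    "vibe_coding_overreach": ["step_breakdown", "simplified_scope"],
    "unsupported_claims": ["web_search"],
}


def recommend(detected: List[str]) -> List[str]:
    """Return interventions for the detected categories, without duplicates."""
    seen: List[str] = []
    for category in detected:
        for intervention in INTERVENTIONS.get(category, []):
            if intervention not in seen:
                seen.append(intervention)
    return seen
```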
## Configuration
### Character Limit
Default: 25,000 characters per response
```python
CHARACTER_LIMIT = 25000
```
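A limit like this is typically enforced by truncating the formatted output before it is returned. The `truncate` helper and the notice text below are hypothetical; the server's own truncation behavior may differ.

```python
CHARACTER_LIMIT = 25000


def truncate(text: str, limit: int = CHARACTER_LIMIT) -> str:
    """Cut text to at most `limit` characters, marking that it was shortened."""
    notice = "\n\n[truncated]"
    if len(text) <= limit:
        return text
    # Reserve room for the notice so the result never exceeds the limit.
    return text[: limit - len(notice)] + notice
```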
### Taxonomy Capacity
Default: 1,000 evidence entries
```python
MAX_EVIDENCE_ENTRIES = 1000
```
### Detection Sensitivity
Adjust pattern matching and confidence thresholds in detection functions:
```python
from typing import Any, Dict

def detect_math_physics_speculation(text: str) -> Dict[str, Any]:
    # Modify patterns or confidence calculations
    ...
```
## Integration Examples
### Claude Desktop App
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "togmal": {
      "command": "python",
      "args": ["/path/to/togmal_mcp.py"]
    }
  }
}
```
### CLI Testing
```bash
# Run the server directly
python togmal_mcp.py
# Or launch it under the MCP Inspector, which starts the server process itself
npx @modelcontextprotocol/inspector python togmal_mcp.py
```
### Programmatic Usage
```python
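from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server as a subprocess and talk to it over stdio
server_params = StdioServerParameters(command="python", args=["/path/to/togmal_mcp.py"])

async def analyze_prompt(prompt: str):
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool(
                "togmal_analyze_prompt",
                {"prompt": prompt, "response_format": "json"},
            )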
```
## Architecture
### Design Principles
1. **Privacy First**: No external API calls, all processing local
2. **Deterministic**: Heuristic-based detection for reproducibility
3. **Low Latency**: Fast pattern matching for real-time use
4. **Extensible**: Easy to add new patterns and categories
5. **Human-Centered**: Always allows human override and judgment
### Future Enhancements
The system is designed for progressive enhancement:
1. **Phase 1 (Current)**: Heuristic pattern matching
2. **Phase 2 (Planned)**: Traditional ML models (clustering, anomaly detection)
3. **Phase 3 (Future)**: Federated learning from submitted evidence
4. **Phase 4 (Advanced)**: Custom fine-tuned models for specific domains
### Data Flow
```
User Prompt
    ↓
togmal_analyze_prompt
    ↓
Detection Heuristics (parallel)
    ├── Math/Physics
    ├── Medical Advice
    ├── File Operations
    ├── Vibe Coding
    └── Unsupported Claims
    ↓
Risk Calculation
    ↓
Intervention Recommendations
    ↓
Response to Client
```
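The parallel detection step in the flow above could be sketched with a thread pool that runs each detector on the same text and collects the results by name. The `run_detectors` function and the `Detector` type alias are hypothetical; the server may simply call its detectors one after another.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# A detector takes the text and returns a result dictionary.
Detector = Callable[[str], Dict[str, Any]]


def run_detectors(text: str, detectors: Dict[str, Detector]) -> Dict[str, Dict[str, Any]]:
    """Run every detector on the same text and gather results by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, text) for name, fn in detectors.items()}
        # result() blocks until the detector finishes and re-raises its exceptions.
        return {name: future.result() for name, future in futures.items()}
```

For fast regex heuristics the thread pool adds little; it mainly keeps the structure ready for slower detectors later.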
## Contributing
### Adding New Detection Patterns
1. Create a new detection function:
```python
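import re
from typing import Any, Dict

def detect_new_category(text: str) -> Dict[str, Any]:
    patterns = {
        'subcategory1': [r'pattern1', r'pattern2'],
        'subcategory2': [r'pattern3'],
    }
    # Collect every subcategory with at least one matching pattern
    matched = [
        name for name, regexes in patterns.items()
        if any(re.search(rx, text, re.IGNORECASE) for rx in regexes)
    ]
    return {
        'detected': bool(matched),
        'categories': matched,
        # Placeholder confidence: fraction of subcategories that matched
        'confidence': len(matched) / len(patterns),
    }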
```
2. Add the new category to the `CategoryType` enum
3. Update the analysis functions to call the new detector
4. Add intervention recommendations if needed
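Step 2 might look like the following. The member list is a hypothetical reconstruction from the category strings used elsewhere in this README; check the server source for the actual `CategoryType` definition before editing it.

```python
from enum import Enum


class CategoryType(str, Enum):
    """Hypothetical shape of the enum; the real member list may differ."""
    MATH_PHYSICS_SPECULATION = "math_physics_speculation"
    UNGROUNDED_MEDICAL_ADVICE = "ungrounded_medical_advice"
    DANGEROUS_FILE_OPERATIONS = "dangerous_file_operations"
    NEW_CATEGORY = "new_category"  # the member added in step 2
```

Subclassing `str` lets members compare equal to their plain string values, which is convenient when categories arrive as JSON strings.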
### Submitting Evidence
Use the `togmal_submit_evidence` tool to contribute examples of problematic LLM behavior. This helps improve detection for everyone.
## Limitations
### Current Constraints
- **Heuristic-Based**: May have false positives/negatives
- **English-Only**: Patterns optimized for English text
- **Context-Free**: Doesn't understand full conversation history
- **No Learning**: Detection rules are static until updated
### Not a Replacement For
- Professional judgment in critical domains (medicine, law, etc.)
- Comprehensive code review
- Security auditing
- Safety testing in production systems
## License
MIT License - See LICENSE file for details
## Support
For issues, questions, or contributions:
- Open an issue on GitHub
- Submit evidence through the MCP tool
- Contact: [Your contact information]
## Citation
If you use ToGMAL in your research or product, please cite:
```bibtex
@software{togmal_mcp,
title={ToGMAL: Taxonomy of Generative Model Apparent Limitations},
author={[Your Name]},
year={2025},
url={https://github.com/[your-repo]/togmal-mcp}
}
```
## Acknowledgments
Built using:
- [Model Context Protocol](https://modelcontextprotocol.io)
- [FastMCP](https://github.com/modelcontextprotocol/python-sdk)
- [Pydantic](https://docs.pydantic.dev)
Inspired by the need for safer, more grounded AI interactions.