# ToGMAL MCP Server

**Taxonomy of Generative Model Apparent Limitations**

A Model Context Protocol (MCP) server that provides real-time, privacy-preserving analysis of LLM interactions to detect out-of-distribution behaviors and recommend safety interventions.

## Overview

ToGMAL helps prevent common LLM pitfalls by detecting:

- 🔬 **Math/Physics Speculation**: Ungrounded "theories of everything" and invented physics
- 🏥 **Medical Advice Issues**: Health recommendations without proper sources or disclaimers
- 💾 **Dangerous File Operations**: Mass deletions, recursive operations without safeguards
- 💻 **Vibe Coding Overreach**: Overly ambitious projects without proper scoping
- 📊 **Unsupported Claims**: Strong assertions without evidence or hedging

## Key Features

- **Privacy-Preserving**: All analysis is deterministic and local (no external API calls)
- **Low Latency**: Heuristic-based detection for real-time analysis
- **Intervention Recommendations**: Suggests step breakdown, human-in-the-loop review, web search, or simplified scope
- **Taxonomy Building**: Crowdsourced evidence collection for improving detection
- **Extensible**: Easy to add new detection patterns and categories

## Installation

### Prerequisites

- Python 3.10 or higher
- pip package manager

### Install Dependencies

```bash
pip install mcp pydantic httpx --break-system-packages
```

### Install the Server

```bash
# Clone or download the server
# Then run it directly
python togmal_mcp.py
```

## Usage

### Available Tools

#### 1. `togmal_analyze_prompt`

Analyze a user prompt before the LLM processes it.

**Parameters:**
- `prompt` (str): The user prompt to analyze
- `response_format` (str): Output format - `"markdown"` or `"json"`

**Example:**
```json
{
  "prompt": "Build me a complete theory of quantum gravity that unifies all forces",
  "response_format": "json"
}
```

**Use Cases:**
- Detect speculative physics theories before generating responses
- Flag overly ambitious coding requests
- Identify requests for medical advice that need disclaimers

#### 2. `togmal_analyze_response`

Analyze an LLM response for potential issues.

**Parameters:**
- `response` (str): The LLM response to analyze
- `context` (str, optional): Original prompt for better analysis
- `response_format` (str): Output format - `"markdown"` or `"json"`

**Example:**
```json
{
  "response": "You should definitely take 500mg of ibuprofen every 4 hours...",
  "context": "I have a headache",
  "response_format": "json"
}
```

**Use Cases:**
- Check for ungrounded medical advice
- Detect dangerous file operation instructions
- Flag unsupported statistical claims

#### 3. `togmal_submit_evidence`

Submit evidence of LLM limitations to improve the taxonomy.

**Parameters:**
- `category` (str): Type of limitation - `"math_physics_speculation"`, `"ungrounded_medical_advice"`, etc.
- `prompt` (str): The prompt that triggered the issue
- `response` (str): The problematic response
- `description` (str): Why this is problematic
- `severity` (str): Severity level - `"low"`, `"moderate"`, `"high"`, or `"critical"`

**Example:**
```json
{
  "category": "ungrounded_medical_advice",
  "prompt": "What should I do about chest pain?",
  "response": "It's probably nothing serious, just indigestion...",
  "description": "Dismissed potentially serious symptom without recommending medical consultation",
  "severity": "high"
}
```

**Features:**
- Human-in-the-loop confirmation before submission
- Generates unique entry ID for tracking
- Contributes to improving detection heuristics
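
A minimal sketch of how a submission might be validated and assigned an entry ID, using only the standard library. The `EvidenceEntry` class, the `make_entry` helper, and the hash-based ID scheme are assumptions for illustration; the server's actual models and ID format may differ.

```python
import hashlib
from dataclasses import dataclass

# Severity levels as listed in the parameters above
SEVERITIES = ("low", "moderate", "high", "critical")


@dataclass(frozen=True)
class EvidenceEntry:  # hypothetical record type
    entry_id: str
    category: str
    prompt: str
    response: str
    description: str
    severity: str


def make_entry(category: str, prompt: str, response: str,
               description: str, severity: str) -> EvidenceEntry:
    """Validate the severity and derive a short ID from the content."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    # Assumed ID scheme: first 12 hex chars of SHA-256 over prompt and response
    digest = hashlib.sha256(f"{prompt}\x00{response}".encode("utf-8")).hexdigest()
    return EvidenceEntry(digest[:12], category, prompt, response, description, severity)
```

Deriving the ID from the content means an identical prompt/response pair always maps to the same ID, which makes duplicate submissions easy to spot. A random UUID would be an equally reasonable choice.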

#### 4. `togmal_get_taxonomy`

Retrieve entries from the taxonomy database.

**Parameters:**
- `category` (str, optional): Filter by category
- `min_severity` (str, optional): Minimum severity to include
- `limit` (int): Maximum entries to return (1-100, default 20)
- `offset` (int): Pagination offset (default 0)
- `response_format` (str): Output format

**Example:**
```json
{
  "category": "dangerous_file_operations",
  "min_severity": "high",
  "limit": 10,
  "offset": 0,
  "response_format": "json"
}
```

**Use Cases:**
- Research common LLM failure patterns
- Train improved detection models
- Generate safety guidelines
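
The filtering and pagination parameters above could behave roughly as follows. This is a sketch over a plain list of dictionaries; `query_taxonomy` and the entry layout are hypothetical, and the severity ordering is inferred from the levels listed for `togmal_submit_evidence`.

```python
from typing import Dict, List, Optional

# Severity order used for the min_severity filter (lowest to highest)
SEVERITY_ORDER = ["low", "moderate", "high", "critical"]


def query_taxonomy(entries: List[Dict[str, str]],
                   category: Optional[str] = None,
                   min_severity: Optional[str] = None,
                   limit: int = 20,
                   offset: int = 0) -> List[Dict[str, str]]:
    """Filter entries by category and severity, then return one page."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    floor = SEVERITY_ORDER.index(min_severity) if min_severity else 0
    matches = [
        e for e in entries
        if (category is None or e["category"] == category)
        and SEVERITY_ORDER.index(e["severity"]) >= floor
    ]
    # Pagination is applied after filtering
    return matches[offset:offset + limit]
```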

#### 5. `togmal_get_statistics`

Get statistical overview of the taxonomy database.

**Parameters:**
- `response_format` (str): Output format

**Returns:**
- Total entries by category
- Severity distribution
- Database capacity status
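
As an illustration of the three returned values, the summary could be computed with `collections.Counter`. The `summarize` function and the output keys are hypothetical; only the 1,000-entry default capacity comes from this README.

```python
from collections import Counter
from typing import Any, Dict, List

MAX_EVIDENCE_ENTRIES = 1000  # default capacity documented under Configuration


def summarize(entries: List[Dict[str, str]]) -> Dict[str, Any]:
    """Count entries per category and severity, and report remaining capacity."""
    return {
        "total": len(entries),
        "by_category": dict(Counter(e["category"] for e in entries)),
        "by_severity": dict(Counter(e["severity"] for e in entries)),
        "remaining_capacity": MAX_EVIDENCE_ENTRIES - len(entries),
    }
```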

## Detection Heuristics

### Math/Physics Speculation

**Detects:**
- "Theory of everything" claims
- Unified field theory proposals
- Invented equations or particles
- Modifications to fundamental constants

**Patterns:**
```
- "new equation for quantum gravity"
- "my unified theory"
- "discovered particle"
- "redefine the speed of light"
```

### Ungrounded Medical Advice

**Detects:**
- Diagnoses without qualifications
- Treatment recommendations without sources
- Specific drug dosages
- Dismissive responses to symptoms

**Patterns:**
```
- "you probably have..."
- "take 500mg of..."
- "don't worry about it"
- Missing citations or disclaimers
```
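
The patterns above might translate into regular expressions along these lines. This is a sketch only: the regexes, the flag names, and `check_medical_advice` are assumptions modeled on the listed examples, not the server's actual rules.

```python
import re
from typing import Any, Dict

# Assumed regexes, modeled on the example patterns listed above
DOSAGE = re.compile(r"\btake\s+\d+(?:\.\d+)?\s*(?:mg|mcg|ml|g)\b", re.IGNORECASE)
DIAGNOSIS = re.compile(r"\byou (?:probably|likely) have\b", re.IGNORECASE)
DISMISSIVE = re.compile(r"\b(?:don't worry about it|it's probably nothing)\b", re.IGNORECASE)
DISCLAIMER = re.compile(
    r"\b(?:consult|see|talk to) (?:a|your) (?:doctor|physician|pharmacist)\b",
    re.IGNORECASE,
)


def check_medical_advice(text: str) -> Dict[str, Any]:
    """Flag risky phrasing and note whether any disclaimer accompanies it."""
    checks = {
        "specific_dosage": DOSAGE,
        "unqualified_diagnosis": DIAGNOSIS,
        "dismissive_response": DISMISSIVE,
    }
    flags = [name for name, rx in checks.items() if rx.search(text)]
    # A disclaimer only matters when something was flagged
    missing = bool(flags) and not DISCLAIMER.search(text)
    return {"flags": flags, "missing_disclaimer": missing}
```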

### Dangerous File Operations

**Detects:**
- Mass deletion commands
- Recursive operations without safeguards
- Operations on test files without confirmation
- No human-in-the-loop for destructive actions

**Patterns:**
```
- "rm -rf" without confirmation
- "delete all test files"
- "recursively remove"
- Missing safety checks
```
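
One way to express the "destructive command without a safeguard" check is to pair a destructive-command regex with a safeguard regex. The patterns and the `check_file_operations` helper are illustrative assumptions, not the server's implementation.

```python
import re
from typing import Dict

# `rm` followed by a flag cluster containing both r/R and f (e.g. -rf, -fr, -Rf)
RM_RECURSIVE_FORCE = re.compile(r"\brm\s+-(?=[a-zA-Z]*[rR])(?=[a-zA-Z]*f)[a-zA-Z]+\b")
# Natural-language requests for bulk deletion
DELETE_PHRASES = re.compile(r"\b(?:delete all|recursively (?:remove|delete))\b", re.IGNORECASE)
# Wording that suggests a safeguard is in place (assumed list)
SAFEGUARDS = re.compile(r"\b(?:confirm\w*|are you sure|dry[- ]run|backup)\b", re.IGNORECASE)


def check_file_operations(text: str) -> Dict[str, bool]:
    """Report destructive operations and whether they lack a safeguard."""
    destructive = bool(RM_RECURSIVE_FORCE.search(text) or DELETE_PHRASES.search(text))
    safeguarded = bool(SAFEGUARDS.search(text))
    return {
        "destructive": destructive,
        "needs_human_review": destructive and not safeguarded,
    }
```

The two lookaheads let the flags appear in any order, so `rm -fr` and `rm -Rf` are caught as well as `rm -rf`, while a plain `rm -r` is not.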

### Vibe Coding Overreach

**Detects:**
- Requests for complete applications
- Massive line count targets (1000+ lines)
- Unrealistic timeframes
- Broad scope with no planning or architecture

**Patterns:**
```
- "build a complete social network"
- "5000 lines of code"
- "everything in one shot"
- Missing architectural planning
```
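
The line-count heuristic can be sketched by extracting numeric targets and comparing them with the 1000-line threshold mentioned above. The regexes and `check_overreach` are hypothetical; the scope phrases are a small assumed sample.

```python
import re
from typing import Any, Dict

LINE_THRESHOLD = 1000  # from "1000+ lines" above
# Captures numbers such as "5000" or "5,000" that precede the word "lines"
LINE_TARGET = re.compile(r"\b(\d[\d,]*)\s*\+?\s*lines\b", re.IGNORECASE)
# Assumed phrases that signal an all-at-once request
SCOPE_PHRASES = re.compile(
    r"\b(?:complete|entire|full)\s+(?:app|application|social network|platform|system)\b"
    r"|\bin one (?:shot|go)\b",
    re.IGNORECASE,
)


def check_overreach(text: str) -> Dict[str, Any]:
    """Find large line-count targets and broad-scope phrasing in a request."""
    targets = [int(m.replace(",", "")) for m in LINE_TARGET.findall(text)]
    return {
        "large_line_targets": [n for n in targets if n >= LINE_THRESHOLD],
        "broad_scope": bool(SCOPE_PHRASES.search(text)),
    }
```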

### Unsupported Claims

**Detects:**
- Absolute statements without hedging
- Statistical claims without sources
- Over-confident predictions
- Missing citations

**Patterns:**
```
- "always/never/definitely"
- "95% of doctors agree" (no source)
- "guaranteed to work"
- Missing uncertainty language
```
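
A rough sketch of how absolute wording, hedging, and unsourced statistics could be told apart. The word lists are deliberately small assumptions, and `check_claims` is a hypothetical helper rather than the server's detector.

```python
import re
from typing import Any, Dict

ABSOLUTES = re.compile(r"\b(?:always|never|definitely|guaranteed)\b", re.IGNORECASE)
HEDGES = re.compile(r"\b(?:may|might|could|likely|suggests?|appears?|approximately)\b", re.IGNORECASE)
# A percentage such as "95%" or "12.5 percent"
STATISTIC = re.compile(r"\b\d{1,3}(?:\.\d+)?\s*(?:%|percent\b)", re.IGNORECASE)
# Assumed markers that a source is being cited
SOURCE = re.compile(r"\b(?:according to|source:|et al\.|https?://)", re.IGNORECASE)


def check_claims(text: str) -> Dict[str, Any]:
    """Report absolute terms, presence of hedging, and statistics with no source."""
    return {
        "absolute_terms": [m.lower() for m in ABSOLUTES.findall(text)],
        "hedged": bool(HEDGES.search(text)),
        "unsourced_statistic": bool(STATISTIC.search(text)) and not SOURCE.search(text),
    }
```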

## Risk Levels

The risk level is derived from the weighted confidence scores of the detectors that fired:

- **LOW**: Minor issues, no immediate intervention needed
- **MODERATE**: Worth noting, consider additional verification
- **HIGH**: Significant concern, interventions recommended
- **CRITICAL**: Serious risk, multiple interventions strongly advised
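
To make "weighted confidence scores" concrete, here is one possible scoring scheme. The weights, the thresholds, and the last two category names are assumptions chosen for illustration; this README does not document the server's actual values.

```python
from typing import Dict

# Assumed per-category weights (not taken from the server)
CATEGORY_WEIGHTS = {
    "math_physics_speculation": 0.6,
    "ungrounded_medical_advice": 1.0,
    "dangerous_file_operations": 1.0,
    "vibe_coding_overreach": 0.5,   # category name assumed
    "unsupported_claims": 0.5,      # category name assumed
}
DEFAULT_WEIGHT = 0.5


def risk_level(confidences: Dict[str, float]) -> str:
    """Map per-category confidences (0.0 to 1.0) to a risk level."""
    score = sum(
        CATEGORY_WEIGHTS.get(category, DEFAULT_WEIGHT) * confidence
        for category, confidence in confidences.items()
    )
    # Assumed thresholds, checked from most to least severe
    if score >= 1.5:
        return "CRITICAL"
    if score >= 0.8:
        return "HIGH"
    if score >= 0.4:
        return "MODERATE"
    return "LOW"
```

Summing weighted scores means several moderate detections can together reach a higher level than any single one would, which matches the idea that multiple interventions are advised at `CRITICAL`.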

## Intervention Types

### Step Breakdown
Complex tasks should be broken into verifiable components.

**Recommended for:**
- Math/physics speculation
- Large coding projects
- Dangerous file operations

### Human-in-the-Loop
Critical decisions require human oversight.

**Recommended for:**
- Medical advice
- Destructive file operations
- High-severity issues

### Web Search
Claims should be verified against authoritative sources.

**Recommended for:**
- Medical recommendations
- Physics/math theories
- Unsupported factual claims

### Simplified Scope
Overly ambitious projects need realistic scoping.

**Recommended for:**
- Vibe coding requests
- Complex system designs
- Feature-heavy applications
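
The four "Recommended for" lists above can be read as a lookup from detected category to interventions. The sketch below encodes that reading; the identifiers, the `recommend` helper, and the rule that high-risk results always add human review are assumptions.

```python
from typing import Dict, List

# Category -> interventions, transcribed from the "Recommended for" lists above
INTERVENTIONS: Dict[str, List[str]] = {
    "math_physics_speculation": ["step_breakdown", "web_search"],
    "ungrounded_medical_advice": ["human_in_the_loop", "web_search"],
    "dangerous_file_operations": ["step_breakdown", "human_in_the_loop"],
    "vibe_coding_overreach": ["step_breakdown", "simplified_scope"],
    "unsupported_claims": ["web_search"],
}


def recommend(detected: List[str], risk: str = "LOW") -> List[str]:
    """Collect interventions for the detected categories, without duplicates."""
    recommendations: List[str] = []
    for category in detected:
        for intervention in INTERVENTIONS.get(category, []):
            if intervention not in recommendations:
                recommendations.append(intervention)
    # "High-severity issues" are listed under Human-in-the-Loop above
    if risk in ("HIGH", "CRITICAL") and "human_in_the_loop" not in recommendations:
        recommendations.append("human_in_the_loop")
    return recommendations
```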

## Configuration

### Character Limit
Default: 25,000 characters per response
```python
CHARACTER_LIMIT = 25000
```
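
How the limit is enforced is not described here. A plausible approach is to truncate the output and append a notice, as in this sketch; `truncate_response` and the notice text are hypothetical.

```python
CHARACTER_LIMIT = 25000
TRUNCATION_NOTICE = "\n\n[Output truncated]"  # assumed wording


def truncate_response(text: str, limit: int = CHARACTER_LIMIT) -> str:
    """Return text unchanged if it fits, else cut it so the result is `limit` long."""
    if len(text) <= limit:
        return text
    # Reserve room for the notice so the total stays within the limit
    keep = max(0, limit - len(TRUNCATION_NOTICE))
    return text[:keep] + TRUNCATION_NOTICE
```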

### Taxonomy Capacity
Default: 1,000 evidence entries
```python
MAX_EVIDENCE_ENTRIES = 1000
```
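
What happens once the store is full is not specified in this README. The sketch below assumes new submissions are rejected at capacity; evicting the oldest entry would be another reasonable policy. `EvidenceStore` is a hypothetical in-memory class.

```python
from typing import Dict, List

MAX_EVIDENCE_ENTRIES = 1000


class EvidenceStore:
    """In-memory evidence list with a fixed capacity (assumed policy: reject when full)."""

    def __init__(self, capacity: int = MAX_EVIDENCE_ENTRIES) -> None:
        self.capacity = capacity
        self.entries: List[Dict[str, str]] = []

    def add(self, entry: Dict[str, str]) -> bool:
        """Store the entry and return True, or return False if the store is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(entry)
        return True
```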

### Detection Sensitivity
Adjust pattern matching and confidence thresholds in detection functions:
```python
from typing import Any, Dict

def detect_math_physics_speculation(text: str) -> Dict[str, Any]:
    # Modify patterns or confidence calculations
    ...
```

## Integration Examples

### Claude Desktop App

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "togmal": {
      "command": "python",
      "args": ["/path/to/togmal_mcp.py"]
    }
  }
}
```

### CLI Testing

```bash
# Run the server
python togmal_mcp.py

# Or test it with the MCP Inspector, which launches the server process itself
npx @modelcontextprotocol/inspector python togmal_mcp.py
```

### Programmatic Usage

```python
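from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server as a subprocess and talk to it over stdio
server_params = StdioServerParameters(command="python", args=["togmal_mcp.py"])

async def analyze_prompt(prompt: str):
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # MCP handshake must precede tool calls
            result = await session.call_tool(
                "togmal_analyze_prompt",
                {"prompt": prompt, "response_format": "json"}
            )
            return result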
```

## Architecture

### Design Principles

1. **Privacy First**: No external API calls, all processing local
2. **Deterministic**: Heuristic-based detection for reproducibility
3. **Low Latency**: Fast pattern matching for real-time use
4. **Extensible**: Easy to add new patterns and categories
5. **Human-Centered**: Always allows human override and judgment

### Future Enhancements

The system is designed for progressive enhancement:

1. **Phase 1 (Current)**: Heuristic pattern matching
2. **Phase 2 (Planned)**: Traditional ML models (clustering, anomaly detection)
3. **Phase 3 (Future)**: Federated learning from submitted evidence
4. **Phase 4 (Advanced)**: Custom fine-tuned models for specific domains

### Data Flow

```
User Prompt
    ↓
togmal_analyze_prompt
    ↓
Detection Heuristics (parallel)
    ├── Math/Physics
    ├── Medical Advice
    ├── File Operations
    ├── Vibe Coding
    └── Unsupported Claims
    ↓
Risk Calculation
    ↓
Intervention Recommendations
    ↓
Response to Client
```
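
The parallel detection stage in the diagram could be realized with a thread pool, since each detector only reads the same input text. This is a sketch: `run_detectors` and the `Detector` alias are hypothetical, and the result shape follows the `detected`/`confidence` keys used in the Contributing section.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# A detector takes the text and returns a dict with 'detected' and 'confidence'
Detector = Callable[[str], Dict[str, Any]]


def run_detectors(text: str, detectors: Dict[str, Detector]) -> Dict[str, float]:
    """Run every detector on the same text and keep those that fired."""
    with ThreadPoolExecutor() as pool:
        # Submit all detectors first so they can run concurrently
        futures = {name: pool.submit(fn, text) for name, fn in detectors.items()}
        results = {name: future.result() for name, future in futures.items()}
    return {
        name: result["confidence"]
        for name, result in results.items()
        if result["detected"]
    }
```

The returned mapping of category to confidence is the natural input for the risk calculation step that follows in the diagram.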

## Contributing

### Adding New Detection Patterns

1. Create a new detection function:
```python
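import re
from typing import Any, Dict

def detect_new_category(text: str) -> Dict[str, Any]:
    # Regex patterns grouped by subcategory (placeholders; replace with real ones)
    patterns = {
        'subcategory1': [r'pattern1', r'pattern2'],
        'subcategory2': [r'pattern3']
    }
    # A subcategory matches when any of its patterns appears in the text
    matched = [
        name for name, regexes in patterns.items()
        if any(re.search(rx, text, re.IGNORECASE) for rx in regexes)
    ]
    return {
        'detected': bool(matched),
        'categories': matched,
        # Illustrative score only: fraction of subcategories that matched
        'confidence': len(matched) / len(patterns)
    }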
```

2. Add the new category to the `CategoryType` enum
3. Update the analysis functions to call the new detector
4. Add intervention recommendations if needed
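
Steps 2 and 3 might look like the following. Only the name `CategoryType` comes from this README; the enum members shown, the `DETECTORS` registry, and the stub detector are assumptions about how the server is organized.

```python
from enum import Enum
from typing import Any, Callable, Dict


class CategoryType(str, Enum):
    # Existing members (values taken from the category strings used in this README)
    MATH_PHYSICS_SPECULATION = "math_physics_speculation"
    UNGROUNDED_MEDICAL_ADVICE = "ungrounded_medical_advice"
    DANGEROUS_FILE_OPERATIONS = "dangerous_file_operations"
    # Step 2: register the new category
    NEW_CATEGORY = "new_category"


def detect_new_category(text: str) -> Dict[str, Any]:
    """Stub standing in for the detector written in step 1."""
    return {"detected": False, "categories": [], "confidence": 0.0}


# Step 3: a hypothetical registry that the analysis functions iterate over
DETECTORS: Dict[CategoryType, Callable[[str], Dict[str, Any]]] = {
    CategoryType.NEW_CATEGORY: detect_new_category,
}
```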

### Submitting Evidence

Use the `togmal_submit_evidence` tool to contribute examples of problematic LLM behavior. This helps improve detection for everyone.

## Limitations

### Current Constraints

- **Heuristic-Based**: May have false positives/negatives
- **English-Only**: Patterns optimized for English text
- **Context-Free**: Doesn't understand full conversation history
- **No Learning**: Detection rules are static until updated

### Not a Replacement For

- Professional judgment in critical domains (medicine, law, etc.)
- Comprehensive code review
- Security auditing
- Safety testing in production systems

## License

MIT License - See LICENSE file for details

## Support

For issues, questions, or contributions:
- Open an issue on GitHub
- Submit evidence through the MCP tool
- Contact: [Your contact information]

## Citation

If you use ToGMAL in your research or product, please cite:

```bibtex
@software{togmal_mcp,
  title={ToGMAL: Taxonomy of Generative Model Apparent Limitations},
  author={[Your Name]},
  year={2025},
  url={https://github.com/[your-repo]/togmal-mcp}
}
```

## Acknowledgments

Built using:
- [Model Context Protocol](https://modelcontextprotocol.io)
- [FastMCP](https://github.com/modelcontextprotocol/python-sdk)
- [Pydantic](https://docs.pydantic.dev)

Inspired by the need for safer, more grounded AI interactions.