


---
title: Websearch
emoji: 🔎
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
---

Web Search MCP Server

A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from recent news articles.

Features

Real-time web search: Search for recent news on any topic
Content extraction: Automatically extracts main article content, removing ads and boilerplate
Rate limiting: Caps usage at 200 requests per hour to prevent API abuse
Structured output: Returns formatted content with metadata (title, source, date, URL)
Flexible results: Control the number of results (1-20)
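The 200-requests-per-hour cap can be pictured as a sliding window over recent request timestamps. The sketch below is a standard-library stand-in for illustration only: the project lists the `limits` package as a dependency, and `SlidingWindowLimiter` is a hypothetical name, not the app's actual code.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical sketch: allow at most `limit` hits per `window` seconds."""

    def __init__(self, limit: int = 200, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable clock so the limiter can be tested
        self._hits: Deque[float] = deque()

    def hit(self) -> bool:
        """Record a request and return True if it is within the limit."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._hits and now - self._hits[0] >= self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False  # over the limit: the request should be rejected
        self._hits.append(now)
        return True


# Usage: three allowed requests per 10 seconds, driven by a fake clock.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window=10.0, clock=lambda: fake_now[0])
results = [limiter.hit() for _ in range(4)]   # [True, True, True, False]
fake_now[0] = 10.0
after_window = limiter.hit()                  # True: the earlier hits expired
```

With the `limits` package itself, a comparable check is `MovingWindowRateLimiter(MemoryStorage()).hit(parse("200/hour"))`; which window strategy the app actually uses is not stated in this README.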

Prerequisites

Serper API Key: Sign up at serper.dev to get your API key
Python 3.10+: Required by Gradio 5.x (this Space pins sdk_version 5.36.2)
MCP-compatible LLM client: Such as Claude Desktop, Cursor, or any MCP-enabled application

Installation

Clone or download this repository

Install dependencies:

pip install -r requirements.txt

Or install manually:

pip install "gradio[mcp]" httpx trafilatura python-dateutil limits

Set your Serper API key:

export SERPER_API_KEY="your-api-key-here"

Usage

Starting the MCP Server

python app_mcp.py

The server will start on http://localhost:7860 with the MCP endpoint at:

http://localhost:7860/gradio_api/mcp/sse
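The MCP endpoint is served alongside the web UI by Gradio's `launch(mcp_server=True)` option, which exposes the app's function as an MCP tool and uses its docstring as the tool description. A minimal sketch with a placeholder body; the component choices, labels, and the `build_demo`/`main` helpers are assumptions, not the contents of `app_mcp.py`.

```python
def search_web(query: str, num_results: int = 4) -> str:
    """Search the web for recent news and extract article content.

    Args:
        query: The search query.
        num_results: Number of results to fetch (1-20).
    """
    # Placeholder body: the real tool calls Serper and extracts article text.
    return f"Would search for {query!r} with {int(num_results)} results"


def build_demo():
    """Build a Gradio interface whose function Gradio can expose as an MCP tool."""
    import gradio as gr  # imported lazily so this module loads without Gradio

    return gr.Interface(
        fn=search_web,
        inputs=[
            gr.Textbox(label="Query"),
            gr.Slider(minimum=1, maximum=20, value=4, step=1, label="Number of results"),
        ],
        outputs=gr.Textbox(label="Results"),
    )


def main() -> None:
    """Start the web UI; mcp_server=True also serves /gradio_api/mcp/sse."""
    build_demo().launch(mcp_server=True)
```

Calling `main()`, for example from an `if __name__ == "__main__":` guard, starts the server on port 7860 by default.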

Connecting to LLM Clients

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/path/to/app_mcp.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key-here"
      }
    }
  }
}

Direct URL Connection

For clients that support URL-based MCP servers:

Start the server: python app_mcp.py
Connect to: http://localhost:7860/gradio_api/mcp/sse
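Clients that connect by URL speak MCP over SSE to the endpoint above. The sketch below uses the official MCP Python SDK (the `mcp` package, assumed to be installed separately). Because the exposed tool name can depend on how Gradio registers the function, it looks up a tool whose name contains `search_web` rather than hard-coding it. `call_search_web` and `run_search` are hypothetical helper names.

```python
import asyncio

SSE_URL = "http://localhost:7860/gradio_api/mcp/sse"


async def call_search_web(query: str, num_results: int = 4, url: str = SSE_URL) -> str:
    """Connect over SSE, find the search tool, call it, and return its text output."""
    # Imported lazily so the module loads even without the `mcp` package installed.
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    async with sse_client(url) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            listing = await session.list_tools()
            # Assumption: the exposed tool name contains "search_web".
            names = [tool.name for tool in listing.tools if "search_web" in tool.name]
            if not names:
                raise RuntimeError("no search_web tool exposed by the server")
            result = await session.call_tool(
                names[0], {"query": query, "num_results": num_results}
            )
            # Keep only the text content blocks of the tool result.
            return "\n".join(
                block.text for block in result.content
                if getattr(block, "type", "") == "text"
            )


def run_search(query: str, num_results: int = 4) -> str:
    """Synchronous wrapper; requires the server to be running."""
    return asyncio.run(call_search_web(query, num_results))
```

For example, `print(run_search("OpenAI news", 2))` prints the formatted results once the server is up.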

Tool Documentation

`search_web` Function

Purpose: Search the web for recent news and extract article content.

Parameters:

query (str, REQUIRED): The search query
- Examples: "OpenAI news", "climate change 2024", "python updates"
num_results (int, OPTIONAL): Number of results to fetch
- Default: 4
- Range: 1-20
- More results provide more context but take longer
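The parameter rules above (a required non-empty query, `num_results` between 1 and 20 with a default of 4) can be enforced before any request is made. A minimal standard-library sketch; it assumes the app queries Serper's news endpoint (`https://google.serper.dev/news`) with `q`/`num` fields and an `X-API-KEY` header, and `build_search_request` is a hypothetical helper. The README does not say whether out-of-range values are rejected or clamped; this sketch rejects them.

```python
import json
import urllib.request

SERPER_NEWS_URL = "https://google.serper.dev/news"  # assumption: Serper's news endpoint


def build_search_request(query: str, num_results: int = 4,
                         api_key: str = "") -> urllib.request.Request:
    """Validate the tool's parameters and build (but not send) a Serper request."""
    query = query.strip()
    if not query:
        raise ValueError("query must be a non-empty string")
    if not 1 <= num_results <= 20:
        raise ValueError("num_results must be between 1 and 20")
    body = json.dumps({"q": query, "num": num_results}).encode("utf-8")
    return urllib.request.Request(
        SERPER_NEWS_URL,
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; building and sending are kept separate so validation can be checked offline.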

Returns: Formatted text containing:

Summary of extraction results
For each article:
- Title
- Source and date
- URL
- Extracted main content
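The returned text can be thought of as a summary line followed by one block per article, with failed extractions listed at the end. A sketch of such a formatter; the `Article` fields mirror the list above, but the exact layout and wording, and the `Article`/`format_results` names, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical container for one search result."""
    title: str
    source: str
    date: str
    url: str
    content: Optional[str]  # None when extraction failed


def format_results(articles: List[Article]) -> str:
    """Render a summary line, one block per extracted article, then any failures."""
    extracted = [a for a in articles if a.content]
    failed = [a for a in articles if not a.content]
    lines = [f"Successfully extracted {len(extracted)} of {len(articles)} articles", ""]
    for i, article in enumerate(extracted, start=1):
        lines += [
            f"{i}. {article.title}",
            f"Source: {article.source} | Date: {article.date}",
            f"URL: {article.url}",
            "",
            article.content or "",
            "",
        ]
    if failed:
        lines.append("Could not extract: " + ", ".join(a.url for a in failed))
    return "\n".join(lines).rstrip()
```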

Example prompts to an LLM:

"Search for recent developments in artificial intelligence"
"Find 10 articles about climate change in 2024"
"Get news about Python programming language updates"

Error Handling

The tool handles various error scenarios:

Missing API key: Clear error message with setup instructions
Rate limiting: Informs when limit is exceeded
Failed extractions: Reports which articles couldn't be extracted
Network errors: Reported as readable error messages
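These scenarios amount to checking configuration up front and translating exceptions into messages the LLM can read. A minimal sketch; `RateLimitExceeded` and `run_search_safely` are hypothetical names, and only standard-library network errors (which subclass `OSError`) are caught. An `httpx`-based client would also need to catch `httpx.HTTPError`.

```python
import os
from typing import Callable


class RateLimitExceeded(Exception):
    """Hypothetical exception raised when the hourly quota is used up."""


def run_search_safely(search: Callable[[str, str], str], query: str) -> str:
    """Call `search(api_key, query)` and turn known failures into readable messages."""
    api_key = os.environ.get("SERPER_API_KEY", "").strip()
    if not api_key:
        return ('Error: SERPER_API_KEY is not set. '
                'Run: export SERPER_API_KEY="your-api-key-here"')
    try:
        return search(api_key, query)
    except RateLimitExceeded:
        return "Error: rate limit of 200 requests/hour exceeded; try again later."
    except OSError as exc:  # stdlib network errors (e.g. URLError) subclass OSError
        return f"Error: network problem while searching ({exc})."
```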

Testing

You can test the server manually:

Open http://localhost:7860 in your browser
Enter a search query
Adjust the number of results
Click "Search" to see the extracted content

Tips for LLM Usage

Be specific with queries: More specific queries yield better results
Adjust result count: Use fewer results for quick searches, more for comprehensive research
Check dates: The tool shows article dates for temporal context
Follow up: Use the extracted content to ask follow-up questions

Limitations

Rate limited to 200 requests per hour
Only searches news articles (not general web pages)
Extraction quality depends on website structure
Some websites may block automated access
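Because extraction quality depends on page structure, an extractor should treat "nothing found" as a normal result rather than an error. A minimal sketch, assuming `trafilatura` (part of the install command above) does the extraction; `extract_main_text` is a hypothetical helper name.

```python
from typing import Optional


def extract_main_text(html: str, url: Optional[str] = None) -> Optional[str]:
    """Return the main article text from raw HTML, or None if nothing usable is found."""
    if not html or not html.strip():
        return None  # nothing to extract; skip the import entirely
    import trafilatura  # lazy import: only needed when there is HTML to parse

    # trafilatura.extract drops navigation, ads, and boilerplate; it returns None on failure.
    text = trafilatura.extract(html, url=url, include_comments=False)
    return text.strip() if text else None
```

Articles for which this returns None are the ones a tool would report as failed extractions.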

Troubleshooting

"SERPER_API_KEY is not set": Ensure the environment variable is exported
Rate limit errors: Wait before making more requests
No content extracted: Some websites block scrapers; try different queries
Connection errors: Check your internet connection and firewall settings