amirkiarafiei committed on
Commit
938a3f9
·
1 Parent(s): 8ba5d9d

refactor: update README for clarity and remove deprecated chat history file

Files changed (5)
  1. README.md +106 -152
  2. base_chat_history.json +0 -1
  3. gradio_app.py +6 -10
  4. langchain_mcp_client.py +3 -2
  5. memory_store.py +3 -1
README.md CHANGED
@@ -1,190 +1,144 @@
1
  # Natural Language SQL Query Agent with Visualization
2
 
3
- A PostgreSQL-based query system that converts natural language requests into SQL queries, executes them, and provides visualizations using PandasAI. Built with LangChain, FastMCP, and Gradio.
4
 
5
  ![Architecture](resources/visualization_demo.png)
6
 
 
7
 
8
- ## Description
9
-
10
- This project combines several components to create a powerful natural language interface to PostgreSQL databases with visualization capabilities:
11
-
12
- ### Core Components:
13
-
14
- 1. **PostgreSQL MCP Server** (`postgre_mcp_server.py`):
15
- - Handles database connections and query execution
16
- - Provides tools for table listing, schema retrieval
17
- - Implements visualization using PandasAI
18
- - Manages database lifecycle and connection pooling
19
-
20
- 2. **LangChain Client** (`langchain_mcp_client.py`):
21
- - Converts natural language to SQL using LLM
22
- - Manages conversation history
23
- - Integrates with MCP tools
24
- - Handles agent execution flow
25
-
26
- 3. **Gradio Interface** (`gradio_app.py`):
27
- - Provides web-based chat interface
28
- - Handles user interactions
29
- - Displays query results and visualizations
30
-
31
- 4. **Memory Management** (`conversation_memory.py`):
32
- - Implements conversation history persistence
33
- - Tracks tool usage and queries
34
- - Manages session state
35
-
36
- 5. **Utilities** (`utils.py`):
37
- - Provides helper functions for output parsing
38
- - Handles MCP response formatting
39
- - Manages logging
40
-
41
- 6. **Visualization** (`pandasai_visualization.py` and MCP Tools):
42
- - Implements PandasAI integration for intelligent chart generation
43
- - Custom MCP tool `visualize_results` that:
44
- * Takes query results as JSON and a visualization prompt
45
- * Uses PandasAI to automatically generate appropriate visualizations
46
- * Saves charts in the `exports/charts/` directory
47
- - Supports various chart types (bar charts, line plots, pie charts, etc.)
48
- - Intelligent prompt-based visualization selection
49
- - Includes standalone testing script:
50
- * Located at `pandasai_visualization.py`
51
- * Can be run directly to test PandasAI functionality
52
- * Creates sample data and generates test visualizations
53
- * Usage: `python pandasai_visualization.py`
54
- * Helps verify PandasAI setup and API key configuration
55
-
56
- ## Installation
57
-
58
- 1. **Clone the Repository:**
59
  ```bash
60
  git clone <repository-url>
61
  cd query_mcp_server
62
  ```
63
 
64
- 2. **Create and Activate Virtual Environment:**
65
  ```bash
66
  python -m venv venv
67
- source venv/bin/activate # On Linux/Mac
68
  # or
69
- .\venv\Scripts\activate # On Windows
70
  ```
71
 
72
- 3. **Install Dependencies:**
73
  ```bash
74
  pip install -r requirements.txt
75
  ```
76
 
77
- 4. **Configure Environment Variables:**
78
- Create a `.env` file in the project root with the following variables:
79
- ```
80
- # Database Configuration
81
- DB_URL=postgresql://username:password@localhost:5432/your_database
82
- DB_SCHEMA=public
83
-
84
- # Test the PandasAI Setup (Optional)
85
- # Before running the main application, you can test the visualization component:
86
- python pandasai_visualization.py
87
- # This will create a sample visualization using PandasAI
88
-
89
- # API Keys
90
- PANDAS_KEY=your-pandasai-key # Required for PandasAI visualization
91
- GEMINI_API_KEY=your-gemini-api-key # For LLM query understanding
92
- GEMINI_MODEL=gemini-2.0-flash-lite # LLM model selection
93
- GEMINI_MODEL_PROVIDER=google_genai # LLM provider
94
-
95
- # Path Configuration
96
- MCP_SERVER_PATH=/absolute/path/to/postgre_mcp_server.py
97
- TABLE_SUMMARY_PATH=table_summary.txt
98
  ```
 
99
 
100
- ## Running the Application
101
-
102
- 1. **Ensure PostgreSQL Database is Running:**
103
- - Make sure your PostgreSQL instance is up and accessible
104
- - Verify the connection details in `.env` are correct
105
 
106
- 2. **Start the Application:**
107
  ```bash
108
- # Using the run script
109
  chmod +x run.sh
110
  ./run.sh
111
-
112
- # Or directly with Python
113
- python gradio_app.py
114
  ```
115
 
116
- 3. **Access the Interface:**
117
- - Open your browser to `http://localhost:7860`
118
- - The chat interface will be ready for queries
119
 
120
- ## Usage
121
 
122
- ### Input Examples
123
 
124
- Example prompts:
125
  ```
126
- 1. Simple queries:
127
- "List all tables in the database"
128
- "Show me the schema of table X"
129
 
130
- 2. Analysis queries:
131
- "Count the number of active customers by region"
132
- "Show me total sales by product category for the last month"
133
 
134
- 3. Visualization requests:
135
- "Plot a bar chart showing sales distribution by region"
136
- "Create a pie chart of customer segments"
137
- ```
 
138
 
139
- ### Output Structure
140
-
141
- 1. **Query Results:**
142
- - Text results are displayed directly in the chat interface
143
- - Tabular data is formatted as markdown tables
144
-
145
- 2. **Visualizations:**
146
- - Generated charts are saved as PNG files in `./exports/charts/`
147
- - Files are named with unique IDs: `temp_chart_{uuid}.png`
148
- - Visualization is handled by the `visualize_results` MCP tool which:
149
- * Automatically converts SQL results to pandas DataFrames
150
- * Uses PandasAI to interpret visualization requests
151
- * Generates appropriate chart types based on data and prompt
152
- - Supports a wide range of visualization types:
153
- * Bar charts for categorical comparisons
154
- * Line plots for time series
155
- * Pie charts for proportions
156
- * Scatter plots for correlations
157
- * And more based on data characteristics
158
-
159
- ### Response Format
160
-
161
- The system provides responses in a structured format:
162
- ```markdown
163
- # Result
164
- [Query results in table or list format]
165
-
166
- # Visualization (if requested)
167
- [Path to generated visualization file]
168
-
169
- # Explanation
170
- [Brief interpretation of results]
171
-
172
- # Query
173
- ```sql
174
- [The executed SQL query]
175
- ```
176
 
 
177
 
178
- ## Project Structure
179
 
180
- ```
181
- ├── gradio_app.py # Main application and UI
182
- ├── postgre_mcp_server.py # Database server and tools
183
- ├── langchain_mcp_client.py # LangChain integration
184
- ├── conversation_memory.py # Memory management
185
- ├── utils.py # Helper utilities
186
- ├── pandasai_visualization.py # Visualization handling
187
- ├── requirements.txt # Project dependencies
188
- ├── run.sh # Run script
189
- └── .env # Environment configuration
190
- ```
 
1
  # Natural Language SQL Query Agent with Visualization
2
 
3
+ A smart, interactive PostgreSQL query system that translates natural language requests into SQL queries, executes them, and generates visualizations using PandasAI. Built with LangChain, FastMCP, and Gradio.
4
 
5
  ![Architecture](resources/visualization_demo.png)
6
 
7
+ ## 🌟 Features
8
 
9
+ - **Natural Language to SQL**: Convert plain English questions into SQL queries
10
+ - **Interactive Chat Interface**: User-friendly Gradio web interface
11
+ - **Smart Visualization**: Automated chart generation based on query results
12
+ - **Conversation Memory**: Maintains context across multiple queries
13
+ - **Database Schema Understanding**: Intelligent handling of database structure
14
+ - **Multiple LLM Support**: Compatible with both OpenAI and Google's Gemini models
15
+
16
+ ## 🏗️ Architecture
17
+
18
+ The project is structured into several key components:
19
+
20
+ ### 1. Query Processing Layer
21
+ - **LangChain Client** (`langchain_mcp_client.py`):
22
+ - Manages LLM interactions for query understanding
23
+ - Handles conversation flow and context
24
+ - Integrates with MCP tools
25
+ - Supports multiple LLM providers (OpenAI/Gemini)
26
+
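The client picks its model from environment variables. A minimal sketch of that selection, using the `GEMINI_MODEL`, `GEMINI_MODEL_PROVIDER`, and `GEMINI_API_KEY` names this repo reads (the helper name and defaults here are illustrative, not part of the project):

```python
import os

def gemini_model_config():
    """Collect the Gemini settings langchain_mcp_client.py reads from the
    environment. The fallback values mirror the sample .env, not hard
    defaults in the project."""
    return {
        "model_provider": os.getenv("GEMINI_MODEL_PROVIDER", "google_genai"),
        "model": os.getenv("GEMINI_MODEL", "gemini-2.0-flash-lite"),
        "api_key": os.getenv("GEMINI_API_KEY"),
    }

cfg = gemini_model_config()
# In the real client this dict is handed to langchain's init_chat_model:
#   llm = init_chat_model(**cfg)
```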
27
+ ### 2. Database Layer
28
+ - **PostgreSQL MCP Server** (`postgre_mcp_server.py`):
29
+ - Manages PostgreSQL connections and query execution
30
+ - Implements connection pooling for efficiency
31
+ - Provides database schema information
32
+ - Handles query result processing
33
+
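Connections are configured through a single `DB_URL` of the form `postgresql://username:password@localhost:5432/your_database`. A stdlib-only sketch of how such a URL decomposes (illustrative helper; the server itself passes `DB_URL` to its PostgreSQL driver directly):

```python
from urllib.parse import urlparse

def split_db_url(db_url: str) -> dict:
    """Break a DB_URL like the one in .env into its connection parts."""
    parts = urlparse(db_url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),  # path is "/your_database"
    }

info = split_db_url("postgresql://username:password@localhost:5432/your_database")
```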
34
+ ### 3. Visualization Layer
35
+ - **PandasAI Integration** (`pandasai_visualization.py`):
36
+ - Intelligent chart generation from query results
37
+ - Support for multiple chart types
38
+ - Automated visualization selection
39
+ - Exports charts to `exports/charts/` directory
40
+
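Generated charts are written as `temp_chart_{uuid}.png` files under `exports/charts/`. A sketch of that naming convention (the real path construction happens inside the visualization tool):

```python
import uuid
from pathlib import Path

def chart_export_path(base_dir: str = "exports/charts") -> Path:
    """Build a unique chart filename following the temp_chart_{uuid}.png
    convention the generated charts are saved under."""
    return Path(base_dir) / f"temp_chart_{uuid.uuid4().hex}.png"

p = chart_export_path()
```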
41
+ ### 4. User Interface
42
+ - **Gradio Web Interface** (`gradio_app.py`):
43
+ - Clean and intuitive chat interface
44
+ - Real-time query processing
45
+ - Visualization display
46
+ - Interactive session management
47
+
48
+ ### 5. Memory Management
49
+ - **Conversation Store** (`memory_store.py`):
50
+ - Maintains conversation history
51
+ - Implements singleton pattern for global state
52
+ - Enables contextual query understanding
53
+
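The singleton pattern behind `memory_store.py` can be sketched in a few lines. This is a simplified stand-in: a plain list plays the role of langchain's `ChatMessageHistory`, so every caller shares one conversation history:

```python
class MemoryStoreSketch:
    """Minimal sketch of the singleton memory store: the shared history
    is created on first access and reused by every subsequent caller."""
    _memory = None  # class-level slot holding the single shared history

    @classmethod
    def get_memory(cls):
        if cls._memory is None:
            cls._memory = []  # ChatMessageHistory in the real module
        return cls._memory

m1 = MemoryStoreSketch.get_memory()
m1.append({"type": "human", "content": "list all tables"})
m2 = MemoryStoreSketch.get_memory()  # same object as m1
```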
54
+ ## 🚀 Getting Started
55
+
56
+ ### Prerequisites
57
+
58
+ - Python 3.11 or earlier
59
+ - PostgreSQL database
60
+ - Access to either OpenAI API or Google Gemini API
61
+
62
+ ### Installation
63
+
64
+ 1. **Clone the Repository**
65
  ```bash
66
  git clone <repository-url>
67
  cd query_mcp_server
68
  ```
69
 
70
+ 2. **Set Up Virtual Environment**
71
  ```bash
72
  python -m venv venv
73
+ source venv/bin/activate # Linux/Mac
74
  # or
75
+ .\venv\Scripts\activate # Windows
76
  ```
77
 
78
+ 3. **Install Dependencies**
79
  ```bash
80
  pip install -r requirements.txt
81
  ```
82
 
83
+ 4. **Environment Configuration**
84
+ Create a `.env` file from the `.env.example` template:
85
+ ```bash
86
+ cp .env.example .env
87
```
88
+ Fill in the required environment variables.
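The variable names below come from the previous version of this README's configuration section; treat `.env.example` as the authoritative template and check the two against each other:

```
# Database Configuration
DB_URL=postgresql://username:password@localhost:5432/your_database
DB_SCHEMA=public

# API Keys
PANDAS_KEY=your-pandasai-key        # PandasAI visualization
GEMINI_API_KEY=your-gemini-api-key  # LLM query understanding
GEMINI_MODEL=gemini-2.0-flash-lite
GEMINI_MODEL_PROVIDER=google_genai

# Path Configuration
MCP_SERVER_PATH=/absolute/path/to/postgre_mcp_server.py
TABLE_SUMMARY_PATH=table_summary.txt
```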
89
 
90
+ ## 🏃‍♂️ Running the Application
91
 
92
+ 1. **Start the Application**
93
+ ```bash
94
+ python gradio_app.py
95
+ ```
96
+ or using the `run.sh` script:
97
```bash
98
chmod +x run.sh
99
./run.sh
100
```
101
 
102
+ 2. **Access the Interface**
103
+ - Open your browser and navigate to `http://localhost:7860`
104
+ - Start querying your database using natural language!
105
 
106
+ ## 🧪 Testing
107
 
108
+ To test the visualization component independently:
109
+ ```bash
110
+ python pandasai_visualization.py
111
+ ```
112
+ This will generate sample visualizations to verify the PandasAI setup.
113
 
114
+ ## 📁 Project Structure
115
+ ```
116
+ query_mcp_server/
117
+ ├── gradio_app.py # Web interface
118
+ ├── langchain_mcp_client.py # LLM integration
119
+ ├── postgre_mcp_server.py # Database handler
120
+ ├── pandasai_visualization.py # Visualization logic
121
+ ├── memory_store.py # Conversation management
122
+ ├── exports/
123
+ │ └── charts/ # Generated visualizations
124
+ └── resources/ # Static resources
125
```
126
 
127
+ ## 🛠️ Contributing
128
129
+ 1. Fork the repository
130
+ 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
131
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
132
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
133
+ 5. Open a Pull Request
134
 
135
+ ## 📝 License
136
 
137
+ This project is licensed under the MIT License - see the LICENSE file for details.
138
 
139
+ ## Acknowledgments
140
 
141
+ - LangChain for the powerful LLM framework
142
+ - PandasAI for intelligent visualization capabilities
143
+ - Gradio for the intuitive web interface
144
+ - FastMCP for efficient database communication

base_chat_history.json DELETED
@@ -1 +0,0 @@
1
- [{"type": "human", "data": {"content": "list all tables", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null, "example": false}}, {"type": "ai", "data": {"content": "# Result\nThe tables in the database are:\n* dim\\_agreement\n* dim\\_customer\n* dim\\_product\n* dim\\_product\\_order\\_item\n\n# Explanation\nThe `list_tables` tool was called to retrieve a list of all available tables in the database schema. The result shows the names of these tables.\n\n# Query\n```sql\nN/A\n```", "additional_kwargs": {}, "response_metadata": {}, "type": "ai", "name": null, "id": null, "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null}}]
 
 
gradio_app.py CHANGED
@@ -34,10 +34,6 @@ def image_to_base64_markdown(image_path, alt_text="Customer Status"):
34
  # ====================================== Async-compatible wrapper
35
  async def run_agent(request, history=None):
36
  try:
37
- logger.info(f"Current request: {request}")
38
- memory = MemoryStore.get_memory()
39
- logger.info(f"Current memory messages: {memory.messages}")
40
-
41
  # Process request using existing memory
42
  response, messages = await lc_mcp_exec(request)
43
 
@@ -100,9 +96,9 @@ custom_css = """
100
 
101
  with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
102
  with gr.Row(elem_classes="container"):
103
- with gr.Column(scale=1):
104
- gr.Image(value=LOGO_PATH, height=200, show_label=False)
105
- with gr.Column(scale=3):
106
  gr.Markdown(
107
  """
108
  <h1 style='text-align: center; margin-bottom: 1rem'>Talk to Your Data</h1>
@@ -129,10 +125,10 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
129
  "Describe the database",
130
  "List all tables in the database",
131
  "List all tables with columns and data types",
132
- "how many customers do you have?",
133
- "what are the statuses my of my customers",
134
  "Visualize with different colors and show legend",
135
- "what are the statues of my customers and how many are in each status, show it by percentage",
136
  "Total number of completed orders in six years by customer count show top most 10 customers",
137
  "In january how many products has been sold ? group them by year",
138
  "How many users and roles have been created in 2024"
 
34
  # ====================================== Async-compatible wrapper
35
  async def run_agent(request, history=None):
36
  try:
 
 
 
 
37
  # Process request using existing memory
38
  response, messages = await lc_mcp_exec(request)
39
 
 
96
 
97
  with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
98
  with gr.Row(elem_classes="container"):
99
+ # with gr.Column(scale=0.5):
100
+ # gr.Image(value=LOGO_PATH, height=100, show_label=False, show_download_button=False, show_fullscreen_button=False)
101
+ with gr.Column(scale=5):
102
  gr.Markdown(
103
  """
104
  <h1 style='text-align: center; margin-bottom: 1rem'>Talk to Your Data</h1>
 
125
  "Describe the database",
126
  "List all tables in the database",
127
  "List all tables with columns and data types",
128
+ "How many customers do you have?",
129
+ "What are the statuses of my customers",
130
  "Visualize with different colors and show legend",
131
+ "What are the statuses of my customers and how many are in each status, show it by percentage",
132
  "Total number of completed orders in six years by customer count show top most 10 customers",
133
  "In january how many products has been sold ? group them by year",
134
  "How many users and roles have been created in 2024"
langchain_mcp_client.py CHANGED
@@ -11,7 +11,7 @@ from langchain.chat_models import init_chat_model
11
  import logging
12
  from dotenv import load_dotenv
13
  from langchain.globals import set_debug
14
- from langchain.memory import ChatMessageHistory
15
  from memory_store import MemoryStore
16
 
17
 
@@ -38,13 +38,14 @@ async def lc_mcp_exec(request: str, history=None) -> Tuple[str, list]:
38
  table_summary = load_table_summary(os.environ["TABLE_SUMMARY_PATH"])
39
  server_params = get_server_params()
40
 
41
- # Initialize the LLM
42
  # llm = init_chat_model(
43
  # model_provider=os.getenv("OPENAI_MODEL_PROVIDER"),
44
  # model=os.getenv("OPENAI_MODEL"),
45
  # api_key=os.getenv("OPENAI_API_KEY")
46
  # )
47
 
 
48
  llm = init_chat_model(
49
  model_provider=os.getenv("GEMINI_MODEL_PROVIDER"),
50
  model=os.getenv("GEMINI_MODEL"),
 
11
  import logging
12
  from dotenv import load_dotenv
13
  from langchain.globals import set_debug
14
+ from langchain_community.chat_message_histories import ChatMessageHistory
15
  from memory_store import MemoryStore
16
 
17
 
 
38
  table_summary = load_table_summary(os.environ["TABLE_SUMMARY_PATH"])
39
  server_params = get_server_params()
40
 
41
+ # Initialize the LLM for OpenAI
42
  # llm = init_chat_model(
43
  # model_provider=os.getenv("OPENAI_MODEL_PROVIDER"),
44
  # model=os.getenv("OPENAI_MODEL"),
45
  # api_key=os.getenv("OPENAI_API_KEY")
46
  # )
47
 
48
+ # Initialize the LLM for Gemini
49
  llm = init_chat_model(
50
  model_provider=os.getenv("GEMINI_MODEL_PROVIDER"),
51
  model=os.getenv("GEMINI_MODEL"),
memory_store.py CHANGED
@@ -1,9 +1,11 @@
1
- from langchain.memory import ChatMessageHistory
2
  from typing import Optional
3
  import logging
4
 
 
5
  logger = logging.getLogger(__name__)
6
 
 
7
  class MemoryStore:
8
  _instance: Optional['MemoryStore'] = None
9
  _memory: Optional[ChatMessageHistory] = None
 
1
+ from langchain_community.chat_message_histories import ChatMessageHistory
2
  from typing import Optional
3
  import logging
4
 
5
+
6
  logger = logging.getLogger(__name__)
7
 
8
+
9
  class MemoryStore:
10
  _instance: Optional['MemoryStore'] = None
11
  _memory: Optional[ChatMessageHistory] = None