amirkiarafiei committed on
Commit
938a3f9
·
1 Parent(s): 8ba5d9d

refactor: update README for clarity and remove deprecated chat history file

Files changed (5)
  1. README.md +106 -152
  2. base_chat_history.json +0 -1
  3. gradio_app.py +6 -10
  4. langchain_mcp_client.py +3 -2
  5. memory_store.py +3 -1
README.md CHANGED
@@ -1,190 +1,144 @@
1
  # Natural Language SQL Query Agent with Visualization
2
 
3
- A PostgreSQL-based query system that converts natural language requests into SQL queries, executes them, and provides visualizations using PandasAI. Built with LangChain, FastMCP, and Gradio.
4
 
5
  ![Architecture](resources/visualization_demo.png)
6
 
 
7
 
8
- ## Description
9
-
10
- This project combines several components to create a powerful natural language interface to PostgreSQL databases with visualization capabilities:
11
-
12
- ### Core Components:
13
-
14
- 1. **PostgreSQL MCP Server** (`postgre_mcp_server.py`):
15
- - Handles database connections and query execution
16
- - Provides tools for table listing, schema retrieval
17
- - Implements visualization using PandasAI
18
- - Manages database lifecycle and connection pooling
19
-
20
- 2. **LangChain Client** (`langchain_mcp_client.py`):
21
- - Converts natural language to SQL using LLM
22
- - Manages conversation history
23
- - Integrates with MCP tools
24
- - Handles agent execution flow
25
-
26
- 3. **Gradio Interface** (`gradio_app.py`):
27
- - Provides web-based chat interface
28
- - Handles user interactions
29
- - Displays query results and visualizations
30
-
31
- 4. **Memory Management** (`conversation_memory.py`):
32
- - Implements conversation history persistence
33
- - Tracks tool usage and queries
34
- - Manages session state
35
-
36
- 5. **Utilities** (`utils.py`):
37
- - Provides helper functions for output parsing
38
- - Handles MCP response formatting
39
- - Manages logging
40
-
41
- 6. **Visualization** (`pandasai_visualization.py` and MCP Tools):
42
- - Implements PandasAI integration for intelligent chart generation
43
- - Custom MCP tool `visualize_results` that:
44
- * Takes query results as JSON and a visualization prompt
45
- * Uses PandasAI to automatically generate appropriate visualizations
46
- * Saves charts in the `exports/charts/` directory
47
- - Supports various chart types (bar charts, line plots, pie charts, etc.)
48
- - Intelligent prompt-based visualization selection
49
- - Includes standalone testing script:
50
- * Located at `pandasai_visualization.py`
51
- * Can be run directly to test PandasAI functionality
52
- * Creates sample data and generates test visualizations
53
- * Usage: `python pandasai_visualization.py`
54
- * Helps verify PandasAI setup and API key configuration
55
-
56
- ## Installation
57
-
58
- 1. **Clone the Repository:**
59
  ```bash
60
  git clone <repository-url>
61
  cd query_mcp_server
62
  ```
63
 
64
- 2. **Create and Activate Virtual Environment:**
65
  ```bash
66
  python -m venv venv
67
- source venv/bin/activate # On Linux/Mac
68
  # or
69
- .\venv\Scripts\activate # On Windows
70
  ```
71
 
72
- 3. **Install Dependencies:**
73
  ```bash
74
  pip install -r requirements.txt
75
  ```
76
 
77
- 4. **Configure Environment Variables:**
78
- Create a `.env` file in the project root with the following variables:
79
- ```
80
- # Database Configuration
81
- DB_URL=postgresql://username:password@localhost:5432/your_database
82
- DB_SCHEMA=public
83
-
84
- # Test the PandasAI Setup (Optional)
85
- # Before running the main application, you can test the visualization component:
86
- python pandasai_visualization.py
87
- # This will create a sample visualization using PandasAI
88
-
89
- # API Keys
90
- PANDAS_KEY=your-pandasai-key # Required for PandasAI visualization
91
- GEMINI_API_KEY=your-gemini-api-key # For LLM query understanding
92
- GEMINI_MODEL=gemini-2.0-flash-lite # LLM model selection
93
- GEMINI_MODEL_PROVIDER=google_genai # LLM provider
94
-
95
- # Path Configuration
96
- MCP_SERVER_PATH=/absolute/path/to/postgre_mcp_server.py
97
- TABLE_SUMMARY_PATH=table_summary.txt
98
  ```
 
99
 
100
- ## Running the Application
101
-
102
- 1. **Ensure PostgreSQL Database is Running:**
103
- - Make sure your PostgreSQL instance is up and accessible
104
- - Verify the connection details in `.env` are correct
105
 
106
- 2. **Start the Application:**
107
  ```bash
108
- # Using the run script
109
  chmod +x run.sh
110
  ./run.sh
111
-
112
- # Or directly with Python
113
- python gradio_app.py
114
  ```
115
 
116
- 3. **Access the Interface:**
117
- - Open your browser to `http://localhost:7860`
118
- - The chat interface will be ready for queries
119
 
120
- ## Usage
121
 
122
- ### Input Examples
123
 
124
- Example prompts:
125
  ```
126
- 1. Simple queries:
127
- "List all tables in the database"
128
- "Show me the schema of table X"
129
 
130
- 2. Analysis queries:
131
- "Count the number of active customers by region"
132
- "Show me total sales by product category for the last month"
133
 
134
- 3. Visualization requests:
135
- "Plot a bar chart showing sales distribution by region"
136
- "Create a pie chart of customer segments"
137
- ```
 
138
 
139
- ### Output Structure
140
-
141
- 1. **Query Results:**
142
- - Text results are displayed directly in the chat interface
143
- - Tabular data is formatted as markdown tables
144
-
145
- 2. **Visualizations:**
146
- - Generated charts are saved as PNG files in `./exports/charts/`
147
- - Files are named with unique IDs: `temp_chart_{uuid}.png`
148
- - Visualization is handled by the `visualize_results` MCP tool which:
149
- * Automatically converts SQL results to pandas DataFrames
150
- * Uses PandasAI to interpret visualization requests
151
- * Generates appropriate chart types based on data and prompt
152
- - Supports a wide range of visualization types:
153
- * Bar charts for categorical comparisons
154
- * Line plots for time series
155
- * Pie charts for proportions
156
- * Scatter plots for correlations
157
- * And more based on data characteristics
158
-
159
- ### Response Format
160
-
161
- The system provides responses in a structured format:
162
- ```markdown
163
- # Result
164
- [Query results in table or list format]
165
-
166
- # Visualization (if requested)
167
- [Path to generated visualization file]
168
-
169
- # Explanation
170
- [Brief interpretation of results]
171
-
172
- # Query
173
- ```sql
174
- [The executed SQL query]
175
- ```
176
 
 
177
 
178
- ## Project Structure
179
 
180
- ```
181
- ├── gradio_app.py # Main application and UI
182
- ├── postgre_mcp_server.py # Database server and tools
183
- ├── langchain_mcp_client.py # LangChain integration
184
- ├── conversation_memory.py # Memory management
185
- ├── utils.py # Helper utilities
186
- ├── pandasai_visualization.py # Visualization handling
187
- ├── requirements.txt # Project dependencies
188
- ├── run.sh # Run script
189
- └── .env # Environment configuration
190
- ```
 
1
  # Natural Language SQL Query Agent with Visualization
2
 
3
+ A smart, interactive PostgreSQL query system that translates natural language requests into SQL queries, executes them, and generates visualizations using PandasAI. Built with LangChain, FastMCP, and Gradio.
4
 
5
  ![Architecture](resources/visualization_demo.png)
6
 
7
+ ## 🌟 Features
8
 
9
+ - **Natural Language to SQL**: Convert plain English questions into SQL queries
10
+ - **Interactive Chat Interface**: User-friendly Gradio web interface
11
+ - **Smart Visualization**: Automated chart generation based on query results
12
+ - **Conversation Memory**: Maintains context across multiple queries
13
+ - **Database Schema Understanding**: Intelligent handling of database structure
14
+ - **Multiple LLM Support**: Compatible with both OpenAI and Google's Gemini models
15
+
16
+ ## 🏗️ Architecture
17
+
18
+ The project is structured into several key components:
19
+
20
+ ### 1. Query Processing Layer
21
+ - **LangChain Client** (`langchain_mcp_client.py`):
22
+ - Manages LLM interactions for query understanding
23
+ - Handles conversation flow and context
24
+ - Integrates with MCP tools
25
+ - Supports multiple LLM providers (OpenAI/Gemini)
26
+
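The client picks its model from environment variables. A minimal sketch of that selection, using the `GEMINI_MODEL`, `GEMINI_MODEL_PROVIDER`, and `GEMINI_API_KEY` names this repo reads (the helper name and defaults here are illustrative, not part of the project):

```python
import os

def gemini_model_config():
    """Collect the Gemini settings langchain_mcp_client.py reads from the
    environment. The fallback values mirror the sample .env, not hard
    defaults in the project."""
    return {
        "model_provider": os.getenv("GEMINI_MODEL_PROVIDER", "google_genai"),
        "model": os.getenv("GEMINI_MODEL", "gemini-2.0-flash-lite"),
        "api_key": os.getenv("GEMINI_API_KEY"),
    }

cfg = gemini_model_config()
# In the real client this dict is handed to langchain's init_chat_model:
#   llm = init_chat_model(**cfg)
```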
27
+ ### 2. Database Layer
28
+ - **PostgreSQL MCP Server** (`postgre_mcp_server.py`):
29
+ - Manages PostgreSQL connections and query execution
30
+ - Implements connection pooling for efficiency
31
+ - Provides database schema information
32
+ - Handles query result processing
33
+
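Connections are configured through a single `DB_URL` of the form `postgresql://username:password@localhost:5432/your_database`. A stdlib-only sketch of how such a URL decomposes (illustrative helper; the server itself passes `DB_URL` to its PostgreSQL driver directly):

```python
from urllib.parse import urlparse

def split_db_url(db_url: str) -> dict:
    """Break a DB_URL like the one in .env into its connection parts."""
    parts = urlparse(db_url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),  # path is "/your_database"
    }

info = split_db_url("postgresql://username:password@localhost:5432/your_database")
```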
34
+ ### 3. Visualization Layer
35
+ - **PandasAI Integration** (`pandasai_visualization.py`):
36
+ - Intelligent chart generation from query results
37
+ - Support for multiple chart types
38
+ - Automated visualization selection
39
+ - Exports charts to `exports/charts/` directory
40
+
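Generated charts are written as `temp_chart_{uuid}.png` files under `exports/charts/`. A sketch of that naming convention (the real path construction happens inside the visualization tool):

```python
import uuid
from pathlib import Path

def chart_export_path(base_dir: str = "exports/charts") -> Path:
    """Build a unique chart filename following the temp_chart_{uuid}.png
    convention the generated charts are saved under."""
    return Path(base_dir) / f"temp_chart_{uuid.uuid4().hex}.png"

p = chart_export_path()
```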
41
+ ### 4. User Interface
42
+ - **Gradio Web Interface** (`gradio_app.py`):
43
+ - Clean and intuitive chat interface
44
+ - Real-time query processing
45
+ - Visualization display
46
+ - Interactive session management
47
+
48
+ ### 5. Memory Management
49
+ - **Conversation Store** (`memory_store.py`):
50
+ - Maintains conversation history
51
+ - Implements singleton pattern for global state
52
+ - Enables contextual query understanding
53
+
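The singleton pattern behind `memory_store.py` can be sketched in a few lines. This is a simplified stand-in: a plain list plays the role of langchain's `ChatMessageHistory`, so every caller shares one conversation history:

```python
class MemoryStoreSketch:
    """Minimal sketch of the singleton memory store: the shared history
    is created on first access and reused by every subsequent caller."""
    _memory = None  # class-level slot holding the single shared history

    @classmethod
    def get_memory(cls):
        if cls._memory is None:
            cls._memory = []  # ChatMessageHistory in the real module
        return cls._memory

m1 = MemoryStoreSketch.get_memory()
m1.append({"type": "human", "content": "list all tables"})
m2 = MemoryStoreSketch.get_memory()  # same object as m1
```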
54
+ ## 🚀 Getting Started
55
+
56
+ ### Prerequisites
57
+
58
+ - Python 3.11 or earlier
59
+ - PostgreSQL database
60
+ - Access to either OpenAI API or Google Gemini API
61
+
62
+ ### Installation
63
+
64
+ 1. **Clone the Repository**
65
  ```bash
66
  git clone <repository-url>
67
  cd query_mcp_server
68
  ```
69
 
70
+ 2. **Set Up Virtual Environment**
71
  ```bash
72
  python -m venv venv
73
+ source venv/bin/activate # Linux/Mac
74
  # or
75
+ .\venv\Scripts\activate # Windows
76
  ```
77
 
78
+ 3. **Install Dependencies**
79
  ```bash
80
  pip install -r requirements.txt
81
  ```
82
 
83
+ 4. **Environment Configuration**
84
+ Create a `.env` file from the `.env.example` template:
85
+ ```bash
86
+ cp .env.example .env
87
```
88
+ Fill in the required environment variables.
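The variable names below come from the previous version of this README's configuration section; treat `.env.example` as the authoritative template and check the two against each other:

```
# Database Configuration
DB_URL=postgresql://username:password@localhost:5432/your_database
DB_SCHEMA=public

# API Keys
PANDAS_KEY=your-pandasai-key        # PandasAI visualization
GEMINI_API_KEY=your-gemini-api-key  # LLM query understanding
GEMINI_MODEL=gemini-2.0-flash-lite
GEMINI_MODEL_PROVIDER=google_genai

# Path Configuration
MCP_SERVER_PATH=/absolute/path/to/postgre_mcp_server.py
TABLE_SUMMARY_PATH=table_summary.txt
```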
89
 
90
+ ## 🏃‍♂️ Running the Application
91
 
92
+ 1. **Start the Application**
93
+ ```bash
94
+ python gradio_app.py
95
+ ```
96
+ or using the `run.sh` script:
97
```bash
98
chmod +x run.sh
99
./run.sh
100
```
101
 
102
+ 2. **Access the Interface**
103
+ - Open your browser and navigate to `http://localhost:7860`
104
+ - Start querying your database using natural language!
105
 
106
+ ## 🧪 Testing
107
 
108
+ To test the visualization component independently:
109
+ ```bash
110
+ python pandasai_visualization.py
111
+ ```
112
+ This will generate sample visualizations to verify the PandasAI setup.
113
 
114
+ ## 📁 Project Structure
115
+ ```
116
+ query_mcp_server/
117
+ ├── gradio_app.py # Web interface
118
+ ├── langchain_mcp_client.py # LLM integration
119
+ ├── postgre_mcp_server.py # Database handler
120
+ ├── pandasai_visualization.py # Visualization logic
121
+ ├── memory_store.py # Conversation management
122
+ ├── exports/
123
+ │ └── charts/ # Generated visualizations
124
+ └── resources/ # Static resources
125
```
126
 
127
+ ## 🛠️ Contributing
128
129
+ 1. Fork the repository
130
+ 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
131
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
132
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
133
+ 5. Open a Pull Request
134
 
135
+ ## 📝 License
136
 
137
+ This project is licensed under the MIT License - see the LICENSE file for details.
138
 
139
+ ## Acknowledgments
140
 
141
+ - LangChain for the powerful LLM framework
142
+ - PandasAI for intelligent visualization capabilities
143
+ - Gradio for the intuitive web interface
144
+ - FastMCP for efficient database communication

base_chat_history.json DELETED
@@ -1 +0,0 @@
1
- [{"type": "human", "data": {"content": "list all tables", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null, "example": false}}, {"type": "ai", "data": {"content": "# Result\nThe tables in the database are:\n* dim\\_agreement\n* dim\\_customer\n* dim\\_product\n* dim\\_product\\_order\\_item\n\n# Explanation\nThe `list_tables` tool was called to retrieve a list of all available tables in the database schema. The result shows the names of these tables.\n\n# Query\n```sql\nN/A\n```", "additional_kwargs": {}, "response_metadata": {}, "type": "ai", "name": null, "id": null, "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null}}]
 
 
gradio_app.py CHANGED
@@ -34,10 +34,6 @@ def image_to_base64_markdown(image_path, alt_text="Customer Status"):
34
  # ====================================== Async-compatible wrapper
35
  async def run_agent(request, history=None):
36
  try:
37
- logger.info(f"Current request: {request}")
38
- memory = MemoryStore.get_memory()
39
- logger.info(f"Current memory messages: {memory.messages}")
40
-
41
  # Process request using existing memory
42
  response, messages = await lc_mcp_exec(request)
43
 
@@ -100,9 +96,9 @@ custom_css = """
100
 
101
  with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
102
  with gr.Row(elem_classes="container"):
103
- with gr.Column(scale=1):
104
- gr.Image(value=LOGO_PATH, height=200, show_label=False)
105
- with gr.Column(scale=3):
106
  gr.Markdown(
107
  """
108
  <h1 style='text-align: center; margin-bottom: 1rem'>Talk to Your Data</h1>
@@ -129,10 +125,10 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
129
  "Describe the database",
130
  "List all tables in the database",
131
  "List all tables with columns and data types",
132
- "how many customers do you have?",
133
- "what are the statuses my of my customers",
134
  "Visualize with different colors and show legend",
135
- "what are the statues of my customers and how many are in each status, show it by percentage",
136
  "Total number of completed orders in six years by customer count show top most 10 customers",
137
  "In january how many products has been sold ? group them by year",
138
  "How many users and roles have been created in 2024"
 
34
  # ====================================== Async-compatible wrapper
35
  async def run_agent(request, history=None):
36
  try:
 
 
 
 
37
  # Process request using existing memory
38
  response, messages = await lc_mcp_exec(request)
39
 
 
96
 
97
  with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
98
  with gr.Row(elem_classes="container"):
99
+ # with gr.Column(scale=0.5):
100
+ # gr.Image(value=LOGO_PATH, height=100, show_label=False, show_download_button=False, show_fullscreen_button=False)
101
+ with gr.Column(scale=5):
102
  gr.Markdown(
103
  """
104
  <h1 style='text-align: center; margin-bottom: 1rem'>Talk to Your Data</h1>
 
125
  "Describe the database",
126
  "List all tables in the database",
127
  "List all tables with columns and data types",
128
+ "How many customers do you have?",
129
+ "What are the statuses of my customers",
130
  "Visualize with different colors and show legend",
131
+ "What are the statuses of my customers and how many are in each status, show it by percentage",
132
  "Total number of completed orders in six years by customer count show top most 10 customers",
133
  "In january how many products has been sold ? group them by year",
134
  "How many users and roles have been created in 2024"
langchain_mcp_client.py CHANGED
@@ -11,7 +11,7 @@ from langchain.chat_models import init_chat_model
11
  import logging
12
  from dotenv import load_dotenv
13
  from langchain.globals import set_debug
14
- from langchain.memory import ChatMessageHistory
15
  from memory_store import MemoryStore
16
 
17
 
@@ -38,13 +38,14 @@ async def lc_mcp_exec(request: str, history=None) -> Tuple[str, list]:
38
  table_summary = load_table_summary(os.environ["TABLE_SUMMARY_PATH"])
39
  server_params = get_server_params()
40
 
41
- # Initialize the LLM
42
  # llm = init_chat_model(
43
  # model_provider=os.getenv("OPENAI_MODEL_PROVIDER"),
44
  # model=os.getenv("OPENAI_MODEL"),
45
  # api_key=os.getenv("OPENAI_API_KEY")
46
  # )
47
 
 
48
  llm = init_chat_model(
49
  model_provider=os.getenv("GEMINI_MODEL_PROVIDER"),
50
  model=os.getenv("GEMINI_MODEL"),
 
11
  import logging
12
  from dotenv import load_dotenv
13
  from langchain.globals import set_debug
14
+ from langchain_community.chat_message_histories import ChatMessageHistory
15
  from memory_store import MemoryStore
16
 
17
 
 
38
  table_summary = load_table_summary(os.environ["TABLE_SUMMARY_PATH"])
39
  server_params = get_server_params()
40
 
41
+ # Initialize the LLM for OpenAI
42
  # llm = init_chat_model(
43
  # model_provider=os.getenv("OPENAI_MODEL_PROVIDER"),
44
  # model=os.getenv("OPENAI_MODEL"),
45
  # api_key=os.getenv("OPENAI_API_KEY")
46
  # )
47
 
48
+ # Initialize the LLM for Gemini
49
  llm = init_chat_model(
50
  model_provider=os.getenv("GEMINI_MODEL_PROVIDER"),
51
  model=os.getenv("GEMINI_MODEL"),
memory_store.py CHANGED
@@ -1,9 +1,11 @@
1
- from langchain.memory import ChatMessageHistory
2
  from typing import Optional
3
  import logging
4
 
 
5
  logger = logging.getLogger(__name__)
6
 
 
7
  class MemoryStore:
8
  _instance: Optional['MemoryStore'] = None
9
  _memory: Optional[ChatMessageHistory] = None
 
1
+ from langchain_community.chat_message_histories import ChatMessageHistory
2
  from typing import Optional
3
  import logging
4
 
5
+
6
  logger = logging.getLogger(__name__)
7
 
8
+
9
  class MemoryStore:
10
  _instance: Optional['MemoryStore'] = None
11
  _memory: Optional[ChatMessageHistory] = None