Commit a5ca9fc (verified) by UKURIKIYEYEZU · Parent: 6e0d924

Upload 2 files
requirements.txt ADDED

gradio
pandas
numpy
matplotlib
seaborn
plotly
wordcloud
textblob
scikit-learn
openpyxl
Pillow
transformers
keybert
understanding_public_sentiment_on_the_new_distance_based_fare_system.py ADDED
# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import io
import base64
import random
from datetime import datetime, timedelta
from collections import Counter
import gradio as gr
from wordcloud import WordCloud
import os

from transformers import pipeline
from keybert import KeyBERT

# Initialize the models selected for this task
classifier = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
kw_model = KeyBERT()

# Map model labels to display names and assign a color to each sentiment category
sentiment_map = {
    "LABEL_0": "Negative", "LABEL_1": "Neutral", "LABEL_2": "Positive",
    "negative": "Negative", "neutral": "Neutral", "positive": "Positive",
    "NEGATIVE": "Negative", "NEUTRAL": "Neutral", "POSITIVE": "Positive"
}
color_map = {"Positive": "#2E8B57", "Neutral": "#4682B4", "Negative": "#CD5C5C"}

# Default comments for when no file is uploaded
comments = [
    "This new distance fare is really fair. I pay less for short trips!",
    "It's confusing, I don't know how much I'll pay now.",
    "RURA should have informed us better about this change.",
    "Good step towards fairness and modernization.",
    "Too expensive now! I hate this new system.",
    "The distance-based system makes so much more sense than flat rates.",
    "Why should I pay the same for 1km as I would for 10km? This is better.",
    "Finally a fair system — short-distance commuters benefit the most!",
    "I'm still unsure how the new rates are calculated. Needs clarity.",
    "A detailed public awareness campaign would have helped a lot.",
    "Smart move toward a fairer system, but more awareness is needed.",
    "I'm paying more now and it feels unjust.",
    "Flat rates were easier to understand, but this is more logical.",
    "Paying based on distance is reasonable, but it needs fine-tuning.",
    "App crashes when I try to calculate my fare. Fix it!",
    "Drivers are confused about the new system too.",
    "Great initiative but poor implementation.",
    "Now I know exactly what I'm paying for. Transparent and fair.",
    "The fare calculator is very helpful.",
    "Bus company profits will increase, but what about us passengers?",
    "I've noticed faster service since the new system launched.",
    "Rural areas are being charged too much now.",
    "The new system is making my daily commute more expensive.",
    "Distance-based fares are the future of transportation.",
    "I appreciate the transparency but the app needs work.",
    "This discriminates against people living in rural areas!",
    "My transportation costs have decreased by 30%!",
    "We should go back to the old system immediately.",
    "Kids going to school are now paying more, this is unfair.",
    "The government did a good job explaining the benefits.",
    "I've waited years for a fair pricing system like this.",
    "Very impressed with the new fare calculation technology.",
    "The app is too complicated for elderly passengers.",
    "The transition period should have been longer.",
    "I find the new fare calculator very intuitive.",
    "This is just another way to extract more money from us.",
    "Love how I can now predict exactly what my trip will cost.",
    "The implementation was rushed without proper testing.",
    "Prices vary too much depending on traffic congestion.",
    "Works well in urban areas but rural commuters are suffering.",
    "I've downloaded the fare calculator app - it's brilliant!",
    "Taxi drivers are confused about calculating fares correctly."
]

# Global variable to hold the current dataframe
global_df = None

# Function to generate the default dataset from the predefined comments

def generate_default_df():
    global global_df
    default_data = []
    start_time = datetime.now() - timedelta(hours=24)

    for i, comment in enumerate(comments):
        timestamp = start_time + timedelta(hours=random.uniform(0, 24))

        # Analyze sentiment
        result = classifier(comment)[0]
        sentiment = sentiment_map[result["label"]]
        score = round(result["score"], 3)

        # Extract keywords
        try:
            keywords = kw_model.extract_keywords(comment, top_n=3)
            keyword_str = ", ".join([kw[0] for kw in keywords]) if keywords else "N/A"
        except Exception:
            keyword_str = "N/A"

        default_data.append({
            "Datetime": timestamp,
            "Text": comment,
            "Sentiment": sentiment,
            "Score": score,
            "Keywords": keyword_str
        })

    default_df = pd.DataFrame(default_data)
    default_df["Datetime"] = pd.to_datetime(default_df["Datetime"])
    default_df["Datetime"] = default_df["Datetime"].dt.floor("1H")
    global_df = default_df.sort_values("Datetime").reset_index(drop=True)
    return global_df

# Function to process an uploaded CSV or Excel file and analyze sentiment

def process_uploaded_file(file):
    global global_df

    if file is None:
        global_df = generate_default_df()
        return global_df

    try:
        # Read the uploaded file
        if file.name.endswith('.csv'):
            user_df = pd.read_csv(file.name)
        elif file.name.endswith('.xlsx'):
            user_df = pd.read_excel(file.name)
        else:
            raise ValueError("Unsupported file type. Please upload CSV or Excel files only.")

        # Check required columns
        if 'Text' not in user_df.columns:
            raise ValueError("File must contain a 'Text' column with comments.")

        # Handle datetime - create if not exists
        if 'Datetime' not in user_df.columns:
            # Generate timestamps for uploaded data
            start_time = datetime.now() - timedelta(hours=len(user_df))
            user_df['Datetime'] = [start_time + timedelta(hours=i) for i in range(len(user_df))]

        # Clean and prepare data
        user_df = user_df[['Datetime', 'Text']].copy()
        user_df["Datetime"] = pd.to_datetime(user_df["Datetime"])
        user_df["Datetime"] = user_df["Datetime"].dt.floor("1H")
        user_df = user_df.dropna(subset=['Text'])

        # Analyze sentiment and extract keywords for each comment
        sentiments = []
        scores = []
        keywords_list = []

        for text in user_df["Text"]:
            try:
                # Sentiment analysis
                result = classifier(str(text))[0]
                sentiment = sentiment_map[result['label']]
                score = round(result['score'], 3)

                # Keyword extraction
                keywords = kw_model.extract_keywords(str(text), top_n=3)
                keyword_str = ", ".join([kw[0] for kw in keywords]) if keywords else "N/A"

                sentiments.append(sentiment)
                scores.append(score)
                keywords_list.append(keyword_str)
            except Exception as e:
                print(f"Error processing text: {e}")
                sentiments.append("Neutral")
                scores.append(0.5)
                keywords_list.append("N/A")

        user_df["Sentiment"] = sentiments
        user_df["Score"] = scores
        user_df["Keywords"] = keywords_list

        global_df = user_df.sort_values("Datetime").reset_index(drop=True)
        return global_df

    except Exception as e:
        print(f"Error processing file: {str(e)}")
        global_df = generate_default_df()
        return global_df

# Wrapper around file processing, used to refresh the dataframe display

def get_analysis_dataframe(file):
    return process_uploaded_file(file)

# Function to analyze a single comment and return sentiment and keywords

def analyze_text(comment):
    if not comment or not comment.strip():
        return "N/A", 0, "N/A"

    try:
        result = classifier(comment)[0]
        sentiment = sentiment_map.get(result["label"], result["label"])
        score = result["score"]

        keywords = kw_model.extract_keywords(comment, top_n=3, keyphrase_ngram_range=(1, 2))
        keywords_str = ", ".join([kw[0] for kw in keywords]) if keywords else "N/A"

        return sentiment, score, keywords_str
    except Exception as e:
        print(f"Error analyzing text: {e}")
        return "Error", 0, "Error processing text"

# Function to add an analyzed comment to the global dataframe

def add_to_dataframe(comment, sentiment, score, keywords):
    global global_df
    timestamp = datetime.now().replace(microsecond=0)

    new_row = pd.DataFrame([{
        "Datetime": timestamp,
        "Text": comment,
        "Sentiment": sentiment,
        "Score": score,
        "Keywords": keywords
    }])

    global_df = pd.concat([global_df, new_row], ignore_index=True)
    return global_df

# Function to generate and display a simple word cloud based on sentiment filter

def create_wordcloud_simple(df, sentiment_filter=None):
    if df is None or df.empty:
        return None

    # Filter by sentiment if provided
    if sentiment_filter and sentiment_filter != "All":
        filtered_df = df[df["Sentiment"] == sentiment_filter]
    else:
        filtered_df = df

    if filtered_df.empty:
        print("No data available for the selected sentiment.")
        return None

    # Combine keywords into a single string
    keyword_text = filtered_df["Keywords"].fillna("").str.replace("N/A", "").str.replace(",", " ")
    all_keywords = " ".join(keyword_text)

    if not all_keywords.strip():
        print("No valid keywords to display in word cloud.")
        return None

    # Select colormap based on sentiment
    colormap = "viridis"
    if sentiment_filter == "Positive":
        colormap = "Greens"
    elif sentiment_filter == "Neutral":
        colormap = "Blues"
    elif sentiment_filter == "Negative":
        colormap = "Reds"

    # Create the word cloud
    wordcloud = WordCloud(
        background_color='white',
        colormap=colormap,
        max_words=100,
        width=800,
        height=400
    ).generate(all_keywords)

    # Convert to image for Gradio
    return wordcloud.to_image()

# Function to create a timeline visualization showing comment volume by sentiment over time

def plot_sentiment_timeline(df):
    if df is None or df.empty:
        return go.Figure().update_layout(title="No data available", height=400)

    try:
        # Process datetime
        df_copy = df.copy()
        df_copy["Datetime"] = pd.to_datetime(df_copy["Datetime"])
        df_copy["Time_Bin"] = df_copy["Datetime"].dt.floor("1H")

        # Group by time and sentiment
        grouped = (
            df_copy.groupby(["Time_Bin", "Sentiment"])
            .agg(
                Count=("Text", "count"),
                Score=("Score", "mean"),
                Keywords=("Keywords", lambda x: ", ".join(set(", ".join(x).split(", "))) if len(x) > 0 else "")
            )
            .reset_index()
        )

        # Create plot
        fig = go.Figure()

        # Add a line for each sentiment
        for sentiment, color in color_map.items():
            sentiment_df = grouped[grouped["Sentiment"] == sentiment]
            if sentiment_df.empty:
                continue

            fig.add_trace(
                go.Scatter(
                    x=sentiment_df["Time_Bin"],
                    y=sentiment_df["Count"],
                    mode='lines+markers',
                    name=sentiment,
                    line=dict(color=color, width=3),
                    marker=dict(size=6, color=color),
                    text=sentiment_df["Keywords"],
                    hovertemplate='<b>%{y} comments</b><br>%{x}<br><b>Keywords:</b> %{text}<extra></extra>'
                )
            )

        # Layout updates
        fig.update_layout(
            title="Sentiment Analysis (1-Hour Intervals)",
            height=500,
            xaxis=dict(
                title="Time",
                tickformat="%Y-%m-%d %H:%M"
            ),
            yaxis_title="Number of Comments",
            template="plotly_white"
        )

        return fig

    except Exception as e:
        print(f"Error in timeline plot: {e}")
        return go.Figure().update_layout(
            title="Error creating timeline visualization",
            height=400
        )

# Function to create a dual-view visualization of sentiment distribution

def plot_sentiment_distribution(df):
    if df is None or df.empty:
        return go.Figure().update_layout(title="No data available", height=400)

    try:
        # Group sentiment counts
        sentiment_counts = df["Sentiment"].value_counts().reset_index()
        sentiment_counts.columns = ["Sentiment", "Count"]
        sentiment_counts["Percentage"] = sentiment_counts["Count"] / sentiment_counts["Count"].sum() * 100

        # Create subplots
        fig = make_subplots(
            rows=1, cols=2,
            specs=[[{"type": "domain"}, {"type": "xy"}]],
            subplot_titles=("Sentiment Distribution", "Sentiment Counts"),
            column_widths=[0.5, 0.5]
        )

        # Pie Chart
        fig.add_trace(
            go.Pie(
                labels=sentiment_counts["Sentiment"],
                values=sentiment_counts["Count"],
                textinfo="percent+label",
                marker=dict(colors=[color_map.get(s, "#999999") for s in sentiment_counts["Sentiment"]]),
                hole=0.4
            ),
            row=1, col=1
        )

        # Bar Chart
        fig.add_trace(
            go.Bar(
                x=sentiment_counts["Sentiment"],
                y=sentiment_counts["Count"],
                text=sentiment_counts["Count"],
                textposition="auto",
                marker_color=[color_map.get(s, "#999999") for s in sentiment_counts["Sentiment"]]
            ),
            row=1, col=2
        )

        # Update layout
        fig.update_layout(
            title="Sentiment Distribution Overview",
            height=450,
            template="plotly_white",
            showlegend=False
        )

        return fig

    except Exception as e:
        print(f"Error in distribution plot: {e}")
        return go.Figure().update_layout(
            title="Error creating distribution visualization",
            height=450
        )

# Function to create a grouped bar chart visualization of the top keywords across sentiments

def plot_keyword_analysis(df):
    if df is None or df.empty:
        return go.Figure().update_layout(title="No data available", height=400)

    try:
        all_keywords = []

        # Process each sentiment
        for sentiment in ["Positive", "Neutral", "Negative"]:
            sentiment_df = df[df["Sentiment"] == sentiment]
            if sentiment_df.empty:
                continue

            # Extract and flatten keyword lists
            for keywords_str in sentiment_df["Keywords"].dropna():
                if keywords_str and keywords_str.upper() != "N/A":
                    keywords = [kw.strip() for kw in keywords_str.split(",") if kw.strip()]
                    for kw in keywords:
                        all_keywords.append((kw, sentiment))

        if not all_keywords:
            return go.Figure().update_layout(
                title="No keyword data available",
                height=500
            )

        # Create DataFrame and aggregate keyword counts
        keywords_df = pd.DataFrame(all_keywords, columns=["Keyword", "Sentiment"])
        keyword_counts = (
            keywords_df.groupby(["Keyword", "Sentiment"])
            .size()
            .reset_index(name="Count")
        )

        # Filter top 15 keywords by overall frequency
        top_keywords = keywords_df["Keyword"].value_counts().nlargest(15).index
        keyword_counts = keyword_counts[keyword_counts["Keyword"].isin(top_keywords)]

        # Plot grouped bar chart
        fig = px.bar(
            keyword_counts,
            x="Keyword",
            y="Count",
            color="Sentiment",
            color_discrete_map=color_map,
            text="Count",
            barmode="group",
            labels={"Count": "Frequency", "Keyword": ""},
            title="🔍 Top Keywords by Sentiment"
        )

        fig.update_layout(
            legend_title="Sentiment",
            xaxis=dict(categoryorder="total descending"),
            yaxis=dict(title="Frequency"),
            height=500,
            template="plotly_white"
        )

        return fig

    except Exception as e:
        print(f"Error in keyword analysis: {e}")
        return go.Figure().update_layout(
            title="Error creating keyword visualization",
            height=500
        )

# Function to generate summary sentiment metrics for dashboard visualization

def create_summary_metrics(df):
    if df is None or df.empty:
        return {
            "total": 0, "positive": 0, "neutral": 0, "negative": 0,
            "positive_pct": 0.0, "neutral_pct": 0.0, "negative_pct": 0.0,
            "sentiment_ratio": 0.0, "trend": "No data"
        }

    try:
        total_comments = len(df)

        # Count sentiments
        sentiment_counts = df["Sentiment"].value_counts().to_dict()
        positive = sentiment_counts.get("Positive", 0)
        neutral = sentiment_counts.get("Neutral", 0)
        negative = sentiment_counts.get("Negative", 0)

        # Calculate percentages safely
        def pct(count):
            return round((count / total_comments) * 100, 1) if total_comments else 0.0

        positive_pct = pct(positive)
        neutral_pct = pct(neutral)
        negative_pct = pct(negative)

        # Sentiment ratio (Positive : Negative)
        sentiment_ratio = round(positive / negative, 2) if negative > 0 else float('inf')

        # Trend detection based on time-series sentiment evolution
        trend = "Insufficient data"
        if total_comments >= 5 and "Datetime" in df.columns:
            sorted_df = df.sort_values("Datetime")
            mid = total_comments // 2
            first_half = sorted_df.iloc[:mid]
            second_half = sorted_df.iloc[mid:]

            # Compute positive sentiment proportion in both halves
            first_pos_pct = (first_half["Sentiment"] == "Positive").mean()
            second_pos_pct = (second_half["Sentiment"] == "Positive").mean()

            delta = second_pos_pct - first_pos_pct
            if delta > 0.05:
                trend = "Improving"
            elif delta < -0.05:
                trend = "Declining"
            else:
                trend = "Stable"

        return {
            "total": total_comments,
            "positive": positive,
            "neutral": neutral,
            "negative": negative,
            "positive_pct": positive_pct,
            "neutral_pct": neutral_pct,
            "negative_pct": negative_pct,
            "sentiment_ratio": sentiment_ratio,
            "trend": trend,
        }

    except Exception as e:
        print(f"Error in summary metrics: {e}")
        return {
            "total": 0, "positive": 0, "neutral": 0, "negative": 0,
            "positive_pct": 0.0, "neutral_pct": 0.0, "negative_pct": 0.0,
            "sentiment_ratio": 0.0, "trend": "Error calculating"
        }

# Function to analyze a single comment for the Quick Analyzer tab

def gradio_analyze_comment(comment):
    try:
        if not comment or not comment.strip():
            return "N/A", "0.0%", "N/A"

        sentiment, score, keywords = analyze_text(comment)
        score_str = f"{score * 100:.1f}%"

        return sentiment, score_str, keywords

    except Exception as e:
        print(f"Error in gradio_analyze_comment: {e}")
        return "Error", "0.0%", "Error processing comment"

# Function to add a comment to the dashboard

def gradio_add_comment(comment):
    global global_df

    if not comment or not comment.strip():
        return global_df, "Please enter a comment", "", plot_sentiment_timeline(global_df), plot_sentiment_distribution(global_df), plot_keyword_analysis(global_df)

    sentiment, score, keywords = analyze_text(comment)
    updated_df = add_to_dataframe(comment, sentiment, score, keywords)

    # Generate feedback message
    feedback = f"✓ Added: {sentiment} comment (Confidence: {score*100:.1f}%)"

    # Update all visualizations
    timeline_plot = plot_sentiment_timeline(updated_df)
    distribution_plot = plot_sentiment_distribution(updated_df)
    keyword_plot = plot_keyword_analysis(updated_df)

    return updated_df, feedback, "", timeline_plot, distribution_plot, keyword_plot

# Function to generate a word cloud image from the DataFrame

def gradio_generate_wordcloud(sentiment_filter):
    try:
        filter_value = sentiment_filter if sentiment_filter != "All" else None
        return create_wordcloud_simple(global_df, filter_value)
    except Exception as e:
        print(f"Error generating word cloud: {e}")
        return None

# Function to export the current dataframe to CSV for download

def export_data_to_csv(df_component):
    global global_df
    try:
        if global_df is not None and not global_df.empty:
            csv_buffer = io.StringIO()
            global_df.to_csv(csv_buffer, index=False)
            csv_content = csv_buffer.getvalue()

            # Save to a temporary file
            filename = f"sentiment_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(csv_content)

            return filename
        else:
            return None
    except Exception as e:
        print(f"Error exporting data: {e}")
        return None

# Initialize the global dataframe with default data
global_df = generate_default_df()

# Load an uploaded file and return every dashboard component that needs refreshing

def load_and_update_all_components(file):
    global global_df

    if file is None:
        # Return current state if no file
        metrics = create_summary_metrics(global_df)
        return (
            global_df,
            metrics["total"], metrics["positive_pct"], metrics["neutral_pct"],
            metrics["negative_pct"], metrics["sentiment_ratio"], metrics["trend"],
            plot_sentiment_timeline(global_df), plot_sentiment_distribution(global_df),
            plot_keyword_analysis(global_df), global_df
        )

    # Load and analyze the uploaded file
    updated_df = get_analysis_dataframe(file)
    metrics = create_summary_metrics(updated_df)

    # Update global dataframe
    global_df = updated_df

    return (
        updated_df,
        metrics["total"], metrics["positive_pct"], metrics["neutral_pct"],
        metrics["negative_pct"], metrics["sentiment_ratio"], metrics["trend"],
        plot_sentiment_timeline(updated_df), plot_sentiment_distribution(updated_df),
        plot_keyword_analysis(updated_df), updated_df
    )

# Create the Gradio interface and dashboard

with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown(
        """
        # Distance-Based Fare Sentiment Analysis Dashboard
        ## Analysis of public feedback with advanced visualizations
        """
    )

    with gr.Row():
        file_input = gr.File(label="📁 Upload CSV or Excel File", file_types=[".csv", ".xlsx"])
        load_btn = gr.Button("Load & Analyze File")

    # Hidden DataFrame component that holds the current data as state
    comments_df = gr.DataFrame(value=global_df, label="Comment Data", interactive=False, visible=False)

    with gr.Tabs():
        # Tab 1: Main Dashboard
        with gr.Tab("Analytics Dashboard"):
            # Summary metrics
            metrics = create_summary_metrics(global_df)

            with gr.Row():
                with gr.Column(scale=1):
                    total_comments = gr.Number(value=metrics["total"], label="Total Comments", interactive=False)
                with gr.Column(scale=1):
                    positive_count = gr.Number(value=metrics["positive_pct"], label="Positive %", interactive=False)
                with gr.Column(scale=1):
                    neutral_count = gr.Number(value=metrics["neutral_pct"], label="Neutral %", interactive=False)
                with gr.Column(scale=1):
                    negative_count = gr.Number(value=metrics["negative_pct"], label="Negative %", interactive=False)

            with gr.Row():
                with gr.Column(scale=1):
                    pos_neg_ratio = gr.Number(value=metrics["sentiment_ratio"], label="Positive/Negative Ratio", interactive=False)
                with gr.Column(scale=1):
                    sentiment_trend = gr.Textbox(value=metrics["trend"], label="Sentiment Trend", interactive=False)

            feedback_text = gr.Textbox(label="", interactive=False, visible=True)

            gr.Markdown("### Sentiment Visualizations")

            with gr.Tabs():
                with gr.Tab("Timeline Analysis"):
                    timeline_plot = gr.Plot(value=plot_sentiment_timeline(global_df))

                with gr.Tab("Sentiment Distribution"):
                    distribution_plot = gr.Plot(value=plot_sentiment_distribution(global_df))

                with gr.Tab("Keyword Analysis"):
                    keyword_plot = gr.Plot(value=plot_keyword_analysis(global_df))

                with gr.Tab("Word Clouds"):
                    with gr.Row():
                        sentiment_filter = gr.Dropdown(
                            choices=["All", "Positive", "Neutral", "Negative"],
                            value="All",
                            label="Sentiment Filter"
                        )
                        generate_button = gr.Button("Generate Word Cloud")

                    wordcloud_output = gr.Image(label="Word Cloud")

                    generate_button.click(
                        fn=gradio_generate_wordcloud,
                        inputs=sentiment_filter,
                        outputs=wordcloud_output
                    )

            gr.Markdown("### Comment Data")
            with gr.Row():
                comments_display = gr.DataFrame(value=global_df, label="Comment Data", interactive=False)

            with gr.Row():
                export_btn = gr.Button("Export & Download CSV", variant="secondary")
                download_component = gr.File(label="Download", visible=True)

            # Connect the export button to the download function
            export_btn.click(
                fn=export_data_to_csv,
                inputs=[comments_display],
                outputs=[download_component]
            )

            # Connect the load button to update ALL components
            load_btn.click(
                fn=load_and_update_all_components,
                inputs=[file_input],
                outputs=[
                    comments_df,  # Hidden state component
                    total_comments, positive_count, neutral_count, negative_count,  # Metric displays
                    pos_neg_ratio, sentiment_trend,  # Additional metrics
                    timeline_plot, distribution_plot, keyword_plot,  # Visualizations
                    comments_display  # Comments table
                ]
            )

        # Set up event handlers for adding comments (using global_df)
        def gradio_add_comment_updated(comment):
            global global_df
            # gradio_add_comment returns six values; only the dataframe and feedback are needed here
            global_df, feedback, _, _, _, _ = gradio_add_comment(comment)

            # Return all updated components
            return (
                global_df, feedback, "",  # Updated df, feedback, clear input
                plot_sentiment_timeline(global_df),
                plot_sentiment_distribution(global_df),
                plot_keyword_analysis(global_df),
                global_df  # Update the display table too
            )

        # Tab 2: Quick Analysis
        with gr.Tab("Quick Sentiment Analyzer"):
            gr.Markdown("""
            ### Quick Sentiment Analysis Tool
            Enter any comment about the distance-based fare system to get instant sentiment analysis
            """)

            with gr.Row():
                quick_comment = gr.Textbox(
                    placeholder="Type your comment here...",
                    label="Comment for Analysis",
                    lines=3
                )

            with gr.Row():
                analyze_btn = gr.Button("Analyze Sentiment", variant="primary")

            with gr.Row():
                with gr.Column():
                    sentiment_result = gr.Textbox(label="Sentiment")
                with gr.Column():
                    confidence_result = gr.Textbox(label="Confidence")
                with gr.Column():
                    keyword_result = gr.Textbox(label="Key Topics")

            analyze_btn.click(
                fn=gradio_analyze_comment,
                inputs=quick_comment,
                outputs=[sentiment_result, confidence_result, keyword_result]
            )

        # Tab 3: About & Help
        with gr.Tab("About this Dashboard"):
            gr.Markdown("""
            ## About This Dashboard

            This dashboard analyzes public perception of the distance-based fare system,
            using comments collected from various social media platforms to identify key concerns.

            ### Features:

            - **Sentiment Analysis**: Automatically classifies comments as Positive, Neutral, or Negative
            - **Keyword Extraction**: Identifies the most important keywords in each comment
            - **Time Series Analysis**: Tracks sentiment trends over time
            - **Word Cloud Visualization**: Visual representation of the most common keywords
            - **Data Export**: Download collected data for further analysis

            ### How to Use:

            1. Use the main dashboard to view overall sentiment metrics and trends
            2. Add new comments via the comment input box
            3. Use the Quick Analyzer for testing sentiment on individual comments
            4. Upload your own data files (CSV/Excel) to analyze custom datasets
            5. Export data in CSV format for external analysis

            ### File Upload Requirements:

            - CSV or Excel files (.csv, .xlsx)
            - Must contain a 'Text' column with comments
            - Optional 'Datetime' column (will be auto-generated if missing)

            This dashboard is developed by Anaclet UKURIKIYEYEZU, contact: 0786698014
            """)

# Launch the app
if __name__ == "__main__":
    demo.launch(share=True)
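The About tab above lists the upload requirements: a required `Text` column and an optional `Datetime` column. As a minimal sketch of a file that satisfies that schema — the filename `sample_comments.csv` and the three rows are illustrative, not part of the app — such a file can be produced with just the standard library:

```python
import csv
from datetime import datetime

# Illustrative rows matching the dashboard's expected schema:
# a required "Text" column and an optional "Datetime" column.
rows = [
    (datetime(2024, 1, 1, 8, 0), "Short trips are cheaper now, great change."),
    (datetime(2024, 1, 1, 9, 30), "The fare calculator app keeps crashing."),
    (datetime(2024, 1, 1, 11, 15), "Not sure yet how the new rates are computed."),
]

# Write a CSV that the "Upload CSV or Excel File" input can consume
with open("sample_comments.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Datetime", "Text"])
    for ts, text in rows:
        writer.writerow([ts.isoformat(sep=" "), text])
```

The ISO-style timestamps (`2024-01-01 08:00:00`) are parsed cleanly by the `pd.to_datetime` call in `process_uploaded_file`; if the `Datetime` column were omitted, the app would generate hourly timestamps itself.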