---
title: Multimodal Misinfo Detector
emoji: ⚡
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Detects misinformation from text + real/fake from images
license: mit
---
# Multimodal Misinformation Detector
This project is a Gradio-based web application that detects misinformation from both text and images. It combines Natural Language Processing (NLP) and Computer Vision (CV) models to determine whether content is real or fake, and provides an LLM-generated explanation for textual claims using the Hugging Face Inference API.
## Overview
The application consists of two main components:
- Text Detector – Classifies textual claims as Real or Fake by:
  - Collecting evidence from the Google Fact Check API, Tavily API, and Wikipedia.
  - Ranking evidence sentences by semantic similarity with a SentenceTransformer model.
  - Feeding the claim and evidence into a fine-tuned DeBERTa model.
  - Using an LLM (Llama 3.1 8B Instruct) to generate a structured explanation.
- Image Detector – Classifies uploaded images as Real or Fake using a fine-tuned CLIP-based binary classifier.
## Required API Keys
To use the Text Detector, you must enter the following keys in the Gradio interface:
| Key | Description |
|---|---|
| Hugging Face Token | Used for the Llama 3.1 Inference API (for explanations). |
| Tavily API Key | Used to fetch evidence sentences from Tavily. |
| Google Fact Check API Key | Used to retrieve verified fact-checks from Google. |
The "Classify Claim" button remains disabled until all three keys are provided.
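The gating behavior can be sketched as a small validation helper plus a Gradio `change` callback. This is an illustrative sketch, not the actual `app.py` code; the function names `keys_provided` and `toggle` are hypothetical.

```python
def keys_provided(hf_token: str, tavily_key: str, factcheck_key: str) -> bool:
    """Return True only when all three API keys are non-empty (ignoring whitespace)."""
    return all(bool(k and k.strip()) for k in (hf_token, tavily_key, factcheck_key))

# In the Gradio app, the same check could drive the button state, e.g.:
#   import gradio as gr
#   def toggle(hf, tv, fc):
#       return gr.update(interactive=keys_provided(hf, tv, fc))
#   for box in (hf_box, tavily_box, factcheck_box):
#       box.change(toggle, [hf_box, tavily_box, factcheck_box], classify_btn)
```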
## Models Used
| Component | Model | Description |
|---|---|---|
| Text Classifier | rajyalakshmijampani/fever_finetuned_deberta | Fine-tuned DeBERTa model for FEVER-style fact verification. |
| LLM (Explanation) | meta-llama/Llama-3.1-8B-Instruct | Generates structured JSON explanations through Inference API. |
| Embedding Model | sentence-transformers/all-MiniLM-L6-v2 | Used to rank retrieved evidence sentences. |
| Image Classifier | rajyalakshmijampani/finetuned_clip | Fine-tuned CLIP-based binary classifier for image authenticity. |
| Vision Encoder | openai/clip-vit-base-patch32 | Base visual encoder for feature extraction in the image classifier. |
## Text Detection Flow
1. The user enters a textual claim and provides the three API keys.
2. The system retrieves relevant evidence sentences from the Google Fact Check API, the Tavily API, and Wikipedia search.
3. Evidence sentences are ranked by semantic similarity with the claim.
4. The top-ranked sentences and the claim are combined and passed to the fine-tuned DeBERTa classifier.
5. The classifier outputs a label (REAL or FAKE).
6. The result and evidence are sent to the Llama 3.1 model, which generates a Verdict (Real / Fake / Uncertain), an Explanation (3–5 sentences), and a Confidence level (Low / Medium / High).
7. The formatted explanation is displayed in the Gradio interface.
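The ranking step above can be sketched as cosine similarity between embeddings. In the app the vectors would come from `sentence-transformers/all-MiniLM-L6-v2`; the sketch below uses toy NumPy vectors so the ranking logic is self-contained, and the function name `rank_evidence` is illustrative.

```python
import numpy as np

def rank_evidence(claim_vec: np.ndarray, evidence_vecs: np.ndarray, top_k: int = 3):
    """Rank evidence embeddings by cosine similarity to the claim embedding."""
    claim = claim_vec / np.linalg.norm(claim_vec)
    ev = evidence_vecs / np.linalg.norm(evidence_vecs, axis=1, keepdims=True)
    sims = ev @ claim                      # cosine similarity per evidence sentence
    order = np.argsort(-sims)[:top_k]      # indices of the most similar sentences
    return order.tolist(), sims[order].tolist()

# Toy example: evidence 0 points the same way as the claim, evidence 1 is orthogonal.
claim = np.array([1.0, 0.0])
evidence = np.array([[2.0, 0.0], [0.0, 5.0], [1.0, 1.0]])
idx, scores = rank_evidence(claim, evidence, top_k=2)
```

The top-`k` indices would then select which retrieved sentences get concatenated with the claim for the DeBERTa classifier.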
Example Input:
"NASA confirms the Sun rose from the West today!"
Example Output:
Prediction: The claim is Fake.
Explanation: The statement contradicts verified astronomical facts. The Sun always rises in the East due to Earth's rotation.
Confidence: High.
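Since the LLM returns a structured JSON explanation, the display step presumably parses it into the three fields shown above. A minimal sketch, assuming the JSON keys are named `verdict`, `explanation`, and `confidence` (the actual schema is defined in `app.py`):

```python
import json

def format_explanation(raw: str) -> str:
    """Parse the LLM's JSON reply and format it for the Gradio output box."""
    data = json.loads(raw)
    return (
        f"Prediction: The claim is {data['verdict']}.\n"
        f"Explanation: {data['explanation']}\n"
        f"Confidence: {data['confidence']}."
    )

reply = '{"verdict": "Fake", "explanation": "The Sun rises in the East.", "confidence": "High"}'
text = format_explanation(reply)
```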
## Image Detection Flow
1. The user uploads an image in a standard format (JPG, PNG, etc.).
2. The image is processed by the CLIP vision encoder.
3. The fine-tuned classifier predicts a probability between 0 and 1.
4. The output is "Fake" if the probability is greater than 0.5, and "Real" otherwise.
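The decision rule in the last step is a plain threshold on the classifier's fake-probability; a one-function sketch (the function name is illustrative, not from `app.py`):

```python
def label_image(fake_probability: float, threshold: float = 0.5) -> str:
    """Map the classifier's fake-probability to a label: > threshold means Fake."""
    return "Fake" if fake_probability > threshold else "Real"
```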
## Input/Output Summary
| Detector | Input | Output |
|---|---|---|
| Text Detector | Text claim + 3 API keys | Prediction (Real/Fake/Uncertain), Explanation, Confidence |
| Image Detector | Uploaded image | Prediction (Real/Fake), Confidence score |
## Authored by
Group 9 - DSAI Lab Project
## License
This project is licensed under the MIT License. You are free to use and modify it with proper attribution.