---
title: Mistral
emoji: ⚡
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: apache-2.0
---
# AI-powered Web Search and PDF Chat Assistant

This application is a versatile AI-powered assistant that combines web search capabilities with PDF document analysis. It provides an interactive chat interface for users to ask questions, search the web, and analyze uploaded PDF documents.
## Features

- Web search
- PDF document upload and analysis
- Chat interface for asking questions
- Support for multiple language models (including Mistral, Mixtral, and Llama)
- Adjustable temperature and number of API calls for fine-tuning responses
- Document management (upload, delete, refresh)
- Entity-specific summary generation
## Requirements

- Python 3.8+ (Gradio 4.x requires Python 3.8 or newer)
- Gradio
- Hugging Face Transformers
- FAISS
- DuckDuckGo Search
- LangChain
- LlamaParse
- Pydantic
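
The list above names projects rather than Python modules. A quick way to confirm an environment before launching the app is a standard-library check like the one below. It is a sketch, not part of this repository: the module names (`faiss`, `duckduckgo_search`, `llama_parse`, and so on) are the usual import names for these packages but are assumptions here, and the 3.8 minimum reflects Gradio 4.x, which the Space's `sdk_version` (4.36.1) uses.

```python
import importlib.util
import sys

# Assumed import names for the packages listed under Requirements.
# The README names the projects, not their module names.
REQUIRED_MODULES = {
    "Gradio": "gradio",
    "Hugging Face Transformers": "transformers",
    "FAISS": "faiss",
    "DuckDuckGo Search": "duckduckgo_search",
    "LangChain": "langchain",
    "LlamaParse": "llama_parse",
    "Pydantic": "pydantic",
}

MIN_PYTHON = (3, 8)  # Gradio 4.x does not support Python 3.7


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the display names of packages whose top-level module cannot be found."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


def check_environment():
    """Print a short report and return True only if everything is available."""
    ok = sys.version_info >= MIN_PYTHON
    if not ok:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
              f"found {sys.version.split()[0]}")
    missing = missing_requirements()
    for name in missing:
        print(f"Missing package: {name}")
    return ok and not missing


if __name__ == "__main__":
    print("Environment OK" if check_environment() else "Environment incomplete")
```

`find_spec` only locates modules without importing them, so the check stays fast even for heavy packages such as `transformers`.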
## Installation

1. Clone the repository.
2. Install the dependencies listed under Requirements.

This project combines large language models with web search and PDF document analysis to create a versatile chat assistant. Users can ask questions about their uploaded PDF documents or use web search to get answers to general queries.
## Features

- **PDF Document Chat**: Upload and chat with multiple PDF documents.
- **Web Search Integration**: Optionally use web search to answer queries.
- **Multiple AI Models**: Choose from several language models (see Models).
- **Customizable Responses**: Adjust the temperature and the number of API calls to fine-tune outputs.
- **User-friendly Interface**: Built with Gradio for an intuitive chat experience.
- **Document Selection**: Choose which uploaded documents to include in your queries.
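
The temperature setting controls how random the model's sampling is: next-token probabilities are computed as softmax(logits / T), so a low T concentrates probability on the top-scoring token and a high T spreads it out. The snippet below is a generic illustration on made-up logits, not code from this app; how `app.py` forwards the temperature to each model is not shown in this README.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature (T > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits for three candidate tokens (made-up values).
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the first token taking almost all of the probability at T = 0.2 and a much flatter distribution at T = 2.0, which is why lower temperatures give more deterministic answers.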
## How It Works

1. **Document Processing**:
   - Uploaded PDF documents are parsed with either PyPDF or LlamaParse.
   - The parsed documents are stored in a FAISS vector database for efficient retrieval.
2. **Embedding**:
   - Hugging Face embeddings (default: `sentence-transformers/all-mpnet-base-v2`) are used for document indexing and query matching.
3. **Query Processing**:
   - For PDF queries, the most relevant document sections are retrieved from the FAISS database.
   - For web searches, results are fetched with DuckDuckGo Search.
4. **Response Generation**:
   - The query is sent to the selected AI model (options include Mistral, Mixtral, and others).
   - The response is generated from the retrieved context (PDF sections or web search results).
5. **User Interaction**:
   - Users can chat with the AI, asking questions about uploaded documents or general queries.
   - The interface lets users adjust model parameters and switch between PDF and web search modes.
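
Steps 2 to 4 follow the usual retrieval-augmented generation pattern: embed the query, find the nearest stored chunks, and place them in the prompt. The sketch below mimics that flow with the standard library only and is an assumption-laden stand-in, not the app's code. `toy_embed` is a hypothetical hashed bag-of-words embedding in place of `all-mpnet-base-v2`; `FlatL2Index` performs the same exact, brute-force L2 search as FAISS's `IndexFlatL2` (the README does not say which FAISS index the app uses); the `sources` filter models the document-selection checkboxes; and the prompt template is invented. Web results are assumed to be dicts with `title`, `href`, and `body` keys, the shape returned by DuckDuckGo Search's `DDGS().text()`.

```python
import math
import zlib
from dataclasses import dataclass, field

DIM = 64  # toy size; all-mpnet-base-v2 actually produces 768-dim vectors


def toy_embed(text, dim=DIM):
    """Hypothetical stand-in embedding: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class FlatL2Index:
    """Exact nearest-neighbour search by squared L2 distance (like IndexFlatL2)."""
    vectors: list = field(default_factory=list)
    chunks: list = field(default_factory=list)  # (source_name, text) pairs

    def add(self, source, text):
        self.vectors.append(toy_embed(text))
        self.chunks.append((source, text))

    def search(self, query, k=3, sources=None):
        """Return the k closest chunks, optionally limited to selected documents."""
        q = toy_embed(query)
        scored = []
        for vec, (src, text) in zip(self.vectors, self.chunks):
            if sources is not None and src not in sources:
                continue  # assumed behaviour: unticked documents are skipped
            dist = sum((a - b) ** 2 for a, b in zip(q, vec))
            scored.append((dist, src, text))
        scored.sort(key=lambda item: item[0])
        return [(src, text) for _, src, text in scored[:k]]


def format_web_results(results):
    """Turn DuckDuckGo-style result dicts (title/href/body) into context lines."""
    return [f"{r['title']} ({r['href']}): {r['body']}" for r in results]


def build_prompt(question, context_lines):
    """Invented template: retrieved context first, then the user's question."""
    context = "\n".join(f"- {line}" for line in context_lines)
    return (f"Answer using only the context below.\n\nContext:\n{context}\n\n"
            f"Question: {question}\nAnswer:")


if __name__ == "__main__":
    index = FlatL2Index()
    index.add("report.pdf", "quarterly revenue grew in the europe region")
    index.add("manual.pdf", "reset the device by holding the power button")
    hits = index.search("how do I reset the device", k=1)
    print(build_prompt("How do I reset the device?", [text for _, text in hits]))
```

In PDF mode the context lines come from `search`; in web mode they would come from `format_web_results`. Either way the prompt is then sent to the selected model.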
## Setup and Usage

1. Install the required dependencies (see Requirements).
2. Set the necessary API keys and tokens as environment variables.
3. Run `app.py` to launch the Gradio interface.
4. Upload PDF documents using the file input at the top of the interface.
5. Select the documents to query using the checkboxes.
6. Toggle between PDF chat and web search modes as needed.
7. Adjust the temperature and the number of API calls to fine-tune responses.
8. Start chatting and asking questions!
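
The README does not name the environment variables the app expects. On Hugging Face Spaces, secrets configured in the Space settings are exposed to the app as environment variables, so a fail-fast check at startup makes a missing token obvious. In this sketch, `HF_TOKEN` (read by `huggingface_hub`) and `LLAMA_CLOUD_API_KEY` (read by LlamaParse) are assumptions about which keys are needed; adjust the tuple to match `app.py`.

```python
import os

# Assumed variable names; the README does not list the ones app.py reads.
REQUIRED_ENV_VARS = ("HF_TOKEN", "LLAMA_CLOUD_API_KEY")


def load_secrets(names=REQUIRED_ENV_VARS, environ=None):
    """Return {name: value}; raise one error listing every missing or empty variable."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


if __name__ == "__main__":
    try:
        secrets = load_secrets()
    except RuntimeError as err:
        print(err)
    else:
        print("Loaded:", ", ".join(secrets))  # print names only, never the values
```

Reporting all missing names at once avoids fixing one variable, restarting, and hitting the next error.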
## Models

The project supports multiple AI models, including:

- `mistralai/Mistral-7B-Instruct-v0.3`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`
- `meta/llama-3.1-8b-instruct`
- `mistralai/Mistral-Nemo-Instruct-2407`
## Future Improvements

- Integration of more embedding models for improved performance.
- Enhanced PDF parsing capabilities.
- Support for additional file formats beyond PDF.
- Improved caching for faster response times.
## Contribution

Contributions to this project are welcome! Feel free to submit issues or pull requests on the project's GitHub repository.