---
title: Ltu Chat
emoji: 🐢
colorFrom: pink
colorTo: red
sdk: streamlit
sdk_version: 1.43.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# LTU Chat RAG Evaluation

This repository contains a RAG (Retrieval-Augmented Generation) pipeline for LTU (LuleĆ„ University of Technology) programme data, along with evaluation tools using Ragas.

## Overview

The system uses:

- **Qdrant**: Vector database for storing and retrieving embeddings
- **Haystack**: Framework for building the RAG pipeline
- **Ragas**: Framework for evaluating RAG systems

## Files

- `rag_pipeline.py`: Main RAG pipeline implementation
- `ragas_eval.py`: Script to evaluate the RAG pipeline using Ragas
- `testset.json`: JSONL file containing test questions, reference answers, and contexts
- `testset_generation.py`: Script used to generate the test set

## Requirements

```
streamlit==1.42.2
haystack-ai==2.10.3
qdrant-client==1.13.2
python-dotenv==1.0.1
beautifulsoup4==4.13.3
qdrant-haystack==8.0.0
ragas-haystack==2.1.0
rapidfuzz==3.12.2
pandas
```

## Setup

1. Install the required packages:

   ```
   pip install -r requirements.txt
   ```

2. Set up your environment variables (optional):

   ```
   export NEBIUS_API_KEY="your_api_key_here"
   ```

   If not set, the script will use the default API key included in the code.

## Running the Evaluation

To evaluate the RAG pipeline using Ragas:

```bash
python ragas_eval.py
```

This will:

1. Load the Qdrant document store from the local directory
2. Load the test set from `testset.json`
3. Run the RAG pipeline on each test question
4. Evaluate the results using Ragas metrics
5. Save the evaluation results to `ragas_evaluation_results.json`

## Ragas Metrics

The evaluation uses the following Ragas metrics:

- **Faithfulness**: Measures whether the generated answer is factually consistent with the retrieved contexts
- **Answer Relevancy**: Measures whether the answer is relevant to the question
- **Context Precision**: Measures the proportion of retrieved contexts that are relevant
- **Context Recall**: Measures whether the retrieved contexts contain the information needed to answer the question
- **Context Relevancy**: Measures the relevance of the retrieved contexts to the question

## Customization

You can customize the evaluation by modifying the `RAGEvaluator` class parameters:

```python
evaluator = RAGEvaluator(
    embedding_model_name="BAAI/bge-en-icl",
    llm_model_name="meta-llama/Llama-3.3-70B-Instruct",
    qdrant_path="./qdrant_data",
    api_base_url="https://api.studio.nebius.com/v1/",
    collection_name="ltu_programmes"
)
```

## Test Set Format

The test set is a JSONL file where each line contains:

- `user_input`: The question
- `reference`: The reference answer
- `reference_contexts`: List of reference contexts that should be retrieved
- `synthesizer_name`: Name of the synthesizer used to generate the reference answer
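
For reference, here is a minimal sketch of loading and previewing the test set, assuming each line of `testset.json` is a standalone JSON object with exactly the fields listed above:

```python
import json

# Minimal sketch: read the JSONL test set and preview a few records.
# Field names come from the format described above; error handling is omitted.
with open("testset.json", "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

for record in records[:3]:  # preview the first few entries
    print("Question:   ", record["user_input"])
    print("Reference:  ", record["reference"])
    print("Contexts:   ", len(record["reference_contexts"]), "reference context(s)")
    print("Synthesizer:", record["synthesizer_name"])
    print("-" * 40)
```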
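
After `python ragas_eval.py` has finished, the saved scores can be summarised with pandas (already in the requirements). This is a hypothetical sketch: the exact keys in `ragas_evaluation_results.json` depend on what `ragas_eval.py` writes, so adjust the field names to match the actual output.

```python
import json

import pandas as pd

# Hypothetical sketch: summarise per-question metric scores from the results file.
# Assumes ragas_evaluation_results.json contains a list of records with numeric
# metric fields (e.g. "faithfulness", "answer_relevancy"); adjust to the real layout.
with open("ragas_evaluation_results.json", "r", encoding="utf-8") as f:
    results = json.load(f)

df = pd.DataFrame(results)
print(df.select_dtypes("number").mean())  # average score per metric
```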