import numpy  # Ensure NumPy is loaded first to avoid FAISS issues
import faiss  # Load FAISS after NumPy
import os
import streamlit as st
import pandas as pd
import pdfplumber
from sentence_transformers import SentenceTransformer
from groq import Groq
import numpy as np

# API key for Groq — read from the environment rather than hardcoding a secret
API_KEY = os.environ.get("GROQ_API_KEY")
client = Groq(api_key=API_KEY)

# Initialize the embedding model
embed_model = SentenceTransformer('all-MiniLM-L6-v2')

# Function to extract text from PDF
def extract_text_from_pdf(pdf_file):
    with pdfplumber.open(pdf_file) as pdf:
        # extract_text() can return None for image-only pages; substitute ''
        return ' '.join(page.extract_text() or '' for page in pdf.pages)

# Function to create embeddings and store them in FAISS
def create_embeddings(text):
    chunks = [text[i:i+500] for i in range(0, len(text), 500)]
    embeddings = embed_model.encode(chunks)
    embeddings = np.array(embeddings).astype(np.float32)  # FAISS expects float32
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)
    return chunks, embeddings, index

# Function to find the most relevant chunk for the user's question
def get_relevant_chunk(question, embeddings, index, chunks):
    question_embedding = embed_model.encode([question])
    D, I = index.search(np.array(question_embedding).astype(np.float32), 1)  # Retrieve top 1 chunk
    relevant_chunk = chunks[I[0][0]]  # The chunk corresponding to the closest embedding
    return relevant_chunk

# Function to get the model's response from the Groq API
def get_answer_from_groq(question, context):
    chat_completion = client.chat.completions.create(
        messages=[
            {"role": "user", "content": f"Answer the following question based on the context:\nContext: {context}\nQuestion: {question}"}
        ],
        model="llama3-8b-8192",
    )
    return chat_completion.choices[0].message.content

# Streamlit app
def main():
    st.set_page_config(
        page_title="RAG Based Application",
        page_icon="📄",
        layout="centered",
    )

    # Custom CSS for styling
    st.markdown(
        """
        """,
        unsafe_allow_html=True,
    )

    # App title and description
    st.markdown('