Spaces: sanjaystarc/data_analyst_pro


sanjaystarc committed on Mar 6

Commit 70f37b4 (verified) · 1 Parent(s): 6496b12

Upload 5 files

Files changed (5)

README.md +156 -10
app.py +472 -0
core_agent.py +318 -0
requirements.txt +14 -0
sample_data.csv +31 -0

README.md CHANGED

@@ -1,13 +1,159 @@
 ---
-title: Data Analyst Pro
-emoji: 🏢
-colorFrom: purple
-colorTo: green
-sdk: gradio
-sdk_version: 6.8.0
-app_file: app.py
-pinned: false
-license: apache-2.0
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🧠 DataMind Agent
+### AI-Powered Data Analyst — LangChain + Gemini + Streamlit
+Upload a data file (CSV, Excel, or JSON) and chat with your data in natural language. The agent, powered by Google Gemini, analyzes, visualizes, and explains your data.
+---
+## 🚀 Features
+| Feature | Description |
+|---|---|
+| 📂 Multi-format support | CSV, Excel (.xlsx/.xls), JSON |
+| 💬 Natural language Q&A | Ask anything, get intelligent answers |
+| 📊 Auto visualizations | AI picks the best chart for your question |
+| 🎨 Custom chart builder | Build any chart with dropdown controls |
+| 🔍 Data explorer | Filter, search, and download raw data |
+| 🧠 AI data summary | Executive summary generated by Gemini |
 ---
+## 📁 Project Structure
+```
+data-analyst-agent/
+├── app.py              # Streamlit UI (main app)
+├── core_agent.py       # LangChain + Gemini logic
+├── requirements.txt    # Python dependencies
+├── .env                # API key config
+├── sample_data.csv     # Test dataset (sales data)
+└── README.md           # This file
+```
+---
+## ⚙️ Setup & Installation
+### Step 1 — Clone / download the project
+```bash
+cd data-analyst-agent
+```
+### Step 2 — Create a virtual environment (recommended)
+```bash
+python -m venv venv
+# On Windows:
+venv\Scripts\activate
+# On Mac/Linux:
+source venv/bin/activate
+```
+### Step 3 — Install dependencies
+```bash
+pip install -r requirements.txt
+```
+### Step 4 — Get your free Gemini API key
+1. Go to [https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
+2. Sign in with Google
+3. Click **"Create API Key"**
+4. Copy the key (starts with `AIza...`)
+### Step 5 — Add your API key
+Either paste it directly in the app sidebar, OR add it to `.env`:
+```
+GOOGLE_API_KEY=AIzaYourKeyHere
+```
+### Step 6 — Run the app
+```bash
+streamlit run app.py
+```
+The app opens at **http://localhost:8501**
+---
+## 🎯 How to Use
+1. **Paste your Gemini API key** in the sidebar
+2. **Upload a data file** (CSV, Excel, or JSON)
+3. **Dashboard tab** — see auto-generated stats and charts
+4. **Chat tab** — ask questions like:
+   - *"What are the top selling products?"*
+   - *"Is there a correlation between age and spending?"*
+   - *"Show me outliers in the sales column"*
+5. **Charts tab** — build custom visualizations
+6. **Raw Data tab** — filter and download your data
+---
+## 💡 Example Questions to Ask
+```
+"What is the average profit by category?"
+"Which region has the highest sales?"
+"Are there any missing values I should worry about?"
+"What trends do you see in the data over time?"
+"Which customers are the most valuable?"
+"Give me a statistical summary of all numeric columns"
+"What correlations exist between the columns?"
+```
+---
+## 🏗️ Architecture
+```
+User (Streamlit UI)
+       │
+       ▼
+  app.py (UI Layer)
+       │
+       ├── core_agent.py
+       │       ├── load_file()          → Parses CSV/Excel/JSON → DataFrame
+       │       ├── profile_dataframe()  → Statistical profiling
+       │       ├── ask_agent()          → LangChain → Gemini → Answer
+       │       ├── make_plotly_chart()  → Renders visualizations
+       │       └── ai_recommend_chart() → Gemini picks best chart
+       │
+       └── Google Gemini 1.5 Flash (via LangChain)
+```
+---
+## 📦 Key Libraries Used
+| Library | Purpose |
+|---|---|
+| `langchain` | Agent framework, prompt management |
+| `langchain-google-genai` | Gemini LLM integration |
+| `streamlit` | Web UI |
+| `pandas` | Data loading and manipulation |
+| `plotly` | Interactive visualizations |
+| `openpyxl` / `xlrd` | Excel file support |
+---
+## 🔧 Customization Ideas
+- Add **PDF support** using `pdfplumber`
+- Add **database connection** (SQLite, PostgreSQL)
+- Add **export to PowerPoint** for chart reports
+- Add **multi-file comparison** mode
+- Deploy to **Streamlit Cloud** (free hosting)
+---
+## 🆓 Free Tier Limits (Gemini 1.5 Flash)
+- 15 requests per minute
+- 1 million tokens per minute
+- 1,500 requests per day
+This is more than enough for personal data analysis projects!
 ---
+*Built with ❤️ using LangChain + Google Gemini + Streamlit*
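
Note on Step 5 of the README: it offers two ways to supply the key, the sidebar field or a `GOOGLE_API_KEY` entry in `.env`. In this commit `app.py` only calls `get_llm` when the sidebar field is non-empty, so a fallback to the environment value would have to be added. A minimal sketch of one way to do that, assuming the sidebar value should take precedence; `resolve_api_key` is a hypothetical helper and not part of the commit:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(sidebar_value: str,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the sidebar key if present, else GOOGLE_API_KEY from env, else None.

    Hypothetical helper: the precedence order is an assumption, not the
    behavior of the committed app.
    """
    env = os.environ if env is None else env
    for candidate in (sidebar_value, env.get("GOOGLE_API_KEY", "")):
        candidate = (candidate or "").strip()  # ignore empty or whitespace-only values
        if candidate:
            return candidate
    return None
```

In `app.py` this could be used as `api_key = resolve_api_key(api_key)` before the `if api_key:` check, provided `load_dotenv()` has already run.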

app.py ADDED

	@@ -0,0 +1,472 @@

+"""
+app.py
+======
+Streamlit UI — Data Analyst Agent (LangChain + Gemini)
+Run: streamlit run app.py
+"""
+import os
+import io
+import streamlit as st
+import pandas as pd
+import plotly.express as px
+from core_agent import (
+    get_llm, load_file, profile_dataframe, profile_to_text,
+    ask_agent, auto_suggest_charts, make_plotly_chart, ai_recommend_chart
+)
+# ─── Page Config ──────────────────────────────────────────────────────────────
+st.set_page_config(
+    page_title="DataMind Agent",
+    page_icon="🧠",
+    layout="wide",
+    initial_sidebar_state="expanded",
+)
+# ─── Custom CSS ───────────────────────────────────────────────────────────────
+st.markdown("""
+<style>
+@import url('https://fonts.googleapis.com/css2?family=Syne:wght@400;700;800&family=DM+Sans:wght@300;400;500&display=swap');
+html, body, [class*="css"] {
+    font-family: 'DM Sans', sans-serif;
+    background-color: #0a0a12;
+    color: #e8e8ff;
+}
+.main { background-color: #0a0a12; }
+/* Header */
+.hero-title {
+    font-family: 'Syne', sans-serif;
+    font-size: 2.8rem;
+    font-weight: 800;
+    background: linear-gradient(135deg, #e8e8ff 0%, #6C63FF 50%, #43E97B 100%);
+    -webkit-background-clip: text;
+    -webkit-text-fill-color: transparent;
+    background-clip: text;
+    margin-bottom: 0.2rem;
+}
+.hero-sub {
+    color: #6a6a9a;
+    font-size: 1rem;
+    margin-bottom: 2rem;
+}
+/* Cards */
+.stat-card {
+    background: #1a1a2e;
+    border: 1px solid #2a2a45;
+    border-radius: 16px;
+    padding: 1.2rem 1.5rem;
+    text-align: center;
+}
+.stat-num {
+    font-family: 'Syne', sans-serif;
+    font-size: 2rem;
+    font-weight: 800;
+    color: #6C63FF;
+}
+.stat-label { color: #6a6a9a; font-size: 0.8rem; text-transform: uppercase; letter-spacing: 0.1em; }
+/* Chat bubbles */
+.user-bubble {
+    background: rgba(108,99,255,0.15);
+    border: 1px solid rgba(108,99,255,0.3);
+    border-radius: 18px 18px 4px 18px;
+    padding: 0.9rem 1.2rem;
+    margin: 0.5rem 0;
+    font-size: 0.95rem;
+}
+.agent-bubble {
+    background: #1a1a2e;
+    border: 1px solid #2a2a45;
+    border-radius: 18px 18px 18px 4px;
+    padding: 0.9rem 1.2rem;
+    margin: 0.5rem 0;
+    font-size: 0.95rem;
+    line-height: 1.6;
+}
+/* Sidebar */
+section[data-testid="stSidebar"] {
+    background: #10101e;
+    border-right: 1px solid #2a2a45;
+}
+/* Buttons */
+.stButton > button {
+    background: linear-gradient(135deg, #6C63FF, #43E97B);
+    color: white;
+    border: none;
+    border-radius: 12px;
+    font-family: 'Syne', sans-serif;
+    font-weight: 700;
+    padding: 0.6rem 1.5rem;
+    transition: opacity 0.2s;
+}
+.stButton > button:hover { opacity: 0.85; color: white; }
+.stTextInput > div > div > input {
+    background: #1a1a2e;
+    border: 1px solid #2a2a45;
+    border-radius: 12px;
+    color: #e8e8ff;
+}
+.stSelectbox > div > div {
+    background: #1a1a2e;
+    border: 1px solid #2a2a45;
+    border-radius: 12px;
+}
+/* Tabs */
+.stTabs [data-baseweb="tab-list"] {
+    background: #10101e;
+    border-radius: 12px;
+    gap: 0.3rem;
+}
+.stTabs [data-baseweb="tab"] {
+    background: transparent;
+    color: #6a6a9a;
+    border-radius: 10px;
+    font-family: 'Syne', sans-serif;
+}
+.stTabs [aria-selected="true"] {
+    background: rgba(108,99,255,0.2) !important;
+    color: #6C63FF !important;
+}
+</style>
+""", unsafe_allow_html=True)
+# ─── Session State ────────────────────────────────────────────────────────────
+for key, default in {
+    "df": None,
+    "profile": None,
+    "file_type": None,
+    "chat_history": [],
+    "llm": None,
+    "api_key_set": False,
+}.items():
+    if key not in st.session_state:
+        st.session_state[key] = default
+# ─── Sidebar ──────────────────────────────────────────────────────────────────
+with st.sidebar:
+    st.markdown("### 🧠 DataMind Agent")
+    st.markdown("---")
+    # API Key
+    st.markdown("**🔑 Gemini API Key**")
+    api_key = st.text_input(
+        "Enter your key", type="password",
+        placeholder="AIza...",
+        help="Get free key at aistudio.google.com",
+        label_visibility="collapsed"
+    )
+    if api_key:
+        if not st.session_state.api_key_set or st.session_state.get("_last_key") != api_key:
+            try:
+                st.session_state.llm = get_llm(api_key)
+                st.session_state.api_key_set = True
+                st.session_state["_last_key"] = api_key
+                st.success("✅ Connected to Gemini!")
+            except Exception as e:
+                st.error(f"❌ Invalid key: {e}")
+    st.markdown("---")
+    # File Upload
+    st.markdown("**📁 Upload Data File**")
+    uploaded = st.file_uploader(
+        "Upload", type=["csv", "xlsx", "xls", "json"],
+        label_visibility="collapsed"
+    )
+    if uploaded and st.session_state.api_key_set:
+        with st.spinner("📊 Analyzing your data..."):
+            try:
+                df, ftype = load_file(uploaded)
+                st.session_state.df = df
+                st.session_state.file_type = ftype
+                st.session_state.profile = profile_dataframe(df)
+                st.session_state.chat_history = []
+                st.success(f"✅ Loaded {ftype} file!")
+            except Exception as e:
+                st.error(f"❌ Error: {e}")
+    elif uploaded and not st.session_state.api_key_set:
+        st.warning("⚠️ Enter your Gemini API key first")
+    st.markdown("---")
+    st.markdown("""
+**How to use:**
+1. Paste your Gemini API key above
+2. Upload CSV, Excel, or JSON file
+3. Explore the Dashboard tab
+4. Ask questions in Chat tab
+5. Generate visuals in Charts tab
+---
+**Get free Gemini API key:**
+[aistudio.google.com](https://aistudio.google.com/app/apikey)
+""")
+# ─── Main Content ─────────────────────────────────────────────────────────────
+st.markdown('<div class="hero-title">🧠 DataMind Agent</div>', unsafe_allow_html=True)
+st.markdown('<div class="hero-sub">AI-powered data analysis using LangChain + Gemini · Upload any data file and start exploring</div>', unsafe_allow_html=True)
+if st.session_state.df is None:
+    # Landing state
+    col1, col2, col3 = st.columns(3)
+    with col1:
+        st.markdown("""
+        <div class="stat-card">
+            <div class="stat-num">📂</div>
+            <div class="stat-label">CSV, Excel, JSON</div>
+            <br><p style="color:#6a6a9a; font-size:0.85rem">Upload any tabular data file — we handle the parsing automatically</p>
+        </div>""", unsafe_allow_html=True)
+    with col2:
+        st.markdown("""
+        <div class="stat-card">
+            <div class="stat-num">💬</div>
+            <div class="stat-label">Natural Language Q&A</div>
+            <br><p style="color:#6a6a9a; font-size:0.85rem">Ask anything about your data in plain English — no SQL needed</p>
+        </div>""", unsafe_allow_html=True)
+    with col3:
+        st.markdown("""
+        <div class="stat-card">
+            <div class="stat-num">📊</div>
+            <div class="stat-label">Smart Visualizations</div>
+            <br><p style="color:#6a6a9a; font-size:0.85rem">AI picks the right chart for your question automatically</p>
+        </div>""", unsafe_allow_html=True)
+    st.markdown("<br>", unsafe_allow_html=True)
+    st.info("👈 Enter your Gemini API key and upload a data file in the sidebar to get started!")
+else:
+    df      = st.session_state.df
+    profile = st.session_state.profile
+    llm     = st.session_state.llm
+    # ── Tabs ─────────────────────────────────────────────────────────────────
+    tab1, tab2, tab3, tab4 = st.tabs(["📊 Dashboard", "💬 Chat", "🎨 Charts", "🔍 Raw Data"])
+    # ════════════════════════════════════════════════════════════════
+    # TAB 1 — Dashboard
+    # ════════════════════════════════════════════════════════════════
+    with tab1:
+        rows, cols = profile["shape"]
+        nulls  = sum(profile["null_counts"].values())
+        num_c  = len(profile["numeric_columns"])
+        cat_c  = len(profile["categorical_columns"])
+        c1, c2, c3, c4 = st.columns(4)
+        c1.markdown(f'<div class="stat-card"><div class="stat-num">{rows:,}</div><div class="stat-label">Rows</div></div>', unsafe_allow_html=True)
+        c2.markdown(f'<div class="stat-card"><div class="stat-num">{cols}</div><div class="stat-label">Columns</div></div>', unsafe_allow_html=True)
+        c3.markdown(f'<div class="stat-card"><div class="stat-num">{num_c}</div><div class="stat-label">Numeric Cols</div></div>', unsafe_allow_html=True)
+        c4.markdown(f'<div class="stat-card"><div class="stat-num">{nulls}</div><div class="stat-label">Missing Values</div></div>', unsafe_allow_html=True)
+        st.markdown("<br>", unsafe_allow_html=True)
+        # Column overview
+        st.markdown("#### 📋 Column Overview")
+        col_info = pd.DataFrame({
+            "Column": df.columns,
+            "Type": df.dtypes.astype(str).values,
+            "Non-Null": df.notnull().sum().values,
+            "Null %": (df.isnull().mean() * 100).round(1).values,
+            "Unique": df.nunique().values,
+        })
+        st.dataframe(col_info, use_container_width=True, hide_index=True)
+        # Auto charts
+        st.markdown("#### 🤖 Auto-Generated Insights")
+        suggested = auto_suggest_charts(profile)[:3]
+        # st.columns() needs at least one column, even when nothing is suggested
+        chart_cols = st.columns(max(1, min(len(suggested), 2)))
+        for i, ctype in enumerate(suggested[:2]):
+            with chart_cols[i]:
+                try:
+                    fig = make_plotly_chart(ctype, df, profile)
+                    st.plotly_chart(fig, use_container_width=True)
+                except Exception as e:
+                    st.warning(f"Could not render {ctype}: {e}")
+        if len(suggested) > 2:
+            try:
+                fig = make_plotly_chart(suggested[2], df, profile)
+                st.plotly_chart(fig, use_container_width=True)
+            except Exception:
+                pass
+        # AI summary
+        st.markdown("#### 🧠 AI Dataset Summary")
+        if st.button("✨ Generate AI Summary"):
+            with st.spinner("Gemini is analyzing your dataset..."):
+                summary = ask_agent(
+                    "Give me a concise executive summary of this dataset. "
+                    "Highlight key patterns, anomalies, and 3 actionable insights.",
+                    df, profile, llm
+                )
+                st.markdown(f'<div class="agent-bubble">{summary}</div>', unsafe_allow_html=True)
+    # ════════════════════════════════════════════════════════════════
+    # TAB 2 — Chat
+    # ════════════════════════════════════════════════════════════════
+    with tab2:
+        st.markdown("#### 💬 Ask Anything About Your Data")
+        st.markdown("*The AI has full context of your dataset and can answer complex analytical questions.*")
+        # Suggested questions
+        st.markdown("**Quick questions to try:**")
+        suggestions = [
+            "What are the top 5 most important patterns in this data?",
+            "Are there any outliers or anomalies I should know about?",
+            "What correlations exist between the numeric columns?",
+            "Summarize the distribution of categorical columns.",
+            "What would you recommend analyzing further?",
+        ]
+        q_cols = st.columns(3)
+        for i, s in enumerate(suggestions[:3]):
+            with q_cols[i]:
+                if st.button(s, key=f"sug_{i}"):
+                    st.session_state["prefill_q"] = s
+        # Chat history
+        for turn in st.session_state.chat_history:
+            st.markdown(f'<div class="user-bubble">👤 {turn["user"]}</div>', unsafe_allow_html=True)
+            st.markdown(f'<div class="agent-bubble">🧠 {turn["agent"]}</div>', unsafe_allow_html=True)
+        # Input
+        prefill = st.session_state.pop("prefill_q", "")
+        question = st.text_input(
+            "Ask a question...",
+            value=prefill,
+            placeholder="e.g. What's the average sales by region?",
+            label_visibility="collapsed",
+        )
+        col_send, col_clear = st.columns([1, 5])
+        with col_send:
+            send = st.button("Send 🚀")
+        with col_clear:
+            if st.button("Clear Chat"):
+                st.session_state.chat_history = []
+                st.rerun()
+        if send and question.strip():
+            with st.spinner("🧠 Gemini is thinking..."):
+                answer = ask_agent(question, df, profile, llm)
+                # Auto-generate relevant chart
+                chart_rec = ai_recommend_chart(question, profile, llm)
+                st.session_state.chat_history.append({
+                    "user": question,
+                    "agent": answer,
+                    "chart_rec": chart_rec,
+                })
+            st.markdown(f'<div class="user-bubble">👤 {question}</div>', unsafe_allow_html=True)
+            st.markdown(f'<div class="agent-bubble">🧠 {answer}</div>', unsafe_allow_html=True)
+            # Show recommended chart
+            if chart_rec:
+                st.markdown(f"*📊 Suggested chart: **{chart_rec['chart_type']}** — {chart_rec.get('reason','')}*")
+                try:
+                    fig = make_plotly_chart(
+                        chart_rec["chart_type"], df, profile,
+                        x_col=chart_rec.get("x_col"),
+                        y_col=chart_rec.get("y_col"),
+                    )
+                    st.plotly_chart(fig, use_container_width=True)
+                except Exception:
+                    pass
+    # ════════════════════════════════════════════════════════════════
+    # TAB 3 — Charts
+    # ════════════════════════════════════════════════════════════════
+    with tab3:
+        st.markdown("#### 🎨 Custom Chart Builder")
+        chart_options = {
+            "Correlation Heatmap": "correlation_heatmap",
+            "Distribution Plot": "distribution_plots",
+            "Box Plots": "box_plots",
+            "Bar Chart": "bar_chart",
+            "Pie Chart": "pie_chart",
+            "Scatter Plot": "scatter",
+            "Line Chart": "line",
+            "Scatter Matrix": "scatter_matrix",
+        }
+        if profile["datetime_columns"]:
+            chart_options["Time Series"] = "time_series"
+        c1, c2, c3 = st.columns(3)
+        with c1:
+            chart_label = st.selectbox("Chart Type", list(chart_options.keys()))
+        with c2:
+            all_cols = ["(auto)"] + df.columns.tolist()
+            x_col = st.selectbox("X Column", all_cols)
+        with c3:
+            y_col = st.selectbox("Y Column", all_cols)
+        x_val = None if x_col == "(auto)" else x_col
+        y_val = None if y_col == "(auto)" else y_col
+        if st.button("🎨 Generate Chart"):
+            with st.spinner("Rendering..."):
+                try:
+                    fig = make_plotly_chart(
+                        chart_options[chart_label], df, profile,
+                        x_col=x_val, y_col=y_val
+                    )
+                    st.plotly_chart(fig, use_container_width=True)
+                except Exception as e:
+                    st.error(f"Chart error: {e}")
+        st.markdown("---")
+        st.markdown("#### 📊 All Auto-Suggested Charts")
+        suggested_all = auto_suggest_charts(profile)
+        for i in range(0, len(suggested_all), 2):
+            cols = st.columns(2)
+            for j, ctype in enumerate(suggested_all[i:i+2]):
+                with cols[j]:
+                    try:
+                        fig = make_plotly_chart(ctype, df, profile)
+                        st.plotly_chart(fig, use_container_width=True)
+                    except Exception as e:
+                        st.warning(f"Could not render {ctype}: {e}")
+    # ════════════════════════════════════════════════════════════════
+    # TAB 4 — Raw Data
+    # ════════════════════════════════════════════════════════════════
+    with tab4:
+        st.markdown("#### 🔍 Raw Data Explorer")
+        # Search/filter
+        search = st.text_input("🔎 Filter rows containing...", placeholder="Type to filter...")
+        if search:
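+            # regex=False treats the search text literally, so input like "(" or "+" cannot raise
+            mask = df.astype(str).apply(lambda col: col.str.contains(search, case=False, na=False, regex=False)).any(axis=1)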
+            display_df = df[mask]
+            st.info(f"Showing {len(display_df):,} of {len(df):,} rows matching '{search}'")
+        else:
+            display_df = df
+        st.dataframe(display_df, use_container_width=True, height=500)
+        # Download
+        csv_buf = io.StringIO()
+        df.to_csv(csv_buf, index=False)
+        st.download_button(
+            "⬇️ Download as CSV",
+            data=csv_buf.getvalue(),
+            file_name="analyzed_data.csv",
+            mime="text/csv"
+        )
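
Note on the Raw Data tab: it keeps any row in which at least one cell contains the search text, ignoring case. A minimal pandas-free sketch of that matching rule, treating the query as literal text rather than a regular expression; `filter_rows` and the sample rows are made up for illustration and do not appear in the commit:

```python
from typing import Any, Dict, List


def filter_rows(rows: List[Dict[str, Any]], query: str) -> List[Dict[str, Any]]:
    """Keep rows where any cell, rendered as text, contains query (case-insensitive)."""
    needle = query.casefold()
    if not needle:
        return list(rows)  # an empty query matches everything
    return [
        row for row in rows
        if any(needle in str(value).casefold() for value in row.values())
    ]


# Illustrative rows only; not taken from sample_data.csv.
rows = [
    {"product": "Laptop (Pro)", "region": "North", "sales": 1200},
    {"product": "Mouse", "region": "South", "sales": 25},
]
matches = filter_rows(rows, "(pro)")  # parentheses are matched literally
```

Because the comparison is a plain substring test, characters that are special in regular expressions need no escaping.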

core_agent.py ADDED

	@@ -0,0 +1,318 @@

+"""
+core_agent.py
+=============
+LangChain + Gemini Data Analyst Agent — Core Logic
+Supports CSV, Excel (.xlsx, .xls), and JSON files
+"""
+import os
+import io
+import json
+import warnings
+import pandas as pd
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.ticker as mticker
+import seaborn as sns
+import plotly.express as px
+import plotly.graph_objects as go
+from plotly.subplots import make_subplots
+from dotenv import load_dotenv
+from langchain_google_genai import ChatGoogleGenerativeAI
+from langchain.prompts import PromptTemplate
+from langchain.chains import LLMChain
+from langchain.schema import HumanMessage, SystemMessage
+warnings.filterwarnings("ignore")
+load_dotenv()
+# ─── Palette ─────────────────────────────────────────────────────────────────
+PALETTE = ["#6C63FF", "#FF6584", "#43E97B", "#F7971E", "#4FC3F7", "#CE93D8"]
+DARK_BG  = "#0F0F1A"
+CARD_BG  = "#1A1A2E"
+# ─── LLM Setup ───────────────────────────────────────────────────────────────
+def get_llm(api_key: str):
+    return ChatGoogleGenerativeAI(
+        model="gemini-1.5-flash",
+        google_api_key=api_key,
+        temperature=0.3,
+        convert_system_message_to_human=True,
+    )
+# ─── File Loading ─────────────────────────────────────────────────────────────
+def load_file(file) -> tuple[pd.DataFrame, str]:
+    """Load uploaded file into a DataFrame. Returns (df, file_type)."""
+    name = file.name.lower()
+    if name.endswith(".csv"):
+        df = pd.read_csv(file)
+        return df, "CSV"
+    elif name.endswith((".xlsx", ".xls")):
+        df = pd.read_excel(file)
+        return df, "Excel"
+    elif name.endswith(".json"):
+        content = json.load(file)
+        if isinstance(content, list):
+            df = pd.DataFrame(content)
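+        elif isinstance(content, dict):
+            df = pd.DataFrame([content]) if not any(isinstance(v, list) for v in content.values()) \
+                 else pd.DataFrame(content)
+        else:
+            # A bare scalar (string, number, null) cannot be turned into a table
+            raise ValueError("JSON must contain an object or an array of objects")
+        return df, "JSON"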
+    else:
+        raise ValueError(f"Unsupported file type: {name}")
+# ─── Data Profile ─────────────────────────────────────────────────────────────
+def profile_dataframe(df: pd.DataFrame) -> dict:
+    """Generate a rich statistical profile of the dataframe."""
+    numeric_cols  = df.select_dtypes(include="number").columns.tolist()
+    category_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()
+    datetime_cols = df.select_dtypes(include=["datetime"]).columns.tolist()
+    profile = {
+        "shape": df.shape,
+        "columns": df.columns.tolist(),
+        "dtypes": df.dtypes.astype(str).to_dict(),
+        "numeric_columns": numeric_cols,
+        "categorical_columns": category_cols,
+        "datetime_columns": datetime_cols,
+        "null_counts": df.isnull().sum().to_dict(),
+        "null_pct": (df.isnull().mean() * 100).round(2).to_dict(),
+        "duplicates": int(df.duplicated().sum()),
+    }
+    if numeric_cols:
+        desc = df[numeric_cols].describe().round(3)
+        profile["numeric_stats"] = desc.to_dict()
+    if category_cols:
+        profile["top_categories"] = {
+            col: df[col].value_counts().head(5).to_dict()
+            for col in category_cols
+        }
+    return profile
+def profile_to_text(profile: dict, df: pd.DataFrame) -> str:
+    """Convert profile dict to LLM-readable text summary."""
+    rows, cols = profile["shape"]
+    lines = [
+        f"Dataset: {rows} rows × {cols} columns",
+        f"Numeric columns : {', '.join(profile['numeric_columns']) or 'None'}",
+        f"Categorical cols : {', '.join(profile['categorical_columns']) or 'None'}",
+        f"Datetime cols    : {', '.join(profile['datetime_columns']) or 'None'}",
+        f"Missing values   : {sum(profile['null_counts'].values())} total",
+        f"Duplicate rows   : {profile['duplicates']}",
+        "",
+        "--- Sample Data (first 5 rows) ---",
+        df.head(5).to_string(index=False),
+    ]
+    if profile.get("numeric_stats"):
+        lines += ["", "--- Numeric Stats ---"]
+        for col, stats in profile["numeric_stats"].items():
+            lines.append(f"  {col}: mean={stats.get('mean','?')}, std={stats.get('std','?')}, "
+                         f"min={stats.get('min','?')}, max={stats.get('max','?')}")
+    return "\n".join(lines)
+# ─── AI Question Answering ────────────────────────────────────────────────────
+def ask_agent(question: str, df: pd.DataFrame, profile: dict, llm) -> str:
+    """Send a question + data context to Gemini and return the answer."""
+    data_context = profile_to_text(profile, df)
+    system = """You are an expert data analyst AI. You receive a dataset summary and answer questions about it.
+Be precise, insightful, and helpful. When relevant, suggest what visualizations would best illustrate the answer.
+Format your response clearly. Use bullet points for lists. Use numbers and percentages when quoting statistics."""
+    user_msg = f"""Here is the dataset context:
+{data_context}
+User question: {question}
+Provide a thorough, accurate analysis. If you perform calculations, show the logic briefly."""
+    messages = [
+        SystemMessage(content=system),
+        HumanMessage(content=user_msg),
+    ]
+    response = llm.invoke(messages)
+    return response.content
+# ─── Visualization Engine ─────────────────────────────────────────────────────
+def auto_suggest_charts(profile: dict) -> list[str]:
+    """Suggest relevant chart types based on data profile."""
+    suggestions = []
+    if len(profile["numeric_columns"]) >= 2:
+        suggestions.append("correlation_heatmap")
+        suggestions.append("scatter_matrix")
+    if profile["numeric_columns"]:
+        suggestions.append("distribution_plots")
+        suggestions.append("box_plots")
+    if profile["categorical_columns"] and profile["numeric_columns"]:
+        suggestions.append("bar_chart")
+        suggestions.append("pie_chart")
+    if profile["datetime_columns"] and profile["numeric_columns"]:
+        suggestions.append("time_series")
+    return suggestions
+def make_plotly_chart(chart_type: str, df: pd.DataFrame, profile: dict,
+                      x_col: str = None, y_col: str = None, color_col: str = None):
+    """Generate a Plotly figure for the given chart type."""
+    num_cols = profile["numeric_columns"]
+    cat_cols = profile["categorical_columns"]
+    template = "plotly_dark"
+    if chart_type == "correlation_heatmap" and len(num_cols) >= 2:
+        corr = df[num_cols].corr().round(2)
+        fig = px.imshow(
+            corr, text_auto=True, color_continuous_scale="RdBu_r",
+            title="Correlation Heatmap", template=template,
+            color_continuous_midpoint=0,
+        )
+    elif chart_type == "distribution_plots" and num_cols:
+        col = y_col or num_cols[0]
+        fig = px.histogram(
+            df, x=col, nbins=30, marginal="box",
+            title=f"Distribution of {col}",
+            color_discrete_sequence=PALETTE,
+            template=template,
+        )
+    elif chart_type == "box_plots" and num_cols:
+        cols = num_cols[:6]
+        fig = go.Figure()
+        for i, col in enumerate(cols):
+            fig.add_trace(go.Box(y=df[col], name=col, marker_color=PALETTE[i % len(PALETTE)]))
+        fig.update_layout(title="Box Plots — Numeric Columns", template=template)
+    elif chart_type == "bar_chart" and cat_cols and num_cols:
+        xc = x_col or cat_cols[0]
+        yc = y_col or num_cols[0]
+        agg = df.groupby(xc)[yc].mean().reset_index().sort_values(yc, ascending=False).head(15)
+        fig = px.bar(
+            agg, x=xc, y=yc, color=yc,
+            color_continuous_scale="Viridis",
+            title=f"Average {yc} by {xc}", template=template,
+        )
+    elif chart_type == "pie_chart" and cat_cols:
+        col = x_col or cat_cols[0]
+        counts = df[col].value_counts().head(8)
+        fig = px.pie(
+            values=counts.values, names=counts.index,
+            title=f"Distribution of {col}",
+            color_discrete_sequence=PALETTE,
+            template=template,
+        )
+    elif chart_type == "scatter_matrix" and len(num_cols) >= 2:
+        cols = num_cols[:4]
+        fig = px.scatter_matrix(
+            df, dimensions=cols,
+            color=cat_cols[0] if cat_cols else None,
+            color_discrete_sequence=PALETTE,
+            title="Scatter Matrix", template=template,
+        )
+        fig.update_traces(diagonal_visible=False, showupperhalf=False)
+    elif chart_type == "time_series" and profile["datetime_columns"] and num_cols:
+        dt_col = profile["datetime_columns"][0]
+        yc = y_col or num_cols[0]
+        fig = px.line(
+            df.sort_values(dt_col), x=dt_col, y=yc,
+            title=f"{yc} over Time",
+            color_discrete_sequence=PALETTE,
+            template=template,
+        )
+    elif chart_type == "scatter" and len(num_cols) >= 2:
+        xc = x_col or num_cols[0]
+        yc = y_col or num_cols[1]
+        fig = px.scatter(
+            df, x=xc, y=yc,
+            color=color_col or (cat_cols[0] if cat_cols else None),
+            color_discrete_sequence=PALETTE,
+            title=f"{xc} vs {yc}",
+            trendline="ols",
+            template=template,
+        )
+    elif chart_type == "line" and num_cols:
+        xc = x_col or (profile["datetime_columns"][0] if profile["datetime_columns"] else num_cols[0])
+        yc = y_col or num_cols[0]
+        fig = px.line(
+            df, x=xc, y=yc,
+            color_discrete_sequence=PALETTE,
+            title=f"{yc} trend",
+            template=template,
+        )
+    else:
+        # Fallback: summary bar
+        if num_cols:
+            means = df[num_cols[:8]].mean()
+            fig = px.bar(
+                x=means.index, y=means.values,
+                labels={"x": "Column", "y": "Mean Value"},
+                color=means.values, color_continuous_scale="Viridis",
+                title="Column Means Overview", template=template,
+            )
+        else:
+            fig = go.Figure()
+            fig.add_annotation(text="No numeric data available for this chart type.",
+                               showarrow=False, font=dict(size=14))
+            fig.update_layout(template=template, title="Chart Unavailable")
+    fig.update_layout(
+        paper_bgcolor=DARK_BG,
+        plot_bgcolor=CARD_BG,
+        font=dict(family="DM Sans, sans-serif", color="#E0E0FF"),
+        margin=dict(l=40, r=40, t=60, b=40),
+    )
+    return fig
+# ─── AI-Driven Chart Recommendation ──────────────────────────────────────────
+def ai_recommend_chart(question: str, profile: dict, llm) -> dict:
+    """Ask Gemini which chart best answers the user's question."""
+    num_cols  = profile["numeric_columns"]
+    cat_cols  = profile["categorical_columns"]
+    dt_cols   = profile["datetime_columns"]
+    prompt = f"""Given this dataset profile:
+- Numeric columns: {num_cols}
+- Categorical columns: {cat_cols}
+- Datetime columns: {dt_cols}
+The user asked: "{question}"
+Recommend ONE chart type from this list that best answers their question:
+[correlation_heatmap, distribution_plots, box_plots, bar_chart, pie_chart, scatter, line, time_series, scatter_matrix]
+Also suggest the best x_col and y_col from the available columns.
+Respond ONLY in valid JSON like:
+{{"chart_type": "bar_chart", "x_col": "category_col", "y_col": "numeric_col", "reason": "short explanation"}}"""
+    try:
+        response = llm.invoke([HumanMessage(content=prompt)])
+        text = response.content.strip()
+        # strip markdown fences if present
+        if "```" in text:
+            text = text.split("```")[1]
+            if text.startswith("json"):
+                text = text[4:]
+        return json.loads(text.strip())
+    except Exception:
+        return {"chart_type": "distribution_plots", "x_col": None, "y_col": None, "reason": "Default chart"}
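
Note on the reply parsing above: `ai_recommend_chart` trusts whatever JSON comes back, so a chart type outside the prompt's list or a column name the model invented is passed on unchanged. The block below is a minimal sketch of a stricter parser, not part of this commit. `parse_recommendation`, `ALLOWED_CHART_TYPES`, `DEFAULT_RECOMMENDATION`, and the `known_columns` parameter are hypothetical names introduced for illustration. The allowed list and the default are copied from the prompt and the `except` branch above.

```python
import json
import re

# Chart types copied from the prompt in ai_recommend_chart.
ALLOWED_CHART_TYPES = {
    "correlation_heatmap", "distribution_plots", "box_plots", "bar_chart",
    "pie_chart", "scatter", "line", "time_series", "scatter_matrix",
}

# Same fallback the original except-branch returns.
DEFAULT_RECOMMENDATION = {
    "chart_type": "distribution_plots", "x_col": None, "y_col": None,
    "reason": "Default chart",
}


def parse_recommendation(text: str, known_columns: list[str]) -> dict:
    """Hypothetical helper: extract and validate the JSON object in an LLM reply."""
    # Take the outermost {...} span so code fences or surrounding prose do not matter.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return dict(DEFAULT_RECOMMENDATION)
    try:
        rec = json.loads(match.group(0))
    except json.JSONDecodeError:
        return dict(DEFAULT_RECOMMENDATION)
    if not isinstance(rec, dict):
        return dict(DEFAULT_RECOMMENDATION)
    chart_type = rec.get("chart_type")
    if not isinstance(chart_type, str) or chart_type not in ALLOWED_CHART_TYPES:
        return dict(DEFAULT_RECOMMENDATION)
    # Replace unknown column names with None so the `x_col or ...` defaults
    # in the chart-building code take over.
    for key in ("x_col", "y_col"):
        if rec.get(key) not in known_columns:
            rec[key] = None
    rec.setdefault("reason", "")
    return rec
```

Under these assumptions, the body of the `try` block could call `parse_recommendation(response.content, num_cols + cat_cols + dt_cols)` in place of the manual fence stripping. Whether that fits the rest of `core_agent.py` depends on code not shown in this hunk.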

requirements.txt ADDED

	@@ -0,0 +1,14 @@

+langchain==0.3.7
+langchain-google-genai==2.0.5
+langchain-experimental==0.3.3
+langchain-community==0.3.7
+google-generativeai==0.8.3
+pandas==2.2.3
+openpyxl==3.1.5
+xlrd==2.0.1
+matplotlib==3.9.2
+seaborn==0.13.2
+plotly==5.24.1
+streamlit==1.40.1
+python-dotenv==1.0.1
+tabulate==0.9.0

sample_data.csv ADDED

	@@ -0,0 +1,31 @@

+order_id,date,product,category,region,sales,quantity,profit,customer_age,customer_gender
+1001,2024-01-05,Laptop Pro,Electronics,North,1200.00,1,240.00,34,Male
+1002,2024-01-07,Office Chair,Furniture,South,350.00,2,70.00,45,Female
+1003,2024-01-08,Wireless Mouse,Electronics,East,45.00,5,9.00,28,Male
+1004,2024-01-10,Standing Desk,Furniture,West,650.00,1,130.00,52,Female
+1005,2024-01-12,Mechanical Keyboard,Electronics,North,120.00,3,36.00,30,Male
+1006,2024-01-15,Monitor 4K,Electronics,South,400.00,2,80.00,41,Female
+1007,2024-01-18,Notebook Set,Stationery,East,25.00,10,7.50,23,Male
+1008,2024-01-20,Ergonomic Chair,Furniture,West,520.00,1,104.00,38,Female
+1009,2024-01-22,USB Hub,Electronics,North,35.00,8,10.50,26,Male
+1010,2024-01-25,Desk Lamp,Furniture,South,60.00,4,18.00,49,Female
+1011,2024-02-01,Laptop Pro,Electronics,East,1200.00,2,480.00,36,Male
+1012,2024-02-03,Wireless Headphones,Electronics,West,200.00,3,60.00,31,Female
+1013,2024-02-05,Pen Set,Stationery,North,15.00,20,6.00,22,Male
+1014,2024-02-08,Gaming Chair,Furniture,South,450.00,1,90.00,27,Female
+1015,2024-02-10,Tablet,Electronics,East,600.00,2,120.00,43,Male
+1016,2024-02-14,Bookshelf,Furniture,West,180.00,1,36.00,55,Female
+1017,2024-02-16,Webcam HD,Electronics,North,80.00,6,24.00,29,Male
+1018,2024-02-18,Sticky Notes,Stationery,South,8.00,50,4.00,24,Female
+1019,2024-02-20,Monitor Stand,Furniture,East,95.00,3,28.50,37,Male
+1020,2024-02-22,Smartphone,Electronics,West,900.00,2,180.00,33,Female
+1021,2024-03-01,Laptop Pro,Electronics,North,1200.00,3,720.00,40,Male
+1022,2024-03-04,Office Chair,Furniture,South,350.00,4,140.00,48,Female
+1023,2024-03-06,Drawing Tablet,Electronics,East,300.00,1,60.00,25,Male
+1024,2024-03-09,Filing Cabinet,Furniture,West,220.00,2,44.00,53,Female
+1025,2024-03-12,Wireless Mouse,Electronics,North,45.00,10,22.50,32,Male
+1026,2024-03-15,External SSD,Electronics,South,150.00,4,45.00,44,Female
+1027,2024-03-18,Highlighters,Stationery,East,12.00,30,5.40,21,Male
+1028,2024-03-20,Desk Organizer,Furniture,West,40.00,7,14.00,35,Female
+1029,2024-03-22,Smart Speaker,Electronics,North,120.00,5,36.00,39,Male
+1030,2024-03-25,Printer,Electronics,South,280.00,2,56.00,46,Female
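
The chart code in `core_agent.py` reads a `profile` dict with `numeric_columns`, `categorical_columns`, and `datetime_columns`. How the commit builds that dict is not visible in this part of the diff, so the block below is only a rough, standard-library sketch of how the sample file's columns might be sorted into those three lists. `sketch_profile` and its helpers are hypothetical, and `SAMPLE` holds the header and first three rows copied from the file above.

```python
import csv
import io
from datetime import date

# Header and first three rows copied from sample_data.csv.
SAMPLE = """order_id,date,product,category,region,sales,quantity,profit,customer_age,customer_gender
1001,2024-01-05,Laptop Pro,Electronics,North,1200.00,1,240.00,34,Male
1002,2024-01-07,Office Chair,Furniture,South,350.00,2,70.00,45,Female
1003,2024-01-08,Wireless Mouse,Electronics,East,45.00,5,9.00,28,Male
"""


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def sketch_profile(csv_text: str) -> dict:
    """Hypothetical, naive column typing: numeric first, then ISO dates, else categorical."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    profile = {"numeric_columns": [], "categorical_columns": [], "datetime_columns": []}
    if not rows:
        return profile
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        if all(_is_number(v) for v in values):
            profile["numeric_columns"].append(col)
        elif all(_is_iso_date(v) for v in values):
            profile["datetime_columns"].append(col)
        else:
            profile["categorical_columns"].append(col)
    return profile


profile = sketch_profile(SAMPLE)
```

With this naive inference `order_id` is classified as numeric and comes first in `numeric_columns`, so any fallback of the form `num_cols[0]` would chart the order ID unless identifier columns are excluded or an explicit `y_col` is passed.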