Esvanth committed on
Commit 0ed43fe · 0 parent(s)

Initial commit

Files changed (7)
  1. .gitignore +9 -0
  2. README.md +93 -0
  3. app.py +860 -0
  4. requirements.txt +5 -0
  5. task2_segmentation.py +240 -0
  6. task3_4_routing.py +333 -0
  7. task5_forecasting.py +137 -0
.gitignore ADDED
@@ -0,0 +1,9 @@
+output/
+__pycache__/
+*.pyc
+.env
+*.png
+*.docx
+*.pdf
+.claude/
+EcoCart_Report.docx
README.md ADDED
@@ -0,0 +1,93 @@
+# EcoCart AI System
+
+An interactive, AI-powered logistics simulation.
+
+🚀 **Live Demo:** [Launch on Streamlit](https://esvanth-ecocart-ai.streamlit.app)
+
+---
+
+## What is EcoCart?
+
+EcoCart is a mid-sized e-commerce company that struggles to optimise its logistics network. This project proposes an AI-based solution in five tasks, ranging from intelligent delivery agents to demand forecasting.
+
+---
+
+## Tasks Covered
+
+### Task 1 — AI Agents
+Demonstrates three types of AI agents navigating a delivery map in real time:
+- **Reactive Agent** — always moves to the nearest unvisited stop, with no planning
+- **Goal-Based Agent** — plans the full route before departing (nearest-neighbour tour improved with 2-opt)
+- **Utility-Based Agent** — scores each stop as priority ÷ (distance + 0.1) and visits the best-scoring stop next, so high-priority stops are served early
+
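+### Task 2 — Bias Detection & Mitigation
+Uses K-Means clustering on purchase frequency, spend, and recency to segment customers into value tiers. Detects urban/rural bias with **Disparate Impact (DI)**, the rural High Value rate divided by the urban High Value rate (treated as fair at DI ≥ 0.80), and applies a four-step mitigation strategy:
+- Oversample rural customers to balance the dataset
+- Adjust spend for the rural delivery cost premium (+€12)
+- Adjust frequency for rural order batching (×1.5)
+- Relabel the highest-spending rural customers as High Value until the rural rate reaches 85% of the urban rate
+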
+### Task 3 — Search Algorithms for Route Optimisation
+Implements four search algorithms on a 20-node delivery network (10 urban and 10 rural nodes, 34 roads), minimising either distance or CO₂:
+- **BFS** — Breadth-First Search
+- **DFS** — Depth-First Search
+- **A\*** — best-first search with a Euclidean-distance heuristic
+- **IDA\*** — Iterative Deepening A\*, which repeats depth-first searches under a rising cost bound
+
+Includes a live **exploration replay slider**: drag it to watch the algorithm explore the network node by node.
+
+### Task 4 — A* vs IDA* Comparative Analysis
+Benchmarks both algorithms on 10 origin-destination pairs (5 urban, 5 rural), averaging over multiple timing runs, and compares nodes expanded, average time, and memory behaviour.
+
+### Task 5 — Demand Forecasting
+Trains two ML models on 730 days of synthetic daily sales data, using calendar fields, a promotion flag, lagged sales, and rolling averages as features:
+- **Linear Regression** — fast and interpretable
+- **Random Forest** — captures non-linear seasonal patterns
+
+Features a **what-if predictor**: enter a day, month, and promotion flag to get an instant sales prediction.
+
+---
+
+## Tech Stack
+
+| Tool | Purpose |
+|------|---------|
+| Python 3.11 | Core language |
+| Streamlit | Interactive web app |
+| Plotly | Interactive charts |
+| scikit-learn | K-Means, LR, Random Forest |
+| NumPy / Pandas | Data processing |
+
+---
+
+## Run Locally
+
+```bash
+git clone https://github.com/Esvanth/Ecocart-AI.git
+cd Ecocart-AI
+pip install -r requirements.txt
+streamlit run app.py
+```
+
+---
+
+## Project Structure
+
+```
+Ecocart-AI/
+├── app.py                  # Main Streamlit app (all 5 tasks)
+├── task2_segmentation.py   # Standalone Task 2 script
+├── task3_4_routing.py      # Standalone Tasks 3 & 4 script
+├── task5_forecasting.py    # Standalone Task 5 script
+├── requirements.txt        # Python dependencies
+└── README.md
+```
+
+---
+
+## Author
+
+**Esvanth Mohankumar**
+Student ID: 24311073
+Programme: MSc Artificial Intelligence
+Institution: National College of Ireland
+Module: Foundations of AI
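The Task 4 benchmark lives in `task3_4_routing.py`, which is not part of this section. The harness below is a rough sketch of that kind of timing setup. Its only assumption is that each search function returns `(path, cost, explored)`, as the functions in `app.py` do. It repeats a search, times every run with `time.perf_counter`, and reports the number of nodes expanded along with the mean and standard deviation of the runtime. The three-node graph, the stand-in `bfs`, and the `benchmark` helper are illustrative and are not the repository's code.

```python
import statistics
import time
from collections import deque

def bfs(s, g, adj):
    """Stand-in search with the same (path, cost, explored) return shape as app.py."""
    q, seen, expl = deque([(s, [s])]), {s}, []
    while q:
        n, p = q.popleft()
        expl.append(n)
        if n == g:
            cost = sum(dict(adj[p[i]])[p[i + 1]] for i in range(len(p) - 1))
            return p, round(cost, 2), expl
        for nb, _ in adj[n]:
            if nb not in seen:
                seen.add(nb)
                q.append((nb, p + [nb]))
    return None, 0.0, expl

def benchmark(search, s, g, adj, runs=20):
    """Repeat `search`, then report its result, nodes expanded and wall-clock timing in ms."""
    times_ms = []
    for _ in range(runs):
        t0 = time.perf_counter()
        path, cost, expl = search(s, g, adj)
        times_ms.append((time.perf_counter() - t0) * 1000)
    return {
        "path": path,
        "cost": cost,
        "expanded": len(expl),
        "mean_ms": statistics.mean(times_ms),
        "stdev_ms": statistics.stdev(times_ms) if runs > 1 else 0.0,
    }

# Undirected toy graph stored as adjacency lists of (neighbour, weight).
adj = {"A": [("B", 1.0), ("C", 2.5)],
       "B": [("A", 1.0), ("C", 1.0)],
       "C": [("A", 2.5), ("B", 1.0)]}
print(benchmark(bfs, "A", "C", adj))
```

On this graph BFS returns `A → C` (cost 2.5) instead of the cheaper `A → B → C` (cost 2.0). BFS minimises the number of hops, not the total cost, and reporting cost alongside node counts for each pair makes that kind of difference visible.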
app.py ADDED
@@ -0,0 +1,860 @@
+"""
+EcoCart AI System — TABA Section II
+NCI MSCAI 2026
+"""
+
+import math, heapq, time
+from collections import deque
+
+import numpy as np
+import pandas as pd
+import plotly.graph_objects as go
+from plotly.subplots import make_subplots
+import streamlit as st
+from sklearn.cluster import KMeans
+from sklearn.preprocessing import StandardScaler
+from sklearn.linear_model import LinearRegression
+from sklearn.ensemble import RandomForestRegressor
+from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
+
+# ── page ──────────────────────────────────────────────────────────────────────
+st.set_page_config(page_title="EcoCart AI", layout="wide",
+                   initial_sidebar_state="collapsed")
+
+st.markdown("""
+<style>
+[data-testid="stAppViewContainer"] { background:#f0f4f8; }
+[data-testid="stHeader"] { background:transparent; }
+.block-container { padding:1rem 2rem 3rem; }
+.stTabs [data-baseweb="tab-list"] { background:#fff; border-radius:12px;
+  padding:4px; box-shadow:0 1px 4px rgba(0,0,0,.08); }
+.stTabs [data-baseweb="tab"] { font-size:.88rem; font-weight:600;
+  border-radius:8px; padding:8px 20px; }
+div[data-testid="metric-container"]{ background:#fff; border-radius:10px;
+  padding:14px 18px;
+  box-shadow:0 1px 4px rgba(0,0,0,.07); }
+.card { background:#fff; border-radius:14px; padding:20px 24px;
+  box-shadow:0 1px 5px rgba(0,0,0,.08); margin-bottom:14px; }
+.badge-green { display:inline-block; background:#d1fae5; color:#065f46;
+  border-radius:99px; padding:3px 12px; font-size:.78rem;
+  font-weight:700; }
+.badge-red { display:inline-block; background:#fee2e2; color:#991b1b;
+  border-radius:99px; padding:3px 12px; font-size:.78rem;
+  font-weight:700; }
+.badge-blue { display:inline-block; background:#dbeafe; color:#1e40af;
+  border-radius:99px; padding:3px 12px; font-size:.78rem;
+  font-weight:700; }
+.tip { background:#f8fafc; border:1px solid #e2e8f0; border-radius:8px;
+  padding:10px 14px; font-size:.82rem; color:#475569; margin:8px 0; }
+.section-label { font-size:.72rem; font-weight:700; letter-spacing:.08em;
+  color:#94a3b8; text-transform:uppercase; margin-bottom:4px; }
+</style>
+""", unsafe_allow_html=True)
+
+# ── colours ───────────────────────────────────────────────────────────────────
+BG,SURF,LINE = "#f0f4f8","#ffffff","#e2e8f0"
+FG,MUTE = "#1e293b","#64748b"
+GREEN,BLUE,RED,AMBER,PURPLE = "#10b981","#3b82f6","#ef4444","#f59e0b","#8b5cf6"
+
+SEG_COL={"High Value":GREEN,"Medium":AMBER,"Low Value":RED,"Group 4":PURPLE}
+
+def _ch(h=380,title=""):
+    return dict(height=h,paper_bgcolor=SURF,plot_bgcolor=BG,
+                font=dict(color=FG,size=11),
+                title=dict(text=title,font=dict(size=13,color=FG),x=0),
+                margin=dict(l=50,r=20,t=48,b=40),
+                legend=dict(bgcolor=SURF,bordercolor=LINE,borderwidth=1))
+
+def _xax(**k): return dict(gridcolor=LINE,zeroline=False,linecolor=LINE,**k)
+def _yax(**k): return dict(gridcolor=LINE,zeroline=False,linecolor=LINE,**k)
+
+# ══════════════════════════════════════════════════════════════════════════════
+# NETWORK DATA
+# ══════════════════════════════════════════════════════════════════════════════
+NODES={
+    "U1":(1.0,1.0,"urban"), "U2":(2.0,1.5,"urban"), "U3":(3.0,1.0,"urban"),
+    "U4":(1.5,2.5,"urban"), "U5":(2.5,3.0,"urban"), "U6":(3.5,2.0,"urban"),
+    "U7":(1.0,3.5,"urban"), "U8":(2.0,4.0,"urban"), "U9":(3.0,4.0,"urban"),
+    "U10":(4.0,3.5,"urban"),
+    "R1":(6.0,1.0,"rural"), "R2":(8.0,2.0,"rural"), "R3":(10.0,1.5,"rural"),
+    "R4":(7.0,4.0,"rural"), "R5":(9.0,4.5,"rural"), "R6":(11.0,3.5,"rural"),
+    "R7":(6.5,6.0,"rural"), "R8":(9.0,7.0,"rural"), "R9":(11.0,6.0,"rural"),
+    "R10":(8.0,5.5,"rural"),
+}
+_EP=[("U1","U2"),("U2","U3"),("U1","U4"),("U2","U4"),("U2","U5"),
+     ("U3","U6"),("U4","U5"),("U5","U6"),("U4","U7"),("U5","U8"),
+     ("U6","U10"),("U7","U8"),("U8","U9"),("U9","U10"),("U5","U9"),
+     ("R1","R2"),("R2","R3"),("R1","R4"),("R2","R4"),("R3","R6"),
+     ("R4","R5"),("R5","R6"),("R4","R7"),("R5","R10"),("R7","R10"),
+     ("R7","R8"),("R8","R9"),("R6","R9"),("R8","R10"),("R5","R8"),
+     ("U3","R1"),("U10","R4"),("U6","R1"),("U9","R7")]
+
+def _nd(a,b): return math.hypot(NODES[a][0]-NODES[b][0],NODES[a][1]-NODES[b][1])  # straight-line distance
+def _cr(a,b):
+    # CO2 rate per road km: urban-urban 0.28, urban-rural 0.18, rural-rural 0.10
+    za,zb=NODES[a][2],NODES[b][2]
+    return 0.28 if za==zb=="urban" else 0.18 if za!=zb else 0.10
+
+# road km = 1.15 × straight-line distance; CO2 weight = road km × zone rate
+EDGES    =[(a,b,round(_nd(a,b)*1.15,2)) for a,b in _EP]
+CO2_EDGES=[(a,b,round(_nd(a,b)*1.15*_cr(a,b),3)) for a,b in _EP]
+ADJ_KM={n:[] for n in NODES}; ADJ_CO2={n:[] for n in NODES}
+for i,(a,b,w) in enumerate(EDGES):
+    ADJ_KM[a].append((b,w)); ADJ_KM[b].append((a,w))
+    c=CO2_EDGES[i][2]; ADJ_CO2[a].append((b,c)); ADJ_CO2[b].append((a,c))
+
+def _ew(a,b,adj):
+    # weight of edge a-b in adj (inf if the two nodes are not adjacent)
+    for nb,w in adj[a]:
+        if nb==b: return w
+    return math.inf
+
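The network above models road length as 1.15 × the straight-line distance between two nodes. Each edge's CO₂ weight is that length multiplied by a zone-dependent rate: 0.28 for urban-urban edges, 0.18 for urban-rural edges and 0.10 for rural-rural edges. The sketch below applies the same arithmetic to single edges, using coordinates copied from `NODES`. `edge_costs` is a hypothetical helper and is not part of `app.py`.

```python
import math

# Coordinates and zones copied from NODES in app.py: (x, y, zone).
NODES = {
    "U1": (1.0, 1.0, "urban"), "U2": (2.0, 1.5, "urban"),
    "U3": (3.0, 1.0, "urban"), "R1": (6.0, 1.0, "rural"),
}
DETOUR = 1.15  # road length modelled as 15% longer than the straight line

def co2_rate(za, zb):
    """CO2 factor per road km, mirroring _cr(): urban-urban, mixed, rural-rural."""
    if za == zb == "urban":
        return 0.28
    if za != zb:
        return 0.18
    return 0.10

def edge_costs(a, b):
    """Hypothetical helper: (road km, CO2) for edge a-b, rounded like EDGES/CO2_EDGES."""
    xa, ya, za = NODES[a]
    xb, yb, zb = NODES[b]
    road = math.hypot(xa - xb, ya - yb) * DETOUR
    return round(road, 2), round(road * co2_rate(za, zb), 3)

print(edge_costs("U1", "U2"))  # → (1.29, 0.36)
print(edge_costs("U3", "R1"))  # urban-rural link
```

For `U1`-`U2` the straight-line distance is √1.25 ≈ 1.118, so the edge costs 1.29 km and 0.36 CO₂ units.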
+# ── algorithms (return path, cost, exploration_order) ─────────────────────────
+def _hscale(adj):
+    # Smallest edge cost per unit of straight-line length. Scaling the Euclidean
+    # heuristic by this keeps A*/IDA* admissible for both weightings: km edges
+    # cost ~1.15 × straight-line length, but CO2 edges only ~0.10-0.28 × road km,
+    # so an unscaled Euclidean heuristic would overestimate CO2 costs.
+    return min(w/_nd(a,b) for a in adj for b,w in adj[a])
+
+def bfs(s,g,adj):
+    # explores level by level; returns the route with the fewest hops
+    q=deque([(s,[s])]); seen={s}; expl=[]
+    while q:
+        n,p=q.popleft(); expl.append(n)
+        if n==g:
+            return p,round(sum(_ew(p[i],p[i+1],adj) for i in range(len(p)-1)),2),expl
+        for nb,_ in adj[n]:
+            if nb not in seen: seen.add(nb); q.append((nb,p+[nb]))
+    return None,0.0,expl
+
+def dfs(s,g,adj):
+    # explores depth first (LIFO stack); paths are capped at 50 nodes
+    stack=[(s,[s])]; seen={s}; expl=[]
+    while stack:
+        n,p=stack.pop(); expl.append(n)
+        if n==g:
+            return p,round(sum(_ew(p[i],p[i+1],adj) for i in range(len(p)-1)),2),expl
+        if len(p)>=50: continue
+        for nb,_ in adj[n]:
+            if nb not in seen: seen.add(nb); stack.append((nb,p+[nb]))
+    return None,0.0,expl
+
+def astar(s,g,adj):
+    ctr=0; sc=_hscale(adj); h=lambda n:sc*_nd(n,g); expl=[]
+    heap=[(h(s),0.0,ctr,s,[s])]; best={s:0.0}
+    while heap:
+        _,gc,_,n,p=heapq.heappop(heap)
+        if n==g: return p,round(gc,2),expl
+        if gc>best.get(n,math.inf): continue      # stale heap entry
+        expl.append(n)
+        for nb,w in adj[n]:
+            ng=gc+w
+            if ng<best.get(nb,math.inf):
+                best[nb]=ng; ctr+=1                 # tie-breaker so the heap never compares paths
+                heapq.heappush(heap,(ng+h(nb),ng,ctr,nb,p+[nb]))
+    return None,0.0,expl
+
+def ida_star(s,g,adj):
+    expl=[]; sc=_hscale(adj); h=lambda n:sc*_nd(n,g)
+    def _dfs(n,gc,bound,path,vis):
+        f=gc+h(n)
+        if f>bound: return None,f                  # over the bound: f is a candidate next bound
+        expl.append(n)
+        if n==g: return list(path),gc
+        nxt=math.inf
+        for nb,w in adj[n]:
+            if nb in vis: continue
+            vis.add(nb); path.append(nb)
+            r,t=_dfs(nb,gc+w,bound,path,vis)
+            if r is not None: return r,t
+            if t<nxt: nxt=t
+            path.pop(); vis.remove(nb)
+        return None,nxt
+    bound=h(s)
+    while True:                                     # raise the bound until the goal is reached
+        r,t=_dfs(s,0.0,bound,[s],{s})
+        if r is not None: return r,round(t,2),expl
+        if t==math.inf: return None,0.0,expl
+        bound=t
+
+ALGOS={"BFS":bfs,"DFS":dfs,"A*":astar,"IDA*":ida_star}
+
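A* and IDA* are only guaranteed to return the cheapest path when the heuristic never overestimates the remaining cost. With distance weights (1.15 × straight-line length), plain Euclidean distance is a safe lower bound. CO₂ weights, however, are only a fraction of the road length, so the same heuristic can overestimate them. One way to check this is sketched below on a small made-up graph. Dijkstra computes the exact cost-to-goal for every node, and the result is compared against the raw Euclidean heuristic and against a version scaled by the smallest cost-per-unit-length of any edge. The graph and the names `dijkstra_to` and `heuristic_scale` are illustrative and are not part of the repository.

```python
import heapq
import math

# Tiny illustrative graph: edge cost = per-edge rate × straight-line length,
# similar in spirit to the CO2 weights (rate well below 1).
COORDS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0), "D": (1.0, 1.0)}
EDGE_RATES = [("A", "B", 0.3), ("B", "C", 0.1), ("A", "D", 0.2), ("D", "C", 0.2)]

def dist(a, b):
    (xa, ya), (xb, yb) = COORDS[a], COORDS[b]
    return math.hypot(xa - xb, ya - yb)

ADJ = {n: [] for n in COORDS}
for a, b, rate in EDGE_RATES:
    w = rate * dist(a, b)
    ADJ[a].append((b, w))
    ADJ[b].append((a, w))

def dijkstra_to(goal, adj):
    """Exact cheapest cost from every node to `goal` (the graph is undirected)."""
    best = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > best.get(n, math.inf):
            continue  # stale heap entry
        for nb, w in adj[n]:
            nd = d + w
            if nd < best.get(nb, math.inf):
                best[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return best

def heuristic_scale(adj):
    """Smallest cost per unit of straight-line length over all edges."""
    return min(w / dist(a, b) for a in adj for b, w in adj[a])

goal = "C"
true_cost = dijkstra_to(goal, ADJ)
scale = heuristic_scale(ADJ)
for n in COORDS:
    raw_h = dist(n, goal)              # unscaled Euclidean heuristic
    scaled_h = scale * dist(n, goal)   # scaled version: a lower bound on true cost
    status = "admissible" if scaled_h <= true_cost[n] + 1e-12 else "OVERESTIMATES"
    print(n, round(true_cost[n], 3), round(raw_h, 3), round(scaled_h, 3), status)
```

On this graph the unscaled heuristic overestimates the true cost from `A` (2.0 versus 0.4) and from `B` (1.0 versus 0.1). The version scaled by 0.1 stays at or below the true cost for every node.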
+# ── network figure builder ────────────────────────────────────────────────────
+def build_network(sn,en,path,explored_so_far,adj,unit,algo_name):
+    pc=GREEN if unit=="CO2" else AMBER
+    path_set=set(path) if path else set()
+    fig=go.Figure()
+
+    # edges
+    for a,b,w in EDGES:
+        on_path=(a in path_set and b in path_set and
+                 any((path[i]==a and path[i+1]==b) or
+                     (path[i]==b and path[i+1]==a)
+                     for i in range(len(path)-1)) if path else False)
+        lc=pc if on_path else "#dde3ed"
+        lw=5 if on_path else 1.5
+        co2w=_ew(a,b,ADJ_CO2)
+        fig.add_trace(go.Scatter(
+            x=[NODES[a][0],NODES[b][0],None],y=[NODES[a][1],NODES[b][1],None],
+            mode="lines",line=dict(color=lc,width=lw),
+            showlegend=False,hoverinfo="skip"))
+
+    # nodes
+    for zone,bc in [("urban","#ef4444"),("rural",GREEN)]:
+        ns=[(n,d) for n,d in NODES.items() if d[2]==zone]
+        cols,sizes=[],[]
+        for n,_ in ns:
+            if n==sn: cols.append("#fff"); sizes.append(28)
+            elif n==en: cols.append("#facc15"); sizes.append(28)
+            elif n in path_set: cols.append(pc); sizes.append(22)
+            elif n in explored_so_far: cols.append("#bfdbfe"); sizes.append(18)
+            else: cols.append(bc); sizes.append(18)
+        fig.add_trace(go.Scatter(
+            x=[d[0] for _,d in ns],y=[d[1] for _,d in ns],
+            mode="markers+text",name=zone.title(),
+            marker=dict(size=sizes,color=cols,line=dict(color=FG,width=1.5)),
+            text=[n for n,_ in ns],textposition="middle center",
+            textfont=dict(size=8,color=FG,family="monospace"),
+            hovertemplate="<b>%{text}</b><br>"+zone+"<extra></extra>"))
+
+    title=(f"{algo_name}: {sn} → {en} | "
+           f"{'Explored '+str(len(explored_so_far))+' nodes' if explored_so_far else 'Ready'}")
+    fig.update_layout(**_ch(480,title))
+    fig.update_layout(legend=dict(bgcolor=SURF,bordercolor=LINE,x=0.01,y=0.99))
+    fig.update_xaxes(showgrid=False,showticklabels=False,zeroline=False)
+    fig.update_yaxes(showgrid=False,showticklabels=False,zeroline=False)
+    return fig
+
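`build_network` decides whether an edge lies on the route by scanning every consecutive pair in `path`, once for each of the 34 edges. An equivalent approach builds the set of route edges once, as order-independent `frozenset` pairs, so that each edge test becomes a single set lookup. A minimal sketch follows; `route_edges` and `on_route` are illustrative names and are not functions in `app.py`.

```python
def route_edges(path):
    """Undirected edges traversed by a node path, e.g. ['U1', 'U2', 'U5']."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def on_route(a, b, edges):
    """True if the undirected edge a-b is one of the route's edges."""
    return frozenset((a, b)) in edges

path = ["U1", "U2", "U5", "U9"]
edges = route_edges(path)
print(on_route("U5", "U2", edges), on_route("U1", "U5", edges))  # → True False
```

`U1` and `U5` are both on the route but are never consecutive. This is why knowing that both endpoints are in the route's node set is not enough on its own.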
+# ══════════════════════════════════════════════════════════════════════════════
+# AGENT SIMULATION
+# ══════════════════════════════════════════════════════════════════════════════
+# stop name -> (x, y, priority 0-5)
+STOPS={
+    "Depot": (0.0,0.0,0), "Shop A":(2.0,3.0,3), "Shop B":(5.0,1.0,4),
+    "Shop C":(7.0,4.0,2), "Shop D":(3.0,6.0,5), "Shop E":(8.0,7.0,1),
+    "Shop F":(1.0,8.0,3), "Shop G":(6.0,9.0,4), "Shop H":(9.0,2.0,2),
+}
+def _sd(a,b): ax,ay,_=STOPS[a]; bx,by,_=STOPS[b]; return math.hypot(ax-bx,ay-by)
+
+def _reactive():
+    # reactive agent: always move to the nearest unvisited stop
+    r=["Depot"]; u=[k for k in STOPS if k!="Depot"]; cur="Depot"
+    while u: nb=min(u,key=lambda n:_sd(cur,n)); r.append(nb); u.remove(nb); cur=nb
+    return r+["Depot"]
+
+def _goal():
+    # goal-based agent: start from the nearest-neighbour tour, then improve it with 2-opt
+    r=_reactive()[:-1]
+    td=lambda x:sum(_sd(x[i],x[i+1]) for i in range(len(x)-1))+_sd(x[-1],x[0])
+    ok=True
+    while ok:
+        ok=False
+        for i in range(1,len(r)-1):
+            for j in range(i+1,len(r)):
+                nr=r[:i]+r[i:j+1][::-1]+r[j+1:]
+                if td(nr)<td(r)-1e-9: r=nr; ok=True
+    return r+["Depot"]
+
+def _utility():
+    # utility-based agent: next stop maximises priority / (distance + 0.1)
+    r=["Depot"]; u=[k for k in STOPS if k!="Depot"]; cur="Depot"
+    while u:
+        nb=max(u,key=lambda n:STOPS[n][2]/(_sd(cur,n)+.1))
+        r.append(nb); u.remove(nb); cur=nb
+    return r+["Depot"]
+
+ROUTES={"Nearest stop":_reactive(),"Planned route":_goal(),"Priority first":_utility()}
+AGENT_COL={"Nearest stop":BLUE,"Planned route":GREEN,"Priority first":AMBER}
+AGENT_DESC={
+    "Nearest stop":  "Reactive agent — goes to the closest unvisited stop. Simple and fast, no planning.",
+    "Planned route": "Goal-based agent — plans a short full route (nearest-neighbour tour + 2-opt) before departing.",
+    "Priority first":"Utility-based agent — balances urgency vs distance, so starred stops tend to be served early.",
+}
+
+def _route_km(r): return round(sum(_sd(r[i],r[i+1]) for i in range(len(r)-1)),2)
+
+def draw_agent(route,step,ac):
+    visited=set(route[:step+1]); pso=route[:step+1]
+    km=sum(_sd(pso[i],pso[i+1]) for i in range(len(pso)-1))
+    cur=route[step]
+    fig=go.Figure()
+    for na in STOPS:
+        for nb in STOPS:
+            if na>=nb: continue
+            x1,y1,_=STOPS[na]; x2,y2,_=STOPS[nb]
+            if math.hypot(x1-x2,y1-y2)<5.5:
+                fig.add_trace(go.Scatter(x=[x1,x2,None],y=[y1,y2,None],mode="lines",
+                    line=dict(color="#e2e8f0",width=1),showlegend=False,hoverinfo="skip"))
+    if len(pso)>1:
+        fig.add_trace(go.Scatter(
+            x=[STOPS[n][0] for n in pso],y=[STOPS[n][1] for n in pso],
+            mode="lines+markers",line=dict(color=ac,width=3),
+            marker=dict(size=6,color=ac),showlegend=False,hoverinfo="skip"))
+    for name,(nx,ny,pri) in STOPS.items():
+        if name=="Depot":     nc,sz,sym="#3b82f6",26,"square"
+        elif name==cur:       nc,sz,sym=ac,28,"circle"
+        elif name in visited: nc,sz,sym=GREEN,18,"circle"
+        else:                 nc,sz,sym="#cbd5e1",18,"circle"
+        label=("⭐" if pri>=4 else "")+" "+name.replace("Shop ","")
+        fig.add_trace(go.Scatter(x=[nx],y=[ny],mode="markers+text",showlegend=False,
+            marker=dict(size=sz,color=nc,symbol=sym,line=dict(color="#fff",width=2)),
+            text=[label.strip()],textposition="top center",textfont=dict(size=9,color=FG),
+            hovertemplate=f"<b>{name}</b><br>Priority {pri}/5<br>{'✓ Visited' if name in visited else 'Pending'}<extra></extra>"))
+    fig.update_layout(**_ch(400,f"Step {step}/{len(route)-1} — {km:.1f} km so far"))
+    fig.update_xaxes(showgrid=False,showticklabels=False,zeroline=False,range=[-0.5,10.5])
+    fig.update_yaxes(showgrid=False,showticklabels=False,zeroline=False,range=[-0.5,10.5])
+    return fig, round(km,2)
+
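The goal-based agent starts from the nearest-neighbour tour and then applies 2-opt. Whenever reversing a segment of the tour shortens the closed loop, it makes that reversal, and it repeats until no reversal helps. The result is a local optimum, not necessarily the shortest possible tour. Below is a self-contained sketch of the same idea on plain coordinates, with the depot kept fixed at index 0. The point set and function names are illustrative.

```python
import math

def tour_length(tour, pts):
    """Length of the closed tour, including the leg back to the start."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """Reverse tour[i:j+1] whenever that shortens the loop; index 0 (depot) stays put."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(cand, pts) < tour_length(tour, pts) - 1e-9:
                    tour, improved = cand, True
    return tour

# Depot plus four stops; the starting order deliberately crosses itself.
pts = {"Depot": (0, 0), "A": (0, 2), "B": (2, 0), "C": (2, 2), "D": (1, 3)}
start = ["Depot", "C", "A", "B", "D"]
best = two_opt(start, pts)
print(best, round(tour_length(start, pts), 2), "->", round(tour_length(best, pts), 2))
```

Each accepted move strictly shortens the tour, so the loop must terminate. When it stops, no single segment reversal can shorten the tour any further.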
+# ══════════════════════════════════════════════════════════════════════════════
+# SEGMENTATION
+# ══════════════════════════════════════════════════════════════════════════════
+@st.cache_data
+def _customers(nu,nr):
+    # synthetic customers: urban buy more often and spend more than rural
+    rng=np.random.default_rng(42)
+    u=pd.DataFrame({"freq":rng.normal(6,2,nu).clip(.5),"spend":rng.normal(120,40,nu).clip(10),
+                    "recency":rng.exponential(10,nu).clip(1,90),"region":"urban"})
+    r=pd.DataFrame({"freq":rng.normal(3,1.5,nr).clip(.5),"spend":rng.normal(65,30,nr).clip(10),
+                    "recency":rng.exponential(15,nr).clip(1,90),"region":"rural"})
+    return pd.concat([u,r],ignore_index=True).round(1)
+
+def _kmeans(df,k):
+    # cluster on standardised features, then name clusters by mean spend (highest = "High Value")
+    X=StandardScaler().fit_transform(df[["freq","spend","recency"]])
+    df=df.copy(); df["cluster"]=KMeans(n_clusters=k,random_state=42,n_init=10).fit_predict(X)
+    order=df.groupby("cluster")["spend"].mean().sort_values(ascending=False).index
+    names=(["High Value","Medium","Low Value","Group 4"])[:k]
+    df["segment"]=df["cluster"].map({order[i]:names[i] for i in range(k)})
+    return df
+
+def _di(df):
+    # returns (urban High Value %, rural High Value %, disparate impact = rural rate / urban rate)
+    u=(df[df.region=="urban"].segment=="High Value").mean()
+    r=(df[df.region=="rural"].segment=="High Value").mean()
+    return round(u*100,1),round(r*100,1),round(r/u if u else 0,3)
+
+@st.cache_data
+def _fix(nu,nr,k):
+    df=_customers(nu,nr)
+    # 1) oversample rural rows (with replacement) to match the urban count
+    bal=pd.concat([df[df.region=="urban"],
+                   df[df.region=="rural"].sample(len(df[df.region=="urban"]),replace=True,random_state=42)],
+                  ignore_index=True).copy()
+    # 2) add the ~€12 rural delivery premium back to spend; 3) scale frequency ×1.5 for batched orders
+    bal.loc[bal.region=="rural","spend"]+=12
+    bal.loc[bal.region=="rural","freq"]*=1.5
+    bal=_kmeans(bal,k)
+    # 4) post-processing: promote the highest-spending rural non-High-Value customers
+    #    until the rural High Value rate reaches 85% of the urban rate
+    rm=bal.region=="rural"; um=bal.region=="urban"
+    need=int((bal[um].segment=="High Value").mean()*.85*rm.sum())-(bal[rm].segment=="High Value").sum()
+    if need>0:
+        cands=bal[rm&(bal.segment!="High Value")]
+        bal.loc[cands.nlargest(min(need,len(cands)),"spend").index,"segment"]="High Value"
+    return bal
+
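`_di` measures Disparate Impact as the rural High Value rate divided by the urban High Value rate. The app treats a ratio of at least 0.80 as fair, which matches the common four-fifths rule of thumb. The minimal pandas sketch below uses a hand-made table so the arithmetic can be checked by eye. The data are invented for illustration, and `disparate_impact` is a hypothetical helper, not the app's function.

```python
import pandas as pd

def disparate_impact(df, group_col="region", label_col="segment",
                     favourable="High Value", protected="rural", reference="urban"):
    """Favourable-label rate of the protected group divided by that of the reference group."""
    def rate(group):
        return (df.loc[df[group_col] == group, label_col] == favourable).mean()
    ref = rate(reference)
    return rate(protected) / ref if ref else 0.0

df = pd.DataFrame({
    "region": ["urban"] * 10 + ["rural"] * 10,
    # 4 of 10 urban and 2 of 10 rural customers are labelled High Value
    "segment": ["High Value"] * 4 + ["Low Value"] * 6
             + ["High Value"] * 2 + ["Low Value"] * 8,
})
di = disparate_impact(df)
print(f"DI = {di:.2f}:", "FAIR" if di >= 0.8 else "NOT FAIR")  # → DI = 0.50: NOT FAIR
```

As in `_di`, the ratio falls back to 0 when the reference group has no favourable labels, which avoids a division by zero.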
+# ══════════════════════════════════════════════════════════════════════════════
+# FORECASTING
+# ══════════════════════════════════════════════════════════════════════════════
+@st.cache_data
+def _sales():
+    # 730 days of synthetic sales: trend + weekly and yearly seasonality + noise + promo spikes
+    rng=np.random.default_rng(42); days=730
+    t=np.arange(days); dates=pd.date_range("2023-01-01",periods=days,freq="D")
+    promo=np.zeros(days); promo[rng.choice(days,int(days*.06),replace=False)]=rng.uniform(30,70,int(days*.06))
+    sales=np.clip(100+.05*t+25*np.sin(2*np.pi*t/7)+40*np.sin(2*np.pi*t/365)+rng.normal(0,8,days)+promo,0,None)
+    df=pd.DataFrame({"date":dates,"sales":sales,"dow":dates.dayofweek,"month":dates.month,
+                     "day_of_year":dates.dayofyear,"is_promo":(promo>0).astype(int)})
+    # lag/rolling features only use earlier days (shift before rolling), so no target leakage
+    for l in [1,7,14]: df[f"lag_{l}"]=df["sales"].shift(l)
+    df["roll_7"]=df["sales"].shift(1).rolling(7).mean()
+    df["roll_30"]=df["sales"].shift(1).rolling(30).mean()
+    return df.dropna().reset_index(drop=True)
+
+FEATS=["dow","month","day_of_year","is_promo","lag_1","lag_7","lag_14","roll_7","roll_30"]
+FEAT_LABELS={"lag_7":"Sales 7 days ago","lag_1":"Yesterday's sales","lag_14":"Sales 14 days ago",
+             "roll_7":"7-day average","roll_30":"30-day average","is_promo":"Promotion active",
+             "day_of_year":"Day of year","month":"Month","dow":"Day of week"}
+
+@st.cache_data
+def _train(tp,ne):
+    # chronological split: the first tp% of days train the models, the rest are the test set
+    df=_sales(); sp=int(len(df)*tp/100); tr,te=df.iloc[:sp],df.iloc[sp:]
+    lr=LinearRegression().fit(tr[FEATS],tr["sales"])
+    rf=RandomForestRegressor(n_estimators=ne,max_depth=12,min_samples_leaf=3,
+                             random_state=42,n_jobs=-1).fit(tr[FEATS],tr["sales"])
+    lp=lr.predict(te[FEATS]); rp=rf.predict(te[FEATS])
+    return lr,rf,te,lp,rp,rf.feature_importances_
+
+def _met(y,yh):
+    # MAE, RMSE, R², MAPE (%); zero actuals are replaced by 1 in the MAPE denominator
+    return (round(mean_absolute_error(y,yh),1),
+            round(mean_squared_error(y,yh)**.5,1),
+            round(r2_score(y,yh),3),
+            round(np.mean(np.abs((y-yh)/np.where(y==0,1,y)))*100,1))
+
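`_sales` calls `shift(1)` before `rolling(...)` when building lag and rolling-mean features, so each row only sees sales from earlier days. `_train` also splits the data chronologically, using the first `tp`% of days for training, instead of shuffling. Below is a reduced sketch of both ideas on a ten-day toy series. The column names mirror `app.py`, but the numbers are invented.

```python
import pandas as pd

df = pd.DataFrame({"sales": [10, 12, 11, 15, 14, 18, 17, 20, 19, 22]})
df["lag_1"] = df["sales"].shift(1)                     # yesterday's sales
df["roll_3"] = df["sales"].shift(1).rolling(3).mean()  # mean of the 3 previous days
df = df.dropna().reset_index(drop=True)                # drop rows with incomplete history

split = int(len(df) * 0.8)                             # chronological 80/20 split
train, test = df.iloc[:split], df.iloc[split:]
print(df.head(2))
print(len(train), "train rows,", len(test), "test rows")
```

The first surviving row is day 4 (sales 15). Its `roll_3` is the mean of days 1 to 3, (10 + 12 + 11) / 3 = 11, so that day's own sales value never enters its features.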
+# ══════════════════════════════════════════════════════════════════════════════
+# HEADER
+# ══════════════════════════════════════════════════════════════════════════════
+st.markdown("<h2 style='margin:0 0 12px;color:#1e293b'>🛒 EcoCart AI System</h2>",
+            unsafe_allow_html=True)
+
+T1,T2,T3,T4,T5=st.tabs([
+    "🤖 Task 1 — AI Agents",
+    "⚖️ Task 2 — Bias Check",
+    "🗺️ Task 3 — Route Finder",
+    "📊 Task 4 — Speed Test",
+    "📈 Task 5 — Sales Forecast",
+])
+
+# ══════════════════════════════════════════════════════════════════════════════
+# TASK 1
+# ══════════════════════════════════════════════════════════════════════════════
+with T1:
+    st.markdown("### Watch the AI delivery agent navigate in real time")
+    st.caption("Three different AI strategies — pick one and press Play to watch it move stop by stop.")
+
+    # ── agent picker ──────────────────────────────────────────────────────────
+    a_cols=st.columns(3)
+    agent_names=list(ROUTES.keys())
+    if "agent" not in st.session_state: st.session_state.agent="Nearest stop"
+    # initialise playback state as well; otherwise "▶ Next" on first load updates a missing key
+    if "stp" not in st.session_state: st.session_state.stp=0
+    if "playing" not in st.session_state: st.session_state.playing=False
+
+    for i,(col,name) in enumerate(zip(a_cols,agent_names)):
+        km=_route_km(ROUTES[name])
+        active=st.session_state.agent==name
+        border=f"3px solid {AGENT_COL[name]}" if active else "2px solid #e2e8f0"
+        bg=f"{AGENT_COL[name]}12" if active else "#fff"
+        if col.button(f"{'✓ ' if active else ''}{name} ({km} km)",
+                      key=f"ab_{name}",use_container_width=True):
+            st.session_state.agent=name
+            st.session_state.stp=0
+            st.session_state.playing=False
+
+    agent=st.session_state.agent
+    ac=AGENT_COL[agent]
+    route=ROUTES[agent]; mx=len(route)-1
+
+    # ── playback controls ─────────────────────────────────────────────────────
+    ctl=st.columns([1,1,1,1,3])
+    if ctl[0].button("⏮ Start"):
+        st.session_state.stp=0; st.session_state.playing=False
+    if ctl[1].button("◀ Back") and st.session_state.get("stp",0)>0:
+        st.session_state.stp-=1; st.session_state.playing=False
+    if ctl[2].button("▶ Next") and st.session_state.get("stp",0)<mx:
+        st.session_state.stp+=1; st.session_state.playing=False
+    playing=st.session_state.get("playing",False)
+    if ctl[3].button("⏸ Pause" if playing else "▶ Play"):
+        st.session_state.playing=not playing
+
+    speed=ctl[4].slider("Speed",1,8,3,label_visibility="collapsed",
+                        help="Animation speed (steps per second)")
+
+    stp=st.session_state.get("stp",0)
+
+    fig_agent,km_done=draw_agent(route,stp,ac)
+
+    # ── map + stats ───────────────────────────────────────────────────────────
+    map_c,stat_c=st.columns([3,1])
+    with map_c:
+        st.plotly_chart(fig_agent,use_container_width=True,key="agent_map")
+
+    with stat_c:
+        st.markdown("<div class='section-label'>Current status</div>",unsafe_allow_html=True)
+        st.metric("Stops completed",f"{stp} / {mx}")
+        st.metric("Distance covered",f"{km_done} km")
+        psum=sum(STOPS[n][2] for n in route[:stp+1] if n!="Depot")
+        st.metric("Priority points served",psum)
+        st.markdown(" ")
+        st.markdown(f"<div class='tip'>{AGENT_DESC[agent]}</div>",unsafe_allow_html=True)
+        st.markdown("<div class='section-label' style='margin-top:12px'>All agents</div>",unsafe_allow_html=True)
+        for nm in agent_names:
+            km=_route_km(ROUTES[nm]); c=AGENT_COL[nm]
+            hi=next((i for i,n in enumerate(ROUTES[nm]) if n!="Depot" and STOPS[n][2]>=4),"-")
+            st.markdown(
+                f"<div style='border-left:3px solid {c};padding:6px 10px;"
+                f"margin:4px 0;background:{'#f8fafc' if nm!=agent else c+'12'};border-radius:0 6px 6px 0'>"
+                f"<b style='font-size:.82rem'>{nm}</b> &nbsp;"
+                f"<span style='color:{MUTE};font-size:.78rem'>{km} km · 1st star: step {hi}</span>"
+                f"</div>",unsafe_allow_html=True)
+
+    # ── auto-play ─────────────────────────────────────────────────────────────
+    if st.session_state.get("playing") and stp<mx:
+        time.sleep(1.0/speed)
+        st.session_state.stp=stp+1
+        st.rerun()
+    elif st.session_state.get("playing") and stp>=mx:
+        st.session_state.playing=False
+
+# ══════════════════════════════════════════════════════════════════════════════
+# TASK 2
+# ══════════════════════════════════════════════════════════════════════════════
+with T2:
+    st.markdown("### Are rural customers being treated fairly by the AI?")
+    st.caption("Adjust the sliders and watch the fairness score update instantly.")
+
+    ctrl,main=st.columns([1,3])
+    with ctrl:
+        nu=st.slider("Urban customers",100,500,300,50)
+        nr=st.slider("Rural customers",30,200,100,10)
+        k=st.slider("Groups (K-Means)",2,4,3,1)
+        fix=st.toggle("Apply fairness fix",True)
+        st.markdown(" ")
+        if fix:
+            st.markdown("""
+<div class='tip'>
+<b>What the fix does:</b><br><br>
+• Rural customers pay ~€12 more per delivery — we add this back to their spend score<br>
+• Rural customers batch orders (less frequent, bigger baskets) — we adjust their frequency<br>
+• We balance the dataset so rural customers are equally represented during clustering<br>
+• The highest-spending rural customers are then moved into High Value until the rural rate reaches 85% of the urban rate
+</div>""",unsafe_allow_html=True)
+        else:
+            st.markdown("""
+<div class='tip'>
+<b>Why bias happens:</b><br><br>
+EcoCart launched in cities first. Urban customers have more data and appear to spend more on the surface.
+The AI picks up this pattern and unfairly labels rural customers as low-value.
+</div>""",unsafe_allow_html=True)
+
+    with main:
+        raw=_customers(nu,nr); seg_b=_kmeans(raw,k); ub,rb,dib=_di(seg_b)
+        if fix: seg_a=_fix(nu,nr,k); ua,ra,dia=_di(seg_a)
+
+        # ── big fairness indicator ────────────────────────────────────────────
+        # show the rates that the displayed fairness score is computed from
+        mc=st.columns(4)
+        mc[0].metric("Urban in High Value",f"{ua if fix else ub}%")
+        mc[1].metric("Rural in High Value",f"{ra if fix else rb}%")
+        di_val=dia if fix else dib
+        di_delta=f"{dia-dib:+.2f}" if fix else None
+        mc[2].metric("Fairness score",f"{di_val:.2f}",delta=di_delta,
+                     help="1.0 = perfectly equal. Aim: ≥ 0.80")
+        status="FAIR" if di_val>=0.8 else "NOT FAIR"
+        mc[3].markdown(
+            f"<div style='background:#fff;border-radius:10px;padding:14px 18px;"
+            f"box-shadow:0 1px 4px rgba(0,0,0,.07);text-align:center'>"
+            f"<div style='font-size:.8rem;color:{MUTE}'>Status</div>"
+            f"<div class='badge-{'green' if di_val>=.8 else 'red'}' "
+            f"style='font-size:.95rem;margin-top:6px'>{status}</div></div>",
+            unsafe_allow_html=True)
+
+        if di_val>=0.8: st.success(f"Fairness achieved — score {di_val:.2f} meets the 0.80 threshold.")
+        else: st.error(f"Score {di_val:.2f} is below 0.80 — rural customers are under-served.")
+
+        # ── scatter ───────────────────────────────────────────────────────────
+        def _scatter(df,title):
+            fig=go.Figure()
+            for seg in ["High Value","Medium","Low Value","Group 4"]:
+                if seg not in df.segment.values: continue
+                for region,sym in [("urban","circle"),("rural","triangle-up")]:
+                    sub=df[(df.segment==seg)&(df.region==region)]
+                    if sub.empty: continue
+                    fig.add_trace(go.Scatter(x=sub.freq,y=sub.spend,mode="markers",
+                        marker=dict(color=SEG_COL.get(seg,"#94a3b8"),symbol=sym,size=7,opacity=.72),
+                        name=f"{seg} / {region}",
+                        hovertemplate="<b>"+seg+"</b> ("+region+")<br>Purchases: %{x:.1f}/month<br>Avg spend: €%{y:.0f}<extra></extra>"))
+            fig.update_layout(**_ch(320,title))
+            fig.update_xaxes(**_xax(title="Purchases per month"))
+            fig.update_yaxes(**_yax(title="Average spend (€)"))
+            return fig
+
+        if fix:
+            c1,c2=st.columns(2)
+            c1.plotly_chart(_scatter(seg_b,"Before fix — biased"),use_container_width=True)
+            c2.plotly_chart(_scatter(seg_a,"After fix — fair"),use_container_width=True)
+        else:
+            st.plotly_chart(_scatter(seg_b,"Customer groups (no fix)"),use_container_width=True)
+
+        # ── bar chart ─────────────────────────────────────────────────────────
+        fig2=go.Figure()
+        fig2.add_trace(go.Bar(name="Before fix",x=["Urban → High Value","Rural → High Value"],
+                              y=[ub,rb],marker_color=RED,
+                              text=[f"{ub}%",f"{rb}%"],textposition="outside",textfont_color=FG))
+        if fix:
+            fig2.add_trace(go.Bar(name="After fix",x=["Urban → High Value","Rural → High Value"],
+                                  y=[ua,ra],marker_color=GREEN,
+                                  text=[f"{ua}%",f"{ra}%"],textposition="outside",textfont_color=FG))
+        fig2.update_layout(**_ch(260,"Percentage in High Value group"),barmode="group")
+        fig2.update_xaxes(**_xax()); fig2.update_yaxes(**_yax(title="%",range=[0,110]))
+        fig2.add_hline(y=min(ub,ua if fix else ub),line_color="#94a3b8",line_dash="dot",
+                       annotation_text="Urban rate",annotation_font_color=MUTE)
+        st.plotly_chart(fig2,use_container_width=True)
+
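The first step of the fairness fix oversamples rural customers with replacement until both regions have the same number of rows, which `_fix` does with `sample(..., replace=True)`. The sketch below isolates that step; the group sizes and spend distributions only loosely echo the app's defaults.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "region": ["urban"] * 300 + ["rural"] * 100,
    "spend": np.concatenate([rng.normal(120, 40, 300), rng.normal(65, 30, 100)]),
})

urban = df[df.region == "urban"]
rural = df[df.region == "rural"]
# Draw rural rows with replacement until they match the urban count.
balanced = pd.concat(
    [urban, rural.sample(len(urban), replace=True, random_state=42)],
    ignore_index=True,
)
print(balanced.region.value_counts().to_dict())
```

Because rows are drawn with replacement, some rural customers appear several times. Any High Value rate computed afterwards is therefore a rate over the resampled table, not over distinct customers.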
+ # ══════════════════════════════════════════════════════════════════════════════
+ # TASK 3
+ # ══════════════════════════════════════════════════════════════════════════════
+ with T3:
+     st.markdown("### Watch the AI find the delivery route in real time")
+     st.caption("Pick start and end points, choose an algorithm, then replay how it explores the network step by step.")
+
+     ctrl3,map3=st.columns([1,3])
+     with ctrl3:
+         all_n=list(NODES.keys())
+         sn=st.selectbox("Start node",all_n,index=0)
+         en=st.selectbox("End node", all_n,index=19)
+         al=st.radio("Algorithm",["BFS","DFS","A*","IDA*"],index=2,
+                     captions=["Level-by-level","Deep dive","Guided (best)","Memory-efficient"])
+         gr=st.toggle("Minimise CO₂ (not distance)",False)
+         st.divider()
+         adj=ADJ_CO2 if gr else ADJ_KM
+         unit="CO2" if gr else "km"
+
+         if sn==en:
+             st.warning("Choose different start and end."); path,cost,expl=[],0,[]; ms=0
+         else:
+             t0=time.perf_counter()
+             path,cost,expl=ALGOS[al](sn,en,adj)
+             ms=round((time.perf_counter()-t0)*1000,3)
+             if path:
+                 st.metric("Route distance",f"{cost} {'km' if unit=='km' else 'kg CO₂'}")
+                 st.metric("Nodes the AI checked",len(expl),help="The fewer the better — the AI was more efficient")
+                 st.metric("Time taken",f"{ms} ms")
+                 st.markdown(
+                     f"<div class='tip'><b>Route:</b> {' → '.join(path)}</div>",
+                     unsafe_allow_html=True)
+             else:
+                 st.error("No route found."); path=[]; expl=[]
+
+     with map3:
+         # ── exploration replay slider ─────────────────────────────────────────
+         if expl:
+             replay=st.slider(
+                 "🔍 Replay: drag to see how the AI explored the map",
+                 0,len(expl),len(expl),
+                 help="0 = no exploration shown, max = full path found")
+             explored_so_far=set(expl[:replay])
+             pct=int(replay/len(expl)*100) if expl else 100
+             st.markdown(
+                 f"<div style='font-size:.82rem;color:{MUTE};margin-bottom:4px'>"
+                 f"<span class='badge-blue'>{replay}/{len(expl)} nodes explored ({pct}%)</span>"
+                 f"{'&nbsp;&nbsp;<span class=badge-green>Route found</span>' if replay==len(expl) and path else ''}"
+                 f"</div>",unsafe_allow_html=True)
+         else:
+             explored_so_far=set()
+
+         fig_net=build_network(sn,en,path,explored_so_far,adj,unit,al)
+         st.plotly_chart(fig_net,use_container_width=True)
+
+         # colour legend
+         leg=st.columns(5)
+         leg[0].markdown(f"<div style='font-size:.78rem'>⬤ <span style='color:{RED}'>Urban node</span></div>",unsafe_allow_html=True)
+         leg[1].markdown(f"<div style='font-size:.78rem'>⬤ <span style='color:{GREEN}'>Rural node</span></div>",unsafe_allow_html=True)
+         leg[2].markdown(f"<div style='font-size:.78rem'>⬤ <span style='color:#bfdbfe'>Explored</span></div>",unsafe_allow_html=True)
+         leg[3].markdown(f"<div style='font-size:.78rem'>⬤ <span style='color:{AMBER}'>On path</span></div>",unsafe_allow_html=True)
+         leg[4].markdown(f"<div style='font-size:.78rem'>⬤ <span style='color:#fff;background:{FG};padding:1px 4px;border-radius:3px'>Start</span> / <span style='color:{FG};background:#facc15;padding:1px 4px;border-radius:3px'>End</span></div>",unsafe_allow_html=True)
+
+     # ── side-by-side comparison ───────────────────────────────────────────────
+     with st.expander("Compare all 4 algorithms on this route"):
+         if sn!=en:
+             rows=[]
+             for nm in ["BFS","DFS","A*","IDA*"]:
+                 t0=time.perf_counter(); p,c,e=ALGOS[nm](sn,en,adj); ms2=(time.perf_counter()-t0)*1000
+                 rows.append({"Algorithm":nm,
+                              f"Distance ({'km' if unit=='km' else 'CO₂'})":round(c,2) if p else "N/A",
+                              "Nodes checked":len(e),"Time (ms)":round(ms2,3),
+                              # BFS minimises hop count, not km or CO₂; A* and IDA* are
+                              # optimal as long as their heuristic is admissible
+                              "Finds shortest?":nm in ["A*","IDA*"]})
+             df_c=pd.DataFrame(rows)
+             st.dataframe(df_c,use_container_width=True,hide_index=True)
+
+             fc=make_subplots(rows=1,cols=2,subplot_titles=["Nodes checked (fewer = smarter)","Time (ms)"])
+             pal=[BLUE,RED,GREEN,PURPLE]
+             for col,ci in [("Nodes checked",1),("Time (ms)",2)]:
+                 fc.add_trace(go.Bar(x=df_c["Algorithm"],y=df_c[col],marker_color=pal,
+                                     text=df_c[col],textposition="outside",textfont_color=FG,
+                                     showlegend=False),row=1,col=ci)
+             fc.update_layout(paper_bgcolor=SURF,plot_bgcolor=BG,font_color=FG,height=280,
+                              margin=dict(l=40,r=20,t=50,b=30))
+             fc.update_xaxes(gridcolor=LINE); fc.update_yaxes(gridcolor=LINE)
+             st.plotly_chart(fc,use_container_width=True)
+
+ # ══════════════════════════════════════════════════════════════════════════════
+ # TASK 4
+ # ══════════════════════════════════════════════════════════════════════════════
+ with T4:
+     st.markdown("### Head-to-head: A* vs IDA* on urban and rural delivery routes")
+     st.caption("We run both algorithms on 10 routes (5 urban, 5 rural) and measure speed and efficiency. Results appear as they complete.")
+
+     c1,c2=st.columns([1,3])
+     with c1:
+         nruns=st.slider("Timing runs per route",5,30,20,5)
+         go_btn=st.button("▶ Run the test",type="primary",use_container_width=True)
+         st.markdown("""
+ <div class='tip'>
+ <b>A*</b> keeps every frontier node in an open list. It usually checks fewer nodes, but its memory grows with that list.<br><br>
+ <b>IDA*</b> stores only the current path. It repeats a depth-first search, raising the cost limit after each pass, so it is slower here but its memory stays proportional to the number of nodes on the current path.
+ </div>""",unsafe_allow_html=True)
+
+     with c2:
+         OD_U=[("U1","U10"),("U7","U6"),("U2","U9"),("U1","U9"),("U3","U8")]
+         OD_R=[("R1","R9"),("R2","R8"),("R3","R10"),("R1","R6"),("R4","R9")]
+
+         if go_btn:
+             rows=[]; chart_ph=st.empty(); prog=st.progress(0); status_ph=st.empty()
+             total=(len(OD_U)+len(OD_R))*2; done=0
+
+             for zone,pairs in [("Urban",OD_U),("Rural",OD_R)]:
+                 for s,g in pairs:
+                     for nm,fn in [("A*",astar),("IDA*",ida_star)]:
+                         times=[]
+                         p=c3=None; e=[]
+                         for _ in range(nruns):
+                             t0=time.perf_counter(); p,c3,e=fn(s,g,ADJ_KM)
+                             times.append((time.perf_counter()-t0)*1000)
+                         rows.append({"Zone":zone,"Route":f"{s}→{g}","Algorithm":nm,
+                                      "Distance (km)":c3,"Nodes checked":len(e),
+                                      "Avg time (ms)":round(sum(times)/len(times),3)})
+                         done+=1; prog.progress(done/total)
+                         status_ph.markdown(
+                             f"<span class='badge-blue'>Testing {s}→{g} with {nm}...</span>",
+                             unsafe_allow_html=True)
+
+                         # live chart update
+                         if len(rows)>=2:
+                             df_live=pd.DataFrame(rows)
+                             sm=df_live.groupby(["Zone","Algorithm"])[["Nodes checked","Avg time (ms)"]].mean().reset_index()
+                             fl=make_subplots(rows=1,cols=2,
+                                              subplot_titles=["Avg nodes checked","Avg time (ms)"])
+                             for anm,acl in [("A*",BLUE),("IDA*",PURPLE)]:
+                                 sub=sm[sm.Algorithm==anm]
+                                 if sub.empty: continue
+                                 for key,ci in [("Nodes checked",1),("Avg time (ms)",2)]:
+                                     fl.add_trace(go.Bar(name=anm,x=sub["Zone"],y=sub[key].round(2),
+                                                         marker_color=acl,showlegend=(ci==1),
+                                                         text=sub[key].round(2),textposition="outside",
+                                                         textfont_color=FG),row=1,col=ci)
+                             fl.update_layout(paper_bgcolor=SURF,plot_bgcolor=BG,font_color=FG,
+                                              barmode="group",height=320,
+                                              margin=dict(l=40,r=20,t=50,b=30),
+                                              legend=dict(bgcolor=SURF,bordercolor=LINE))
+                             fl.update_xaxes(gridcolor=LINE); fl.update_yaxes(gridcolor=LINE)
+                             chart_ph.plotly_chart(fl,use_container_width=True)
+
+             prog.empty(); status_ph.empty()
+             df_b=pd.DataFrame(rows)
+             st.dataframe(df_b,use_container_width=True,hide_index=True)
+
+             ae=df_b[df_b.Algorithm=="A*"]["Nodes checked"].mean()
+             ie=df_b[df_b.Algorithm=="IDA*"]["Nodes checked"].mean()
+             at=df_b[df_b.Algorithm=="A*"]["Avg time (ms)"].mean()
+             it=df_b[df_b.Algorithm=="IDA*"]["Avg time (ms)"].mean()
+             winner="A*" if at<it else "IDA*"
+             st.success(
+                 f"**Result:** A* checked {ae:.0f} nodes on average vs IDA*'s {ie:.0f}. "
+                 f"**{winner}** was faster on this map (A* {at:.3f} ms vs IDA* {it:.3f} ms). "
+                 f"On much larger road networks, A*'s open list can outgrow the available memory, "
+                 f"while IDA* stores only the current path, which makes it the safer choice when memory is the limit.")
+         else:
+             st.info("Click **▶ Run the test** — the chart will build live as results come in.")
+
+ # ══════════════════════════════════════════════════════════════════════════════
+ # TASK 5
+ # ══════════════════════════════════════════════════════════════════════════════
+ with T5:
+     st.markdown("### Predicting EcoCart's daily sales with machine learning")
+     st.caption("Two models trained on 2 years of data. Adjust settings and the chart updates instantly.")
+
+     ctrl5,main5=st.columns([1,3])
+     with ctrl5:
+         tp=st.slider("Training data",60,90,80,5,format="%d%%")
+         ne=st.slider("Random Forest trees",50,300,200,50)
+         show5=st.radio("Show",["Both","Linear Regression","Random Forest"])
+         st.divider()
+         st.markdown("<div class='section-label'>Try your own prediction</div>",unsafe_allow_html=True)
+         st.markdown("<div class='tip'>Set values for any day and see what the model predicts.</div>",
+                     unsafe_allow_html=True)
+         wi_dow=st.selectbox("Day of week",["Mon","Tue","Wed","Thu","Fri","Sat","Sun"],index=4)
+         wi_month=st.selectbox("Month",["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],index=0)
+         wi_promo=st.toggle("Promotion running today?",False)
+         wi_lag1=st.number_input("Yesterday's sales",min_value=50,max_value=300,value=120,step=5)
+         wi_lag7=st.number_input("Sales 7 days ago", min_value=50,max_value=300,value=115,step=5)
+
+     with main5:
+         with st.spinner("Training models…"):
+             lr_o,rf_o,te_df,lp,rp,imps=_train(tp,ne)
+
+         y=te_df["sales"].values; dates=te_df["date"].values
+
+         lmae,lrmse,lr2,lmape=_met(y,lp)
+         rmae,rrmse,rr2,rmape=_met(y,rp)
+
+         mc=st.columns(4)
+         mc[0].metric("Linear Reg accuracy (R²)",lr2)
+         mc[1].metric("Linear Reg avg error",f"±{lmae} units")
+         mc[2].metric("Random Forest accuracy (R²)",rr2,delta=f"{rr2-lr2:+.3f}")
+         # Lower error is better, so "inverse" shows a negative delta in green
+         mc[3].metric("Random Forest avg error",f"±{rmae} units",delta=f"{rmae-lmae:+.1f}",delta_color="inverse")
+
+         # ── what-if prediction ────────────────────────────────────────────────
+         dow_map={"Mon":0,"Tue":1,"Wed":2,"Thu":3,"Fri":4,"Sat":5,"Sun":6}
+         mon_map={"Jan":1,"Feb":2,"Mar":3,"Apr":4,"May":5,"Jun":6,
+                  "Jul":7,"Aug":8,"Sep":9,"Oct":10,"Nov":11,"Dec":12}
+         # Approximate the day of year as the middle of the chosen month
+         wi_doy=int((mon_map[wi_month]-1)*30.4+15)
+         # The form only collects two past values, so both rolling means are
+         # approximated by the average of yesterday's and last week's sales
+         wi_r7=round((wi_lag1+wi_lag7)/2)
+         wi_r30=round((wi_lag1+wi_lag7)/2)
+         # Order must match the training features; wi_lag7 is passed twice,
+         # standing in for a lag input the form does not ask for
+         wi_row=[[dow_map[wi_dow],mon_map[wi_month],wi_doy,int(wi_promo),
+                  wi_lag1,wi_lag7,wi_lag7,wi_r7,wi_r30]]
+         wi_pred_rf=round(rf_o.predict(wi_row)[0],0)
+         wi_pred_lr=round(lr_o.predict(wi_row)[0],0)
+
+         wc=st.columns(3)
+         wc[0].markdown(
+             f"<div style='background:#fff;border-radius:10px;padding:14px 18px;"
+             f"box-shadow:0 1px 4px rgba(0,0,0,.07);text-align:center'>"
+             f"<div style='font-size:.78rem;color:{MUTE}'>Your scenario prediction</div>"
+             f"<div style='font-size:1.6rem;font-weight:700;color:{GREEN}'>{int(wi_pred_rf)}</div>"
+             f"<div style='font-size:.78rem;color:{MUTE}'>units (Random Forest)</div></div>",
+             unsafe_allow_html=True)
+         wc[1].markdown(
+             f"<div style='background:#fff;border-radius:10px;padding:14px 18px;"
+             f"box-shadow:0 1px 4px rgba(0,0,0,.07);text-align:center'>"
+             f"<div style='font-size:.78rem;color:{MUTE}'>Linear Regression says</div>"
+             f"<div style='font-size:1.6rem;font-weight:700;color:{BLUE}'>{int(wi_pred_lr)}</div>"
+             f"<div style='font-size:.78rem;color:{MUTE}'>units</div></div>",
+             unsafe_allow_html=True)
+         wc[2].markdown(
+             f"<div style='background:#fff;border-radius:10px;padding:14px 18px;"
+             f"box-shadow:0 1px 4px rgba(0,0,0,.07);text-align:center'>"
+             f"<div style='font-size:.78rem;color:{MUTE}'>Promotion boost</div>"
+             f"<div style='font-size:1.6rem;font-weight:700;color:{AMBER}'>{'Yes +~40' if wi_promo else 'None'}</div>"
+             f"<div style='font-size:.78rem;color:{MUTE}'>estimated extra units</div></div>",
+             unsafe_allow_html=True)
+
+         st.markdown(" ")
+
+         # ── forecast chart with range selector ───────────────────────────────
+         fig5=go.Figure()
+         fig5.add_trace(go.Scatter(x=dates,y=y,name="Actual sales",
+                                   line=dict(color=FG,width=1.5),opacity=.85,
+                                   hovertemplate="<b>Actual</b><br>%{x|%d %b %Y}<br>%{y:.0f} units<extra></extra>"))
+         if show5 in ("Both","Linear Regression"):
+             fig5.add_trace(go.Scatter(x=dates,y=lp,name="Linear Regression",
+                                       line=dict(color=BLUE,width=1.5,dash="dot"),
+                                       hovertemplate="<b>LR Prediction</b><br>%{x|%d %b %Y}<br>%{y:.0f} units<extra></extra>"))
+         if show5 in ("Both","Random Forest"):
+             fig5.add_trace(go.Scatter(x=dates,y=rp,name="Random Forest",
+                                       line=dict(color=GREEN,width=1.5),
+                                       hovertemplate="<b>RF Prediction</b><br>%{x|%d %b %Y}<br>%{y:.0f} units<extra></extra>"))
+         fig5.update_layout(**_ch(360,f"Actual vs predicted — test set ({100-tp}% of data)"))
+         fig5.update_xaxes(**_xax(title="Date",
+                                  rangeselector=dict(
+                                      bgcolor=SURF,
+                                      buttons=[dict(count=30,label="30d",step="day",stepmode="backward"),
+                                               dict(count=60,label="60d",step="day",stepmode="backward"),
+                                               dict(count=90,label="90d",step="day",stepmode="backward"),
+                                               dict(step="all",label="All")])))
+         fig5.update_yaxes(**_yax(title="Units sold"))
+         st.plotly_chart(fig5,use_container_width=True)
+
+         r_col,i_col=st.columns(2)
+         with r_col:
+             fig_r=go.Figure()
+             if show5 in ("Both","Linear Regression"):
+                 fig_r.add_trace(go.Scatter(x=lp,y=y-lp,mode="markers",name="Linear Reg",
+                                            marker=dict(color=BLUE,size=5,opacity=.5),
+                                            hovertemplate="Predicted %{x:.0f}<br>Error %{y:.0f} units<extra></extra>"))
+             if show5 in ("Both","Random Forest"):
+                 fig_r.add_trace(go.Scatter(x=rp,y=y-rp,mode="markers",name="Random Forest",
+                                            marker=dict(color=GREEN,size=5,opacity=.5),
+                                            hovertemplate="Predicted %{x:.0f}<br>Error %{y:.0f} units<extra></extra>"))
+             fig_r.add_hline(y=0,line_color="#94a3b8",line_width=1.5,line_dash="dash")
+             fig_r.update_layout(**_ch(280,"Prediction errors (closer to 0 = better)"))
+             fig_r.update_xaxes(**_xax(title="Predicted units"))
+             fig_r.update_yaxes(**_yax(title="Error (actual − predicted)"))
+             st.plotly_chart(fig_r,use_container_width=True)
+
+         with i_col:
+             imp=pd.Series(imps,index=FEATS).sort_values()
+             fi=go.Figure(go.Bar(
+                 x=imp.values,
+                 y=[FEAT_LABELS.get(i,i) for i in imp.index],
+                 orientation="h",
+                 marker=dict(color=imp.values,colorscale=[[0,"#d1fae5"],[1,GREEN]],showscale=False),
+                 text=[f"{v:.3f}" for v in imp.values],
+                 textposition="outside",textfont_color=FG,
+                 hovertemplate="%{y}<br>Importance: %{x:.3f}<extra></extra>"))
+             fi.update_layout(**_ch(280,"What does the model rely on most?"))
+             fi.update_xaxes(**_xax(title="Importance score"))
+             fi.update_yaxes(**_yax())
+             st.plotly_chart(fi,use_container_width=True)
+
+         winner="Random Forest" if rr2>=lr2 else "Linear Regression"
+         # Read the top feature from the importances instead of hard-coding it
+         top_feat=imp.idxmax()
+         st.success(
+             f"**{winner}** is more accurate on the test set (R² = {max(lr2,rr2):.3f}). "
+             f"The model's top predictor is **{FEAT_LABELS.get(top_feat,top_feat)}** "
+             f"(importance {imp.max():.3f}).")
+
+         with st.expander("See raw prediction data"):
+             st.dataframe(pd.DataFrame({"Date":dates,"Actual":y.round(1),
+                                        "LR Prediction":lp.round(1),"RF Prediction":rp.round(1),
+                                        "LR Error":(y-lp).round(1),"RF Error":(y-rp).round(1)}),
+ use_container_width=True)
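The Task 3 comparison table flags which algorithms are guaranteed to return the shortest route. A minimal sketch of why breadth-first search is not among them on a weighted network: on a hypothetical four-node graph (`GRAPH`, with node names and weights made up for illustration, not EcoCart's network), BFS returns the route with the fewest hops, while uniform-cost search (A* with a zero heuristic) returns the cheapest one.

```python
import heapq
from collections import deque

# Hypothetical toy graph: the direct edge A-D is long, the detour A-B-C-D is short.
GRAPH = {
    "A": [("B", 1.0), ("D", 10.0)],
    "B": [("A", 1.0), ("C", 1.0)],
    "C": [("B", 1.0), ("D", 1.0)],
    "D": [("A", 10.0), ("C", 1.0)],
}


def path_cost(path, graph):
    """Sum the edge weights along a path."""
    return sum(dict(graph[a])[b] for a, b in zip(path, path[1:]))


def bfs_path(start, goal, graph):
    """Breadth-first search: returns the path with the fewest edges."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nb, _ in graph[path[-1]]:
            if nb not in seen:
                seen.add(nb)
                queue.append(path + [nb])
    return None


def ucs_path(start, goal, graph):
    """Uniform-cost search (A* with h = 0): returns the cheapest path."""
    heap = [(0.0, [start])]
    best = {start: 0.0}
    while heap:
        g, path = heapq.heappop(heap)
        node = path[-1]
        if node == goal:
            return path
        if g > best.get(node, float("inf")):
            continue  # stale heap entry
        for nb, w in graph[node]:
            ng = g + w
            if ng < best.get(nb, float("inf")):
                best[nb] = ng
                heapq.heappush(heap, (ng, path + [nb]))
    return None


bfs_route = bfs_path("A", "D", GRAPH)
ucs_route = ucs_path("A", "D", GRAPH)
print(bfs_route, path_cost(bfs_route, GRAPH))  # ['A', 'D'] 10.0
print(ucs_route, path_cost(ucs_route, GRAPH))  # ['A', 'B', 'C', 'D'] 3.0
```

A* and IDA* keep this guarantee only while their heuristic never overestimates the remaining cost, which matters when the edge weights switch from km to kg of CO₂.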
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ streamlit>=1.57.0
+ numpy>=2.0.0
+ pandas>=2.0.0
+ plotly>=5.0.0
+ scikit-learn>=1.3.0
task2_segmentation.py ADDED
@@ -0,0 +1,240 @@
+ """
+ EcoCart Customer Segmentation — Bias Detection & Mitigation
+ Task 2 — Demonstrates urban-rural bias in K-Means segmentation and
+ mitigates it by oversampling rural customers, adjusting spend and
+ frequency for rural conditions, and post-processing segment labels.
+
+ NCI MSCAI | Fundamentals of AI TABA 2026
+
+ Run: python3 task2_segmentation.py
+ Out: output/bias_before_after.png, output/disparate_impact.png
+ """
+
+ import os
+
+ import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ from sklearn.cluster import KMeans
+ from sklearn.preprocessing import StandardScaler
+
+ RNG = np.random.default_rng(42)
+
+
+ # ── 1. Generate biased customer data ────────────────────────
+ # Urban customers have more data, higher frequency and higher spend,
+ # simulating a platform that launched in cities first.
+
+ def generate_biased_data(n_urban=300, n_rural=100):
+     # Urban: higher frequency and spend on average
+     urban = pd.DataFrame({
+         "freq": RNG.normal(6.0, 2.0, n_urban).clip(0.5),
+         "spend": RNG.normal(120, 40, n_urban).clip(10),
+         "recency": RNG.exponential(10, n_urban).clip(1, 90),
+         "region": "urban",
+     })
+     # Rural: lower frequency and spend (platform is newer there)
+     rural = pd.DataFrame({
+         "freq": RNG.normal(3.0, 1.5, n_rural).clip(0.5),
+         "spend": RNG.normal(65, 30, n_rural).clip(10),
+         "recency": RNG.exponential(15, n_rural).clip(1, 90),
+         "region": "rural",
+     })
+     df = pd.concat([urban, rural], ignore_index=True)
+     df["freq"] = df["freq"].round(1)
+     df["spend"] = df["spend"].round(0)
+     df["recency"] = df["recency"].round(0)
+     return df
+
+
+ # ── 2. Segment with K-Means ────────────────────────────────
+ def segment(df, features=("freq", "spend", "recency")):
+     scaler = StandardScaler()
+     X = scaler.fit_transform(df[list(features)])
+     km = KMeans(n_clusters=3, random_state=42, n_init=10)
+     df = df.copy()
+     df["cluster"] = km.fit_predict(X)
+
+     # Label clusters by mean spend (High/Medium/Low)
+     means = df.groupby("cluster")["spend"].mean().sort_values(ascending=False)
+     label_map = {means.index[0]: "High Value",
+                  means.index[1]: "Medium",
+                  means.index[2]: "Low Value"}
+     df["segment"] = df["cluster"].map(label_map)
+     return df
+
+
+ # ── 3. Bias metrics ────────────────────────────────────────
+ def compute_fairness(df):
+     # Disparate impact = P(High Value | rural) / P(High Value | urban);
+     # a value below 0.8 fails the four-fifths rule
+     urban = df[df.region == "urban"]
+     rural = df[df.region == "rural"]
+     u_high = (urban.segment == "High Value").mean()
+     r_high = (rural.segment == "High Value").mean()
+     di = r_high / u_high if u_high > 0 else 0
+     return {
+         "urban_high_pct": round(u_high * 100, 1),
+         "rural_high_pct": round(r_high * 100, 1),
+         "disparate_impact": round(di, 3),
+         "fair": di >= 0.8,
+     }
+
+
+ # ── 4. Mitigation: re-sample + adjust features + post-process ──
+ def mitigate(df):
+     """
+     Fix 1: Balance the dataset by oversampling rural customers to the urban count.
+     Fix 2: Adjust features for rural conditions: add €12 to rural spend
+            (delivery cost premium) and multiply rural frequency by 1.5
+            (rural customers batch their orders).
+     Fix 3: Post-processing: promote the rural customers with the highest
+            adjusted spend to "High Value" until the rural High Value rate
+            reaches 85% of the urban rate (above the 0.8 DI threshold).
+     """
+     df = df.copy()
+
+     # Oversample rural to match urban count
+     rural = df[df.region == "rural"]
+     urban = df[df.region == "urban"]
+     rural_up = rural.sample(n=len(urban), replace=True, random_state=42)
+     balanced = pd.concat([urban, rural_up], ignore_index=True)
+
+     is_rural = balanced["region"] == "rural"
+     # Adjust spend: rural delivery costs ~€12 more on average
+     balanced["adj_spend"] = np.where(is_rural, balanced["spend"] + 12, balanced["spend"])
+     # Adjust frequency: rural customers batch orders
+     balanced["adj_freq"] = np.where(is_rural, balanced["freq"] * 1.5, balanced["freq"])
+
+     # Re-segment on adjusted features
+     scaler = StandardScaler()
+     X = scaler.fit_transform(balanced[["adj_freq", "adj_spend", "recency"]])
+     km = KMeans(n_clusters=3, random_state=42, n_init=10)
+     balanced["cluster"] = km.fit_predict(X)
+     means = balanced.groupby("cluster")["adj_spend"].mean().sort_values(ascending=False)
+     label_map = {means.index[0]: "High Value",
+                  means.index[1]: "Medium",
+                  means.index[2]: "Low Value"}
+     balanced["segment"] = balanced["cluster"].map(label_map)
+
+     # Post-processing: promote rural customers to "High Value" until
+     # disparate impact reaches 0.85 (above the 0.8 threshold)
+     rural_mask = balanced.region == "rural"
+     urban_mask = balanced.region == "urban"
+     urban_high_rate = (balanced[urban_mask].segment == "High Value").mean()
+     target_rate = urban_high_rate * 0.85
+     n_rural = rural_mask.sum()
+     target_rural_high = int(target_rate * n_rural)
+     current_rural_high = (balanced[rural_mask].segment == "High Value").sum()
+     need = target_rural_high - current_rural_high
+
+     if need > 0:
+         # Promote the non-High Value rural customers with the highest adjusted spend
+         candidates = balanced[rural_mask & (balanced.segment != "High Value")]
+         if len(candidates) > 0:
+             promote = candidates.nlargest(min(need, len(candidates)), "adj_spend").index
+             balanced.loc[promote, "segment"] = "High Value"
+
+     return balanced
+
+
+ # ── 5. Plots ────────────────────────────────────────────────
+ SEG_COLORS = {"High Value": "#10b981", "Medium": "#f59e0b", "Low Value": "#ef4444"}
+
+ def plot_before_after(before_df, after_df, before_fair, after_fair):
+     fig, axes = plt.subplots(1, 2, figsize=(14, 5.5))
+     fig.patch.set_facecolor("#0d1117")
+
+     for ax, df, fair, title in [
+         (axes[0], before_df, before_fair, "BEFORE mitigation (biased)"),
+         (axes[1], after_df, after_fair, "AFTER mitigation (re-sampled + adjusted)"),
+     ]:
+         ax.set_facecolor("#0d1117")
+         for seg in ["High Value", "Medium", "Low Value"]:
+             mask = df.segment == seg
+             for region, marker in [("urban", "o"), ("rural", "^")]:
+                 rmask = mask & (df.region == region)
+                 ax.scatter(df.loc[rmask, "freq"], df.loc[rmask, "spend"],
+                            c=SEG_COLORS[seg], marker=marker, s=25, alpha=0.6,
+                            label=f"{seg} ({region})" if ax is axes[0] else None)
+         di = fair["disparate_impact"]
+         # Title turns red when biased, green when fair
+         color = "#ef4444" if not fair["fair"] else "#10b981"
+         ax.set_title(f"{title}\nDI = {di:.3f} {'⚠ BIASED' if not fair['fair'] else '✓ FAIR'}",
+                      color=color, fontsize=11)
+         ax.set_xlabel("Purchase frequency / month", color="white")
+         ax.set_ylabel("Avg spend (€)", color="white")
+         ax.tick_params(colors="white")
+         ax.grid(True, alpha=0.1, color="white")
+
+     axes[0].legend(fontsize=7, facecolor="#0d1117", edgecolor="#334155",
+                    labelcolor="white", loc="upper right", ncol=2)
+     plt.tight_layout()
+     plt.savefig("output/bias_before_after.png", dpi=150,
+                 bbox_inches="tight", facecolor="#0d1117")
+     plt.close()
+
+
+ def plot_di(before_fair, after_fair):
+     fig, ax = plt.subplots(figsize=(8, 4))
+     fig.patch.set_facecolor("#0d1117")
+     ax.set_facecolor("#0d1117")
+
+     cats = ["Urban → High", "Rural → High", "Disparate Impact"]
+     before_vals = [before_fair["urban_high_pct"], before_fair["rural_high_pct"],
+                    before_fair["disparate_impact"] * 100]
+     after_vals = [after_fair["urban_high_pct"], after_fair["rural_high_pct"],
+                   after_fair["disparate_impact"] * 100]
+
+     x = range(len(cats))
+     w = 0.35
+     ax.bar([i - w/2 for i in x], before_vals, w, label="Before", color="#ef4444", alpha=0.85)
+     ax.bar([i + w/2 for i in x], after_vals, w, label="After", color="#10b981", alpha=0.85)
+     ax.axhline(80, color="#fbbf24", linewidth=1.5, linestyle="--", label="DI threshold (80%)")
+     ax.set_xticks(x)
+     ax.set_xticklabels(cats, color="white")
+     ax.set_ylabel("Percentage", color="white")
+     ax.set_title("Fairness metrics before vs after mitigation", color="white", fontsize=12)
+     ax.tick_params(colors="white")
+     ax.legend(fontsize=9, facecolor="#0d1117", edgecolor="#334155", labelcolor="white")
+     ax.grid(True, axis="y", alpha=0.15, color="white")
+     plt.tight_layout()
+     plt.savefig("output/disparate_impact.png", dpi=150,
+                 bbox_inches="tight", facecolor="#0d1117")
+     plt.close()
+
+
+ # ── 6. Main ─────────────────────────────────────────────────
+ def main():
+     print("="*70)
+     print("EcoCart Customer Segmentation — Bias Detection & Mitigation")
+     print("="*70)
+
+     # Generate and segment (biased)
+     df = generate_biased_data()
+     df = segment(df)
+     before = compute_fairness(df)
+     print("\nBEFORE mitigation:")
+     print(f"  Urban -> High Value: {before['urban_high_pct']}%")
+     print(f"  Rural -> High Value: {before['rural_high_pct']}%")
+     print(f"  Disparate Impact:    {before['disparate_impact']}")
+     print(f"  Fair (DI >= 0.8)?    {before['fair']}")
+
+     print("\n  Segment counts:")
+     ct = df.groupby(["region", "segment"]).size().unstack(fill_value=0)
+     print(ct.to_string(index=True))
+
+     # Mitigate
+     fixed = mitigate(df)
+     after = compute_fairness(fixed)
+     print("\nAFTER mitigation:")
+     print(f"  Urban -> High Value: {after['urban_high_pct']}%")
+     print(f"  Rural -> High Value: {after['rural_high_pct']}%")
+     print(f"  Disparate Impact:    {after['disparate_impact']}")
+     print(f"  Fair (DI >= 0.8)?    {after['fair']}")
+
+     # Plots: savefig fails if the output folder does not exist yet
+     os.makedirs("output", exist_ok=True)
+     plot_before_after(df, fixed, before, after)
+     plot_di(before, after)
+     print("\nWrote: output/bias_before_after.png, output/disparate_impact.png")
+
+ if __name__ == "__main__":
+     main()
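`compute_fairness` and the post-processing step in `mitigate` rest on a small piece of arithmetic. A minimal sketch with hypothetical counts (not output of the script; `disparate_impact` and `promotions_needed` are illustrative helpers, not functions in this repository): the disparate-impact ratio is the rural High Value rate divided by the urban one, and the number of rural promotions needed to reach a target ratio follows by rearranging that definition.

```python
def disparate_impact(rural_high, n_rural, urban_high, n_urban):
    """Rural High Value rate divided by the urban rate (0 if the urban rate is 0)."""
    u_rate = urban_high / n_urban
    r_rate = rural_high / n_rural
    return r_rate / u_rate if u_rate > 0 else 0.0


def promotions_needed(rural_high, n_rural, urban_high, n_urban, target_di=0.85):
    """Rural customers to move into High Value so the ratio reaches target_di.

    Mirrors the post-processing arithmetic in mitigate(): the target count
    is truncated with int(), so the result can land just below target_di.
    """
    target_count = int(target_di * (urban_high / n_urban) * n_rural)
    return max(0, target_count - rural_high)


# Hypothetical counts: 90 of 300 urban vs 10 of 300 rural customers are High Value.
before = disparate_impact(10, 300, 90, 300)
need = promotions_needed(10, 300, 90, 300)
after = disparate_impact(10 + need, 300, 90, 300)
print(round(before, 3), need, round(after, 3))  # 0.111 66 0.844
```

Because the target count is truncated, the ratio after promotion (0.844 here) sits slightly under the 0.85 target while still clearing the 0.8 four-fifths threshold.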
task3_4_routing.py ADDED
@@ -0,0 +1,333 @@
1
+ """
2
+ EcoCart Route Optimisation Prototype
3
+ Tasks 3 & 4 — BFS, DFS, A*, IDA* on a weighted delivery network
4
+ + Green Routing mode (CO2-weighted edges for sustainability)
5
+
6
+ NCI MSCAI | Fundamentals of AI TABA 2026
7
+
8
+ Run: python3 task3_4_routing.py
9
+ Out: network_map.png, algo_comparison.png, green_vs_fast.png
10
+ """
11
+
12
+ import heapq, math, time, tracemalloc, statistics
13
+ from collections import deque
14
+ import matplotlib.pyplot as plt
15
+ import matplotlib.patches as mpatches
16
+ import networkx as nx
17
+
18
+ # ── 1. Network ──────────────────────────────────────────────
19
+ NODES = {
20
+ # Urban cluster (dense, short edges)
21
+ "U1":(1.0,1.0,"urban"),"U2":(2.0,1.5,"urban"),"U3":(3.0,1.0,"urban"),
22
+ "U4":(1.5,2.5,"urban"),"U5":(2.5,3.0,"urban"),"U6":(3.5,2.0,"urban"),
23
+ "U7":(1.0,3.5,"urban"),"U8":(2.0,4.0,"urban"),"U9":(3.0,4.0,"urban"),
24
+ "U10":(4.0,3.5,"urban"),
25
+ # Rural cluster (sparse, long edges)
26
+ "R1":(6.0,1.0,"rural"),"R2":(8.0,2.0,"rural"),"R3":(10.0,1.5,"rural"),
27
+ "R4":(7.0,4.0,"rural"),"R5":(9.0,4.5,"rural"),"R6":(11.0,3.5,"rural"),
28
+ "R7":(6.5,6.0,"rural"),"R8":(9.0,7.0,"rural"),"R9":(11.0,6.0,"rural"),
29
+ "R10":(8.0,5.5,"rural"),
30
+ }
31
+
32
+ def _dist(a, b):
33
+ return math.hypot(NODES[a][0]-NODES[b][0], NODES[a][1]-NODES[b][1])
34
+
35
+ _PAIRS = [
36
+ ("U1","U2"),("U2","U3"),("U1","U4"),("U2","U4"),("U2","U5"),
37
+ ("U3","U6"),("U4","U5"),("U5","U6"),("U4","U7"),("U5","U8"),
38
+ ("U6","U10"),("U7","U8"),("U8","U9"),("U9","U10"),("U5","U9"),
39
+ ("R1","R2"),("R2","R3"),("R1","R4"),("R2","R4"),("R3","R6"),
40
+ ("R4","R5"),("R5","R6"),("R4","R7"),("R5","R10"),("R7","R10"),
41
+ ("R7","R8"),("R8","R9"),("R6","R9"),("R8","R10"),("R5","R8"),
42
+ ("U3","R1"),("U10","R4"),("U6","R1"),("U9","R7"),
43
+ ]
44
+
45
+ # Road distance ≈ 1.15× straight-line
46
+ EDGES = [(a, b, round(_dist(a,b)*1.15, 2)) for a, b in _PAIRS]
47
+
48
+ # CO2 cost per edge: urban roads have traffic → higher emissions per km
49
+ # Rural roads: 0.12 kg CO2/km; Urban roads: 0.21 kg CO2/km
50
+ def _co2(a, b, km):
51
+ za, zb = NODES[a][2], NODES[b][2]
52
+ rate = 0.28 if za == "urban" and zb == "urban" else 0.18 if za != zb else 0.10
53
+ return round(km * rate, 3)
54
+
55
+ CO2_EDGES = [(a, b, _co2(a, b, w)) for a, b, w in EDGES]
56
+
57
+ ADJ_KM = {n: [] for n in NODES}
58
+ ADJ_CO2 = {n: [] for n in NODES}
59
+ for i, (a, b, w) in enumerate(EDGES):
60
+ ADJ_KM[a].append((b, w))
61
+ ADJ_KM[b].append((a, w))
62
+ co2 = CO2_EDGES[i][2]
63
+ ADJ_CO2[a].append((b, co2))
64
+ ADJ_CO2[b].append((a, co2))
65
+
66
+ # ── 2. Algorithms ───────────────────────────────────────────
67
+ def heuristic(n, goal, scale=1.0):
68
+ return _dist(n, goal) * scale
69
+
70
+ def bfs(start, goal, adj=ADJ_KM):
71
+ expanded = 0
72
+ q = deque([(start, [start])])
73
+ seen = {start}
74
+ while q:
75
+ node, path = q.popleft()
76
+ expanded += 1
77
+ if node == goal:
78
+ cost = sum(_edge_w(path[i], path[i+1], adj) for i in range(len(path)-1))
79
+ return path, round(cost, 2), expanded
80
+ for nb, _ in adj[node]:
81
+ if nb not in seen:
82
+ seen.add(nb)
83
+ q.append((nb, path + [nb]))
84
+ return None, math.inf, expanded
85
+
86
+ def dfs(start, goal, adj=ADJ_KM, depth_limit=50):
87
+ expanded = 0
88
+ stack = [(start, [start])]
89
+ seen = {start}
90
+ while stack:
91
+ node, path = stack.pop()
92
+ expanded += 1
93
+ if node == goal:
94
+ cost = sum(_edge_w(path[i], path[i+1], adj) for i in range(len(path)-1))
95
+ return path, round(cost, 2), expanded
96
+ if len(path) > depth_limit:
97
+ continue
98
+ for nb, _ in adj[node]:
99
+ if nb not in seen:
100
+ seen.add(nb)
101
+ stack.append((nb, path + [nb]))
102
+ return None, math.inf, expanded
103
+
104
+ def astar(start, goal, adj=ADJ_KM, h_scale=1.0):
105
+ expanded, counter = 0, 0
106
+ heap = [(heuristic(start, goal, h_scale), 0.0, counter, start, [start])]
107
+ best = {start: 0.0}
108
+ while heap:
109
+ f, g, _, node, path = heapq.heappop(heap)
110
+ if node == goal:
111
+ return path, round(g, 2), expanded
112
+ if g > best.get(node, math.inf):
113
+ continue
114
+ expanded += 1
115
+ for nb, w in adj[node]:
116
+ ng = g + w
117
+ if ng < best.get(nb, math.inf):
118
+ best[nb] = ng
119
+ counter += 1
120
+ heapq.heappush(heap, (ng + heuristic(nb, goal, h_scale), ng, counter, nb, path + [nb]))
121
+ return None, math.inf, expanded
122
+
123
+ def ida_star(start, goal, adj=ADJ_KM, h_scale=1.0):
124
+ expanded = [0]
125
+ def _dfs(node, g, bound, path, visited):
126
+ f = g + heuristic(node, goal, h_scale)
127
+ if f > bound:
128
+ return None, f
129
+ expanded[0] += 1
130
+ if node == goal:
131
+ return list(path), g
132
+ nxt = math.inf
133
+ for nb, w in adj[node]:
134
+ if nb in visited:
135
+ continue
136
+ visited.add(nb)
137
+ path.append(nb)
138
+ r, t = _dfs(nb, g + w, bound, path, visited)
139
+ if r is not None:
140
+ return r, t
141
+ if t < nxt:
142
+ nxt = t
143
+ path.pop()
144
+ visited.remove(nb)
145
+ return None, nxt
146
+
147
+ bound = heuristic(start, goal, h_scale)
148
+ while True:
149
+ r, t = _dfs(start, 0.0, bound, [start], {start})
150
+ if r is not None:
151
+ return r, round(t, 2), expanded[0]
152
+ if t == math.inf:
153
+ return None, math.inf, expanded[0]
154
+ bound = t
155
+
156
+ def _edge_w(a, b, adj):
157
+ for nb, w in adj[a]:
158
+ if nb == b:
159
+ return w
160
+ return math.inf
161
+
162
+ # ── 3. Benchmark ────────────────────────────────────────────
+ def benchmark(algo, start, goal, adj=ADJ_KM, repeats=20):
+     # Runtime and memory are measured in separate passes: tracemalloc hooks every
+     # allocation, so timing while it is active would inflate the runtime figures.
+     times, mems = [], []
+     path = cost = expanded = None
+     for _ in range(repeats):
+         t0 = time.perf_counter()
+         path, cost, expanded = algo(start, goal, adj)
+         times.append((time.perf_counter() - t0) * 1000)  # ms
+     for _ in range(repeats):
+         tracemalloc.start()
+         algo(start, goal, adj)
+         _, peak = tracemalloc.get_traced_memory()
+         tracemalloc.stop()
+         mems.append(peak / 1024)  # KB
+     return {
+         "ms": round(statistics.mean(times), 3),
+         "kb": round(statistics.mean(mems), 2),
+         "expanded": expanded,
+         "cost": cost,
+         "path": path,
+     }
+
+ # Origin-destination pairs benchmarked in each setting
+ OD_URBAN = [("U1","U10"),("U7","U6"),("U2","U9"),("U1","U9"),("U3","U8")]
+ OD_RURAL = [("R1","R9"),("R2","R8"),("R3","R10"),("R1","R6"),("R4","R9")]
+
+ # ── 4. Plots ────────────────────────────────────────────────
+ def plot_network():
+     G = nx.Graph()
+     for n, (x, y, _) in NODES.items():
+         G.add_node(n, pos=(x, y))
+     for a, b, w in EDGES:
+         G.add_edge(a, b, weight=w)
+     pos = {n: (NODES[n][0], NODES[n][1]) for n in NODES}
+     colors = ["#ef4444" if NODES[n][2] == "urban" else "#10b981" for n in NODES]
+
+     fig, ax = plt.subplots(figsize=(13, 6))
+     ax.set_facecolor("#0d1117")
+     fig.patch.set_facecolor("#0d1117")
+     nx.draw(G, pos, ax=ax, with_labels=True, node_color=colors, node_size=500,
+             font_size=8, font_weight="bold", font_color="white",
+             edge_color="#334155", width=1.2)
+     labels = {(a, b): f"{w}" for a, b, w in EDGES}
+     nx.draw_networkx_edge_labels(G, pos, ax=ax, edge_labels=labels,
+                                  font_size=6, font_color="#94a3b8")
+     urban_patch = mpatches.Patch(color="#ef4444", label="Urban node")
+     rural_patch = mpatches.Patch(color="#10b981", label="Rural node")
+     ax.legend(handles=[urban_patch, rural_patch], loc="upper left",
+               fontsize=9, facecolor="#0d1117", edgecolor="#334155", labelcolor="white")
+     ax.set_title("EcoCart 20-node delivery network (edge labels = km)",
+                  color="white", fontsize=12, pad=12)
+     plt.tight_layout()
+     plt.savefig("output/network_map.png", dpi=150, bbox_inches="tight",
+                 facecolor="#0d1117")
+     plt.close()
+
+
+ def plot_comparison(results):
+     metrics = [("Runtime (ms)", "ms"), ("Nodes expanded", "expanded"), ("Peak memory (KB)", "kb")]
+     fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
+     fig.patch.set_facecolor("#0d1117")
+     for ax, (title, key) in zip(axes, metrics):
+         ax.set_facecolor("#0d1117")
+         # Average each metric over the O-D pairs of each setting
+         u_a = statistics.mean(r["astar"][key] for r in results["urban"])
+         u_i = statistics.mean(r["ida"][key] for r in results["urban"])
+         r_a = statistics.mean(r["astar"][key] for r in results["rural"])
+         r_i = statistics.mean(r["ida"][key] for r in results["rural"])
+         x = [0, 1]
+         w = 0.32  # bar width
+         ax.bar([xi - w/2 for xi in x], [u_a, r_a], w, label="A*", color="#3b82f6")
+         ax.bar([xi + w/2 for xi in x], [u_i, r_i], w, label="IDA*", color="#8b5cf6")
+         ax.set_xticks(x)
+         ax.set_xticklabels(["Urban", "Rural"], color="white")
+         ax.set_title(title, color="white", fontsize=11)
+         ax.tick_params(colors="white")
+         ax.grid(True, axis="y", alpha=0.15, color="white")
+         ax.legend(fontsize=9, facecolor="#0d1117", edgecolor="#334155", labelcolor="white")
+     plt.suptitle("A* vs IDA* (mean over 5 O-D pairs × 20 runs)",
+                  color="white", fontsize=12)
+     plt.tight_layout()
+     plt.savefig("output/algo_comparison.png", dpi=150,
+                 bbox_inches="tight", facecolor="#0d1117")
+     plt.close()
+
+
+ def plot_green_vs_fast():
+     """Compare fastest route (A* on km) vs greenest route (A* on CO2)."""
+     pairs = [("U1", "R9"), ("U7", "R6"), ("R1", "U10")]
+     fig, axes = plt.subplots(1, 3, figsize=(15, 5))
+     fig.patch.set_facecolor("#0d1117")
+
+     G = nx.Graph()
+     for n, (x, y, _) in NODES.items():
+         G.add_node(n, pos=(x, y))
+     for a, b, _ in EDGES:
+         G.add_edge(a, b)
+     pos = {n: (NODES[n][0], NODES[n][1]) for n in NODES}
+     colors = ["#ef4444" if NODES[n][2] == "urban" else "#10b981" for n in NODES]
+
+     for ax, (s, g) in zip(axes, pairs):
+         ax.set_facecolor("#0d1117")
+         fast_path, fast_km, _ = astar(s, g, ADJ_KM)
+         green_path, green_co2, _ = astar(s, g, ADJ_CO2, h_scale=0.10)
+
+         # Cross-metrics: CO₂ of the fastest route, distance of the greenest route
+         fast_co2 = sum(_edge_w(fast_path[i], fast_path[i+1], ADJ_CO2) for i in range(len(fast_path)-1))
+         green_km = sum(_edge_w(green_path[i], green_path[i+1], ADJ_KM) for i in range(len(green_path)-1))
+
+         nx.draw(G, pos, ax=ax, with_labels=True, node_color=colors,
+                 node_size=300, font_size=7, font_weight="bold",
+                 font_color="white", edge_color="#1e293b", width=0.8)
+
+         fast_edges = [(fast_path[i], fast_path[i+1]) for i in range(len(fast_path)-1)]
+         green_edges = [(green_path[i], green_path[i+1]) for i in range(len(green_path)-1)]
+         nx.draw_networkx_edges(G, pos, ax=ax, edgelist=fast_edges,
+                                edge_color="#f59e0b", width=3, alpha=0.8)
+         nx.draw_networkx_edges(G, pos, ax=ax, edgelist=green_edges,
+                                edge_color="#22c55e", width=3, style="dashed", alpha=0.8)
+         ax.set_title(f"{s} → {g}\nFast: {fast_km:.1f}km / {fast_co2:.2f}kg CO₂\n"
+                      f"Green: {green_km:.1f}km / {green_co2:.2f}kg CO₂",
+                      color="white", fontsize=9, linespacing=1.4)
+     fast_patch = mpatches.Patch(color="#f59e0b", label="Fastest (min km)")
+     green_patch = mpatches.Patch(color="#22c55e", label="Greenest (min CO₂)")
+     fig.legend(handles=[fast_patch, green_patch], loc="lower center",
+                ncol=2, fontsize=10, facecolor="#0d1117", edgecolor="#334155",
+                labelcolor="white")
+     plt.suptitle("Fast Route vs Green Route — same A*, different cost function",
+                  color="white", fontsize=12)
+     plt.tight_layout(rect=[0, 0.06, 1, 0.95])
+     plt.savefig("output/green_vs_fast.png", dpi=150,
+                 bbox_inches="tight", facecolor="#0d1117")
+     plt.close()
+
+
+ # ── 5. Main ─────────────────────────────────────────────────
+ def main():
+     print("="*70)
+     print("EcoCart Route Optimisation — A* vs IDA* benchmark")
+     print("="*70)
+
+     # Smoke test all four
+     for name, fn in [("BFS", bfs), ("DFS", dfs), ("A*", astar), ("IDA*", ida_star)]:
+         path, cost, exp = fn("U1", "U10")
+         print(f" {name:5s} U1->U10 cost={cost:.2f} km expanded={exp}")
+
+     # Full benchmark A* vs IDA*
+     results = {"urban": [], "rural": []}
+     for label, pairs in [("urban", OD_URBAN), ("rural", OD_RURAL)]:
+         print(f"\n--- {label.upper()} benchmark ---")
+         for s, g in pairs:
+             a = benchmark(astar, s, g)
+             i = benchmark(ida_star, s, g)
+             results[label].append({"pair": (s, g), "astar": a, "ida": i})
+             print(f" {s}->{g}: A* {a['cost']:.2f}km/{a['expanded']}exp/{a['ms']:.3f}ms "
+                   f"IDA* {i['cost']:.2f}km/{i['expanded']}exp/{i['ms']:.3f}ms")
+             # With an admissible heuristic both searches are optimal, so their costs must agree
+             assert abs(a["cost"] - i["cost"]) < 1e-4, "Optimality mismatch"
+
+     # Green routing demo
+     print("\n--- GREEN ROUTING ---")
+     for s, g in [("U1","R9"), ("U7","R6")]:
+         fp, fk, _ = astar(s, g, ADJ_KM)
+         gp, gc, _ = astar(s, g, ADJ_CO2, h_scale=0.10)
+         fco2 = sum(_edge_w(fp[i], fp[i+1], ADJ_CO2) for i in range(len(fp)-1))
+         gkm = sum(_edge_w(gp[i], gp[i+1], ADJ_KM) for i in range(len(gp)-1))
+         print(f" {s}->{g} Fast: {fk:.1f}km/{fco2:.2f}kgCO2 Green: {gkm:.1f}km/{gc:.2f}kgCO2")
+
+     # Generate plots
+     plot_network()
+     plot_comparison(results)
+     plot_green_vs_fast()
+     print("\nWrote to output/: network_map.png, algo_comparison.png, green_vs_fast.png")
+
+ if __name__ == "__main__":
+     main()
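The optimality assertion in `main()` relies on both searches using an admissible heuristic, one that never overestimates the remaining cost. A minimal, self-contained sketch of that property follows; the five-node graph, its coordinates and the `astar_cost`/`ida_cost` helpers are hypothetical stand-ins, not the project's `NODES`, `EDGES` or `heuristic()`. Every edge weight is at least the straight-line distance between its endpoints, which is what makes the Euclidean heuristic admissible here.

```python
import heapq
import math

# Hypothetical toy network: node -> (x, y) in km (not the project's NODES)
COORDS = {"A": (0, 0), "B": (2, 0), "C": (2, 2), "D": (4, 1), "E": (5, 3)}
# Each weight is >= the straight-line distance, so the Euclidean heuristic is admissible
EDGES = [("A", "B", 2.0), ("A", "C", 3.0), ("B", "D", 2.5), ("C", "D", 2.4),
         ("C", "E", 3.2), ("D", "E", 2.3)]
ADJ = {n: [] for n in COORDS}
for a, b, w in EDGES:
    ADJ[a].append((b, w))
    ADJ[b].append((a, w))


def h(node, goal):
    """Straight-line distance from node to goal."""
    (x1, y1), (x2, y2) = COORDS[node], COORDS[goal]
    return math.hypot(x2 - x1, y2 - y1)


def astar_cost(start, goal):
    """A*: returns the cost of the cheapest start-goal path."""
    heap = [(h(start, goal), 0.0, start)]
    best = {start: 0.0}
    while heap:
        _, g, node = heapq.heappop(heap)
        if node == goal:
            return g
        if g > best[node]:
            continue  # stale entry
        for nb, w in ADJ[node]:
            ng = g + w
            if ng < best.get(nb, math.inf):
                best[nb] = ng
                heapq.heappush(heap, (ng + h(nb, goal), ng, nb))
    return math.inf


def ida_cost(start, goal):
    """IDA*: depth-first passes with an increasing f = g + h bound."""
    def dfs(node, g, bound, visited):
        f = g + h(node, goal)
        if f > bound:
            return None, f
        if node == goal:
            return g, f
        nxt = math.inf
        for nb, w in ADJ[node]:
            if nb in visited:
                continue
            visited.add(nb)
            found, t = dfs(nb, g + w, bound, visited)
            visited.remove(nb)
            if found is not None:
                return found, t
            nxt = min(nxt, t)
        return None, nxt

    bound = h(start, goal)
    while True:
        found, t = dfs(start, 0.0, bound, {start})
        if found is not None:
            return found
        if t == math.inf:
            return math.inf  # goal unreachable
        bound = t


print(f"A*: {astar_cost('A', 'E'):.2f} km, IDA*: {ida_cost('A', 'E'):.2f} km")
# both report 6.20 km (route A-C-E)
```

IDA* stores only the current path, at the price of re-expanding nodes on every new bound; that trade-off is what the benchmark's node-count and memory panels compare.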
task5_forecasting.py ADDED
@@ -0,0 +1,137 @@
+ """
+ EcoCart Demand Forecasting Prototype
+ Task 5 — Linear Regression vs Random Forest on synthetic daily sales.
+
+ NCI MSCAI | Fundamentals of AI TABA 2026
+
+ Run: python3 task5_forecasting.py
+ Out: output/forecast.png, output/residuals.png, output/feature_importance.png
+ """
+
+ import os
+
+ import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ from sklearn.linear_model import LinearRegression
+ from sklearn.ensemble import RandomForestRegressor
+ from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
+
+ RNG = np.random.default_rng(42)
+
+
+ # ── 1. Synthetic sales data ────────────────────────────────
+ def generate_sales(days=730):
+     # Daily sales = linear trend + weekly and yearly seasonality + noise + promo spikes
+     t = np.arange(days)
+     dates = pd.date_range("2023-01-01", periods=days, freq="D")
+     base = 100 + 0.05 * t
+     weekly = 25 * np.sin(2 * np.pi * t / 7)
+     yearly = 40 * np.sin(2 * np.pi * t / 365)
+     noise = RNG.normal(0, 8, days)
+     promo = np.zeros(days)
+     n_promo = int(days * 0.06)  # about 6% of days receive a promotion uplift
+     promo[RNG.choice(days, n_promo, replace=False)] = RNG.uniform(30, 70, n_promo)
+     sales = np.clip(base + weekly + yearly + noise + promo, 0, None)
+
+     return pd.DataFrame({
+         "date": dates, "sales": sales,
+         "dow": dates.dayofweek, "month": dates.month,
+         "day_of_year": dates.dayofyear,
+         "is_promo": (promo > 0).astype(int),
+     })
+
+
+ # ── 2. Features ────────────────────────────────────────────
+ def add_features(df):
+     # Lags and rolling means use only earlier days: shift(1) before rolling() keeps
+     # the target day's own sales out of its features
+     out = df.copy()
+     for lag in [1, 7, 14]:
+         out[f"lag_{lag}"] = out["sales"].shift(lag)
+     out["roll_7"] = out["sales"].shift(1).rolling(7).mean()
+     out["roll_30"] = out["sales"].shift(1).rolling(30).mean()
+     return out.dropna().reset_index(drop=True)  # drops the first 30 warm-up days
+
+
+ FEATURES = ["dow", "month", "day_of_year", "is_promo",
+             "lag_1", "lag_7", "lag_14", "roll_7", "roll_30"]
+
+
+ # ── 3. Train & evaluate ───────────────────────────────────
+ def evaluate(name, y_true, y_pred):
+     mae = mean_absolute_error(y_true, y_pred)
+     rmse = mean_squared_error(y_true, y_pred) ** 0.5
+     r2 = r2_score(y_true, y_pred)
+     # MAPE in %; zero actuals are replaced by 1 to avoid division by zero
+     mape = np.mean(np.abs((y_true - y_pred) / np.where(y_true == 0, 1, y_true))) * 100
+     print(f" {name:<22s} MAE={mae:6.2f} RMSE={rmse:6.2f} R²={r2:.3f} MAPE={mape:.2f}%")
+     return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}
+
+
+ def main():
+     print("="*70)
+     print("EcoCart Demand Forecasting — LR vs Random Forest")
+     print("="*70)
+     os.makedirs("output", exist_ok=True)  # output/ is git-ignored, so it may not exist yet
+
+     df = generate_sales()
+     df = add_features(df)
+     # Chronological 80/20 split (no shuffling): the test period follows the training period.
+     # Test rows still use actual lagged sales, so these are one-step-ahead forecasts.
+     split = int(len(df) * 0.8)
+     train, test = df.iloc[:split], df.iloc[split:]
+     X_tr, y_tr = train[FEATURES], train["sales"]
+     X_te, y_te = test[FEATURES], test["sales"]
+     print(f"Train: {len(train)} days Test: {len(test)} days")
+
+     lr = LinearRegression().fit(X_tr, y_tr)
+     rf = RandomForestRegressor(n_estimators=200, max_depth=12,
+                                min_samples_leaf=3, random_state=42,
+                                n_jobs=-1).fit(X_tr, y_tr)
+     lr_pred = lr.predict(X_te)
+     rf_pred = rf.predict(X_te)
+
+     print("\nTest-set metrics:")
+     lr_m = evaluate("Linear Regression", y_te.values, lr_pred)
+     rf_m = evaluate("Random Forest", y_te.values, rf_pred)
+
+     # ── Plots ──
+     plt.rcParams.update({"axes.facecolor":"#0d1117","figure.facecolor":"#0d1117",
+                          "text.color":"white","axes.labelcolor":"white",
+                          "xtick.color":"white","ytick.color":"white"})
+
+     # Forecast
+     fig, ax = plt.subplots(figsize=(13, 5))
+     ax.plot(test.date, y_te, color="#e2e8f0", lw=1.3, label="Actual")
+     ax.plot(test.date, lr_pred, color="#3b82f6", lw=1, alpha=0.8, label="Linear Regression")
+     ax.plot(test.date, rf_pred, color="#10b981", lw=1, alpha=0.8, label="Random Forest")
+     ax.set_title("Test-set: actual vs predicted daily demand", fontsize=12)
+     ax.set_xlabel("Date")
+     ax.set_ylabel("Units sold")
+     ax.legend(fontsize=9, facecolor="#0d1117", edgecolor="#334155", labelcolor="white")
+     ax.grid(True, alpha=0.1)
+     plt.tight_layout()
+     plt.savefig("output/forecast.png", dpi=150, bbox_inches="tight")
+     plt.close()
+
+     # Residuals (actual minus predicted)
+     fig, axes = plt.subplots(1, 2, figsize=(13, 4.5))
+     for ax, pred, name, color, m in [
+         (axes[0], lr_pred, "Linear Regression", "#3b82f6", lr_m),
+         (axes[1], rf_pred, "Random Forest", "#10b981", rf_m),
+     ]:
+         ax.scatter(pred, y_te.values - pred, s=12, c=color, alpha=0.6)
+         ax.axhline(0, color="white", lw=0.8)
+         ax.set_title(f"{name} residuals (RMSE={m['rmse']:.2f})", fontsize=11)
+         ax.set_xlabel("Predicted")
+         ax.set_ylabel("Residual")
+         ax.grid(True, alpha=0.1)
+     plt.tight_layout()
+     plt.savefig("output/residuals.png", dpi=150, bbox_inches="tight")
+     plt.close()
+
+     # Feature importance
+     imp = pd.Series(rf.feature_importances_, index=FEATURES).sort_values()
+     fig, ax = plt.subplots(figsize=(8, 4.5))
+     ax.barh(imp.index, imp.values, color="#10b981")
+     ax.set_title("Random Forest — feature importance", fontsize=12)
+     ax.set_xlabel("Importance")
+     ax.grid(True, axis="x", alpha=0.1)
+     plt.tight_layout()
+     plt.savefig("output/feature_importance.png", dpi=150, bbox_inches="tight")
+     plt.close()
+
+     print(f"\nTop features: {', '.join(imp.index[-3:][::-1])}")
+     print("Wrote to output/: forecast.png, residuals.png, feature_importance.png")
+
+ if __name__ == "__main__":
+     main()
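`add_features()` calls `shift(1)` before `rolling()`, so a day's rolling mean summarises only earlier days and never the sales value the model is asked to predict. A small sketch checking that property on a hypothetical ten-day series follows; the 3-day window `roll_3` is an illustrative stand-in for `roll_7`/`roll_30`, chosen only so the numbers are easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical ten-day series with easy-to-check values 0, 10, 20, ..., 90
sales = pd.Series(np.arange(10, dtype=float) * 10)

# Same construction as add_features(): shift by one day first, then roll
lag_1 = sales.shift(1)
roll_3 = sales.shift(1).rolling(3).mean()

# Day t's rolling feature should equal the mean of days t-3, t-2 and t-1 only
for t in range(3, len(sales)):
    expected = sales.iloc[t - 3:t].mean()
    assert np.isclose(roll_3.iloc[t], expected)

# Without the shift, day t's own value would leak into its feature
leaky = sales.rolling(3).mean()

print(pd.DataFrame({"sales": sales, "lag_1": lag_1,
                    "roll_3": roll_3, "leaky_roll_3": leaky}))
```

The first rows are NaN because the window is not yet full; this is why `add_features()` ends with `dropna()`, which removes the first 30 days when `roll_30` is used.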