muthuk1 committed on
Commit 3051420 · verified · 1 Parent(s): 7b1fee3

Upload README.md

  1. README.md +267 -0
README.md ADDED
@@ -0,0 +1,267 @@
# ALWAS ML Models: Analog Layout Workflow Automation System

> **4 production-ready ML models** for the ALWAS system. Replaces the Groq LLM API dependency with faster, free, local inference.

## 🎯 Models

| Model | Task | Metric | Value |
|-------|------|--------|-------|
| **Hours Estimator** | Predict layout hours from block metadata | R² / MAE | 0.881 / 5.78 h |
| **Complexity Classifier** | Classify Low/Medium/High complexity | Accuracy / F1 | 91.7% / 0.917 |
| **Bottleneck Predictor** | Detect blocks at risk of getting stuck | Accuracy / F1 | 99.6% / 0.996 |
| **Completion Predictor** | Predict remaining hours to completion | R² / MAE | 0.945 / 1.65 h |

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        ALWAS ML Pipeline                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Block Created ──► Hours Estimator (XGBoost) ──► Est. Hours     │
│                ──► Complexity Classifier (XGB+LGB) ──► Class    │
│                                                                 │
│  Block In-Progress ──► Bottleneck Predictor ──► Risk Alert      │
│                    ──► Completion Predictor ──► ETA             │
│                                                                 │
│  Hourly Cron ──► Batch Bottleneck Scan ──► Notifications        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## 🚀 Quick Start

### Python (Direct)

```python
import joblib
import numpy as np

# Load models
hours_model = joblib.load('models/hours_estimator.joblib')
complexity_xgb = joblib.load('models/complexity_xgb.joblib')
complexity_lgb = joblib.load('models/complexity_lgb.joblib')
bottleneck_model = joblib.load('models/bottleneck_predictor.joblib')
completion_model = joblib.load('models/completion_predictor.joblib')

# Load encoders
tech_node_encoder = joblib.load('models/tech_node_encoder.joblib')
block_type_encoder = joblib.load('models/block_type_encoder.joblib')
```
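
The README does not say how the two complexity classifiers (`complexity_xgb` and `complexity_lgb`) are combined into one prediction. A common choice for a two-member ensemble is soft voting, that is, averaging the per-class probabilities. The sketch below illustrates that idea on plain lists; the equal weights, the helper names, and the sample probability rows are assumptions for illustration, and the class order assumes the alphabetical ordering that scikit-learn's `LabelEncoder` produces.

```python
# Class order assumed to follow LabelEncoder's alphabetical sort.
CLASSES = ["High", "Low", "Medium"]


def soft_vote(proba_xgb, proba_lgb, weights=(0.5, 0.5)):
    """Blend two per-class probability vectors with a weighted average."""
    w_xgb, w_lgb = weights
    total = w_xgb + w_lgb
    return [(w_xgb * a + w_lgb * b) / total for a, b in zip(proba_xgb, proba_lgb)]


def predict_complexity(proba_xgb, proba_lgb):
    """Return (label, confidence, per-class probabilities) for one block."""
    blended = soft_vote(proba_xgb, proba_lgb)
    best = max(range(len(CLASSES)), key=lambda i: blended[i])
    return CLASSES[best], blended[best], dict(zip(CLASSES, blended))


# Hypothetical rows, standing in for complexity_xgb.predict_proba(X)[0]
# and complexity_lgb.predict_proba(X)[0].
label, confidence, probabilities = predict_complexity(
    [0.90, 0.02, 0.08], [0.80, 0.05, 0.15]
)
print(label, round(confidence, 3))
```

If the shipped ensemble uses different weights or a different combination rule, only `soft_vote` would need to change.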

### REST API

```bash
# Install
pip install fastapi uvicorn joblib xgboost lightgbm scikit-learn numpy

# Run
MODEL_DIR=./models python inference_server.py

# Call
curl -X POST http://localhost:7860/predict/estimate \
  -H "Content-Type: application/json" \
  -d '{
    "block_type": "PLL",
    "tech_node": "7nm",
    "priority": "P1-Critical",
    "transistor_count": 80000,
    "has_dependencies": true,
    "num_dependencies": 3,
    "constraint_complexity": 2.5,
    "drc_iterations": 4
  }'
```

Response:
```json
{
  "complexity": "High",
  "estimated_hours": 89.0,
  "confidence": 0.996,
  "risk_level": "high",
  "reasoning": "Advanced 7nm node requires extensive DRC/LVS iterations...",
  "recommended_drc_iterations": 4,
  "suggested_engineer_skill_level": "senior",
  "complexity_probabilities": {"High": 0.996, "Low": 0.0, "Medium": 0.003},
  "estimated_days": 11.1
}
```
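
The same call can be made from Python without extra dependencies. The following is a minimal sketch using only the standard library; the URL and payload fields come from the curl example above, while the function names `build_estimate_request` and `estimate` and the 5-second timeout are illustrative choices, not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # port taken from the curl example


def build_estimate_request(block: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /predict/estimate."""
    body = json.dumps(block).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict/estimate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def estimate(block: dict, timeout: float = 5.0) -> dict:
    """Send the request to a running inference server and decode the JSON reply."""
    request = build_estimate_request(block)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


example_block = {
    "block_type": "PLL",
    "tech_node": "7nm",
    "priority": "P1-Critical",
    "transistor_count": 80000,
    "has_dependencies": True,
    "num_dependencies": 3,
    "constraint_complexity": 2.5,
    "drc_iterations": 4,
}
request = build_estimate_request(example_block)
# estimate(example_block) performs the actual call once the server is up.
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without a live server.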

## 📑 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict/estimate` | Complexity & hours estimation (replaces Groq) |
| `POST` | `/predict/bottleneck` | Bottleneck risk prediction |
| `POST` | `/predict/completion` | Completion time prediction |
| `POST` | `/predict/bulk-estimate` | Bulk estimation (up to 200 blocks) |
| `GET` | `/model/metrics` | Model performance metrics |
| `GET` | `/model/supported-values` | Supported block types, tech nodes, etc. |
| `GET` | `/health` | Health check |
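
Because `/predict/bulk-estimate` accepts at most 200 blocks per call, a caller with a larger backlog has to split it into batches first. The helper below is a small sketch of that step; `chunk_blocks` is a hypothetical name, and the request body shape for the bulk endpoint is not documented here, so only the batching is shown.

```python
from typing import Iterator, List

MAX_BULK_BLOCKS = 200  # limit stated in the endpoint table


def chunk_blocks(blocks: List[dict], size: int = MAX_BULK_BLOCKS) -> Iterator[List[dict]]:
    """Yield consecutive slices of `blocks`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(blocks), size):
        yield blocks[start:start + size]


# 450 placeholder blocks split into bulk-sized batches.
pending = [{"block_type": "PLL", "tech_node": "7nm"} for _ in range(450)]
batch_sizes = [len(batch) for batch in chunk_blocks(pending)]
print(batch_sizes)  # [200, 200, 50]
```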

## 🔌 ALWAS Integration

### Replace Groq API in Express.js

**Before** (server/routes/blocks.js):
```javascript
// Old: Groq LLM call ($0.002/request, ~300ms latency)
const response = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: prompt }]
});
```

**After** (using ALWAS ML API):
```javascript
// New: Local ML model (free, <5ms latency)
const response = await fetch('http://localhost:7860/predict/estimate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    block_type: block.type,
    tech_node: block.techNode,
    priority: block.priority,
    transistor_count: block.transistorCount,
    has_dependencies: (block.dependencies?.length ?? 0) > 0,
    num_dependencies: block.dependencies?.length ?? 0,
    // `??` falls back only when the field is missing, so a legitimate 0 is kept
    constraint_complexity: block.constraintComplexity ?? 1.0,
    drc_iterations: block.drcIterations ?? 2
  })
});
// Surface server-side failures instead of parsing an error body as an estimate
if (!response.ok) throw new Error(`ML estimate failed: ${response.status}`);
const estimate = await response.json();
```

### Add Bottleneck Scanning to Cron Job

```javascript
// In server/cron/bottleneckScanner.js
const blocks = await Block.find({ status: { $ne: 'Completed' } });

for (const block of blocks) {
  const risk = await fetch('http://localhost:7860/predict/bottleneck', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      block_type: block.type,
      tech_node: block.techNode,
      estimated_hours: block.estimatedHours,
      hours_logged: block.hoursLogged,
      current_stage: block.status,
      // daysSinceLastTransition is a helper defined elsewhere in the app
      days_in_current_stage: daysSinceLastTransition(block),
      drc_violations_total: block.drcViolations,
      is_overdue: new Date() > block.dueDate
    })
  });
  // Skip this block if the ML service returned an error
  if (!risk.ok) continue;
  const result = await risk.json();

  if (result.should_alert) {
    // Create notification for manager
    await Notification.create({
      type: 'stuck',
      message: `ML Alert: ${block.name} has HIGH bottleneck risk`,
      recommendations: result.recommendations
    });
    io.emit('newNotification', { blockId: block._id, risk: result });
  }
}
```

### Add Completion ETA to Block Detail

```javascript
// In GET /api/blocks/:id
const completion = await fetch('http://localhost:7860/predict/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    block_type: block.type,
    tech_node: block.techNode,
    estimated_hours: block.estimatedHours,
    current_stage: block.status,
    cumulative_hours: block.hoursLogged,
    cumulative_days: daysSinceStart(block),
    cumulative_drc_violations: block.drcViolations
  })
});
const eta = await completion.json();
// eta.remaining_hours, eta.estimated_completion_date, eta.progress_percent
```

## 📊 Supported Values

### Block Types (20)
ADC, BGR, BandgapRef, Comparator, CurrentMirror, DAC, DiffAmp, LDO, LNA, LVDS_Driver, Mixer, OTA, Oscillator, PA, PLL, PowerDetector, SampleHold, SerDes, TIA, VCO

### Technology Nodes (8)
5nm, 7nm, 12nm, 14nm, 22nm, 28nm, 45nm, 65nm

### Pipeline Stages (7)
Not Started → In Progress → DRC → LVS → ERC → Review → Completed
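
Since the encoders were fitted on these fixed vocabularies, it can help to reject unsupported values on the client before calling the API. The sketch below hard-codes the lists above for illustration; `validate_block` is a hypothetical helper, and in practice the values would more sensibly be fetched from `/model/supported-values` so they stay in sync with the deployed models.

```python
SUPPORTED_BLOCK_TYPES = {
    "ADC", "BGR", "BandgapRef", "Comparator", "CurrentMirror", "DAC", "DiffAmp",
    "LDO", "LNA", "LVDS_Driver", "Mixer", "OTA", "Oscillator", "PA", "PLL",
    "PowerDetector", "SampleHold", "SerDes", "TIA", "VCO",
}
SUPPORTED_TECH_NODES = {"5nm", "7nm", "12nm", "14nm", "22nm", "28nm", "45nm", "65nm"}
PIPELINE_STAGES = [
    "Not Started", "In Progress", "DRC", "LVS", "ERC", "Review", "Completed",
]


def validate_block(block: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if block.get("block_type") not in SUPPORTED_BLOCK_TYPES:
        errors.append(f"unsupported block_type: {block.get('block_type')!r}")
    if block.get("tech_node") not in SUPPORTED_TECH_NODES:
        errors.append(f"unsupported tech_node: {block.get('tech_node')!r}")
    # current_stage is only sent to the bottleneck and completion endpoints,
    # so it is checked only when present.
    stage = block.get("current_stage")
    if stage is not None and stage not in PIPELINE_STAGES:
        errors.append(f"unsupported current_stage: {stage!r}")
    return errors


ok_errors = validate_block({"block_type": "PLL", "tech_node": "7nm"})
bad_errors = validate_block(
    {"block_type": "SRAM", "tech_node": "3nm", "current_stage": "Tapeout"}
)
```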

## 📈 Feature Importance

### Hours Estimation: Top Features
1. `transistor_count_log` (31.5%): most predictive feature; larger blocks take longer
2. `transistor_count` (28.6%): raw count captures non-log relationships
3. `engineer_skill_factor` (7.7%): skill level matters significantly
4. `tech_node_encoded` (6.8%): advanced nodes are harder
5. `constraint_complexity` (2.7%): analog constraints add overhead

### Completion Prediction: Top Features
1. `current_stage_idx` (44.9%): current stage is the strongest signal
2. `stages_completed` (22.3%): progress through the pipeline
3. `avg_hours_per_stage_so_far` (21.0%): pace of work so far predicts the remainder
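
The feature names above suggest how the derived inputs could be computed from raw block data. The sketch below is one plausible reading, not the project's actual feature code: using `log1p` for `transistor_count_log`, treating every stage before the current one as completed, and dividing logged hours by that count are all assumptions; the real definitions live in the training scripts and `feature_config.json`.

```python
import math

PIPELINE_STAGES = [
    "Not Started", "In Progress", "DRC", "LVS", "ERC", "Review", "Completed",
]


def transistor_count_log(transistor_count: int) -> float:
    """Log-scaled size feature (log1p is an assumed choice of transform)."""
    return math.log1p(transistor_count)


def completion_features(current_stage: str, cumulative_hours: float) -> dict:
    """Derive the three top completion features from stage and logged hours."""
    stage_idx = PIPELINE_STAGES.index(current_stage)
    # Assumption: every stage before the current one counts as completed.
    stages_completed = stage_idx
    avg_hours = cumulative_hours / stages_completed if stages_completed else 0.0
    return {
        "current_stage_idx": stage_idx,
        "stages_completed": stages_completed,
        "avg_hours_per_stage_so_far": avg_hours,
    }


features = completion_features("DRC", cumulative_hours=12.0)
size_feature = transistor_count_log(80000)
```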

## 🔧 Retraining

```bash
# Generate new training data from ALWAS MongoDB exports
python training/generate_dataset.py

# Train all models
python training/train_models.py
python training/train_completion.py
```

**Recommended retraining schedule:** Monthly, or once more than 100 newly completed blocks have accumulated.
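
The schedule above can be expressed as a simple trigger that a scheduled job evaluates before kicking off the training scripts. This is a sketch only: `should_retrain` is a hypothetical helper, and reading "monthly" as 30 days is an assumption.

```python
from datetime import date


def should_retrain(
    last_trained: date,
    today: date,
    new_completed_blocks: int,
    max_age_days: int = 30,      # "monthly", approximated as 30 days
    block_threshold: int = 100,  # retrain when MORE than 100 new blocks exist
) -> bool:
    """Return True when either the age rule or the data-volume rule fires."""
    too_old = (today - last_trained).days >= max_age_days
    enough_new_data = new_completed_blocks > block_threshold
    return too_old or enough_new_data


# Trained 10 days ago, but 140 blocks have completed since then.
due = should_retrain(date(2026, 1, 1), date(2026, 1, 11), new_completed_blocks=140)
```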

## 📦 Files

```
models/
  hours_estimator.joblib        # XGBoost regressor
  complexity_xgb.joblib         # XGBoost classifier (ensemble member)
  complexity_lgb.joblib         # LightGBM classifier (ensemble member)
  bottleneck_predictor.joblib   # Calibrated XGBoost classifier
  completion_predictor.joblib   # XGBoost regressor for remaining time
  tech_node_encoder.joblib      # LabelEncoder
  block_type_encoder.joblib     # LabelEncoder
  priority_encoder.joblib       # OrdinalEncoder
  complexity_encoder.joblib     # LabelEncoder
  bottleneck_encoder.joblib     # LabelEncoder
  feature_config.json           # Feature lists and supported values
  metrics.json                  # Model evaluation metrics
inference_server.py             # FastAPI inference server
training/
  generate_dataset.py           # Synthetic data generator
  train_models.py               # Model training (Models 1-3)
  train_completion.py           # Completion model training (Model 4)
```

## Performance vs Groq API

| Metric | Groq llama-3.3-70b | ALWAS ML Models |
|--------|---------------------|-----------------|
| Latency | ~300 ms | <5 ms |
| Cost per request | $0.002 | Free |
| Internet required | Yes | No |
| Structured output | Sometimes | Always (JSON guaranteed) |
| Batch support | Limited | 200 blocks/call |
| Bottleneck detection | No | Yes (real-time) |
| Completion prediction | No | Yes (R²=0.945) |
| Explainability | LLM narrative | Feature importance + reasoning |
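
To put the per-request figures in the table into perspective, the arithmetic below scales them to a monthly request volume. The 50,000 requests per month is a made-up workload chosen purely for illustration; the cost and latency constants are the ones quoted in the table, with 5 ms used as the upper bound for local inference.

```python
from decimal import Decimal

GROQ_COST_PER_REQUEST = Decimal("0.002")  # USD, from the table
GROQ_LATENCY_MS = 300                     # approximate, from the table
LOCAL_LATENCY_MS = 5                      # upper bound, from the table


def monthly_comparison(requests_per_month: int) -> dict:
    """Scale the per-request cost and latency to a monthly volume."""
    return {
        "groq_cost_usd": GROQ_COST_PER_REQUEST * requests_per_month,
        "groq_wait_seconds": GROQ_LATENCY_MS * requests_per_month / 1000,
        "local_wait_seconds_max": LOCAL_LATENCY_MS * requests_per_month / 1000,
    }


# Hypothetical workload: 50,000 estimate requests per month.
summary = monthly_comparison(50_000)
print(summary)
```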

## License
MIT. Built for EPIC Build-A-Thon 2026 | Epical Layouts Pvt. Ltd.