JaydeepR Claude Sonnet 4.6 committed on
Commit 5275508 · 1 Parent(s): 1b26bd8

Remove internal planning docs from repo, gitignore them

Keep README.md and ARCHITECTURE.md (public-facing).
Remove from tracking: idea.md, IMPLEMENTATION_PLAN.md,
presentation_creation.md, submission_requirements.md,
theme.md, understanding.md, specs/ — files remain locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

.gitignore CHANGED
@@ -39,3 +39,12 @@ Thumbs.db
39
  # Generated presentations (keep locally, don't track in git)
40
  deck/*.pptx
41
  deck/*.pdf
42
+
43
+ # Internal planning / session docs — not for the repo
44
+ idea.md
45
+ IMPLEMENTATION_PLAN.md
46
+ presentation_creation.md
47
+ submission_requirements.md
48
+ theme.md
49
+ understanding.md
50
+ specs/
IMPLEMENTATION_PLAN.md DELETED
@@ -1,700 +0,0 @@
1
- # TenderIQ — Implementation Plan
2
-
3
- > **For:** any contributor or fresh AI context picking up this project.
4
- > **You do not need any prior conversation context to use this document.**
5
-
6
- ---
7
-
8
- ## 0. How To Use This Plan
9
-
10
- This project follows **spec-driven development**:
11
-
12
- 1. **This document** is the master implementation plan. It defines architecture, modules, schemas, and the build order. It does **not** contain final source code.
13
- 2. For **each module or coherent unit of work** listed in this plan, the team will produce a **spec document** (a short markdown file) before writing code. Each spec covers: inputs, outputs, function signatures, error cases, dependencies, and acceptance criteria.
14
- 3. Code is written **only against an approved spec**, not directly from this plan.
15
- 4. Specs live in `specs/` (e.g. `specs/01_llm_client.md`, `specs/02_ocr_pipeline.md`). One spec per module. Number prefixes follow the build order in section 9.
16
- 5. Once a spec is implemented, the spec file is preserved alongside the code as documentation.
17
-
18
- **Sequencing rule:** never skip the spec step. If you find yourself wanting to "just code it," stop and write the spec first — it forces precision and exposes hidden assumptions.
19
-
20
- ---
21
-
22
- ## 1. Background
23
-
24
- ### What TenderIQ is
25
- TenderIQ is an AI-powered platform that automates eligibility evaluation of bidders against government tender criteria. It is being built for the **Central Reserve Police Force (CRPF) hackathon, Theme 3 — AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement**.
26
-
27
- ### Why it exists
28
- Government procurement officers today manually read tender documents (criteria, thresholds, compliance requirements) and bidder submissions (financial statements, certifications, project records — often in mixed formats including scans and photos), and decide whether each bidder meets each criterion. For one tender, a committee may spend days; two evaluators routinely reach different conclusions on the same documents; there is no consistent audit trail.
29
-
30
- TenderIQ does this evaluation automatically while preserving human oversight: extract criteria from the tender, parse bidder documents, evaluate criterion-by-criterion with confidence scoring, surface ambiguous cases for human review, and emit a complete audit log.
31
-
32
- ### Where this project sits in the hackathon
33
- - **Round 1 (Idea Phase)**: written submission — already shortlisted. See `idea.md`.
34
- - **Round 2 (Prototype Phase)**: working prototype — this is what we are building. Submission requirements are in `submission_requirements.md`.
35
-
36
- ### Source documents in this repository
37
- | File | Purpose |
38
- |---|---|
39
- | `theme.md` | Original problem statement from CRPF (the "why" and the hard constraints) |
40
- | `idea.md` | The shortlisted Round 1 written submission (the "what") |
41
- | `understanding.md` | Synthesized understanding of the problem space |
42
- | `submission_requirements.md` | Form fields required for the Round 2 submission |
43
- | `IMPLEMENTATION_PLAN.md` | **This file** — the build plan |
44
- | `specs/` | Per-module spec documents (created during build, one per module) |
45
-
46
- Read those four documents (theme, idea, understanding, submission requirements) before drafting the first spec.
47
-
48
- ---
49
-
50
- ## 2. Hard Constraints (from the theme — non-negotiable)
51
-
52
- These are evaluator-facing requirements. Every architectural decision must respect them.
53
-
54
- 1. **Every verdict must be explainable at criterion level** — for each (bidder, criterion) pair the system must show: which criterion was checked, which document and page provided the evidence, what value was extracted, what confidence the system had, and why the verdict was assigned.
55
- 2. **Never silently disqualify** — low-confidence or ambiguous cases must be routed to a human review queue with a stated reason, never auto-rejected.
56
- 3. **Must handle scanned documents and photographs** — OCR is mandatory. The system cannot assume digital text.
57
- 4. **End-to-end auditable** — every action (criterion extraction, evaluation, OCR fallback invocation, human review action) must be logged with timestamp, model version, actor, and payload.
58
-
59
- A submission that fails any of these is unlikely to score well. Treat them as acceptance criteria for the system as a whole.
60
-
61
- ---
62
-
63
- ## 3. Operating Constraints (this build)
64
-
65
- - **Time budget:** ~6.5 hours total: ~5h build + ~1.5h deck/video/screenshots/submission. Do not exceed scope. Compression strategy is documented in section 11.
66
- - **Platform:** Windows 11 development machine. Streamlit Cloud for hosted demo.
67
- - **Language:** Python 3.10+.
68
- - **Starting point:** the project is empty except for the source documents listed in section 1. Everything below is to be created.
69
- - **API access:** the developer has a **DeepSeek API key**. No other LLM/vision API keys are assumed available.
70
- - **Storage:** file-based only. SQLite for the audit log; ChromaDB persistent client for vectors. No external services beyond the DeepSeek API and Streamlit Cloud.
71
- - **Auth/multi-user:** out of scope. A single hardcoded "officer" identity is used in audit entries.
72
-
73
- ---
74
-
75
- ## 4. Confirmed Architectural Decisions
76
-
77
- These were the result of explicit trade-off discussions before the plan was written. Do not relitigate without strong reason.
78
-
79
- ### 4.1 UI / Backend
80
- **Single Streamlit app** (`streamlit==1.39.0`). No separate frontend, no FastAPI service. Streamlit handles UI and orchestration. Deployable free to Streamlit Community Cloud, which satisfies the "Demo Link" submission requirement.
81
-
82
- ### 4.2 LLM
83
- **DeepSeek API**, model `deepseek-v4-pro`, called via the **OpenAI Python SDK** with `base_url="https://api.deepseek.com/v1"` (DeepSeek is OpenAI-compatible). DeepSeek V4-Pro is multimodal — it accepts image inputs, which we exploit for vision-OCR (section 4.4).
84
-
85
- ### 4.3 Live-first LLM with cached fallback
86
- The app **always attempts a live LLM call first**. On any `LLMUnavailable` exception (rate limit, network error, malformed JSON after retries, missing key), it **silently falls back** to pre-computed JSON shipped with the repo (`data/precomputed/*.json`). When fallback fires, a banner is shown and an audit entry is written. This means: judges see real AI executing during their evaluation; the demo still works if the API is down or the key is missing.
87
-
88
- ### 4.4 OCR — three-tier pipeline (the robustness centerpiece)
89
- Bidder documents arrive in mixed formats (typed PDFs, scanned PDFs, photographs of certificates). The OCR pipeline handles each in increasing order of cost:
90
-
91
- | Tier | Engine | When it runs | Cost |
92
- |---|---|---|---|
93
- | 1 | PyMuPDF text extraction | Document is a typed PDF (detected via `is_text_pdf` heuristic) | Free, instant |
94
- | 2 | Tesseract (`pytesseract` + system binary) | Document is a scanned PDF or image | Free, fast, accuracy varies |
95
- | 3 | DeepSeek Vision LLM | Tesseract `mean_conf < 0.65` or extracted text suspiciously short | API call, slow, very accurate |
96
-
97
- Each extracted page records which tier produced it, and that provenance is shown in the UI ("Read by Tesseract @ 58% → re-read by Vision-LLM @ 95%"). This is more robust than single-engine OCR and is a real production pattern.
98
-
99
- ### 4.5 Vector store
100
- **ChromaDB** persistent client, embedded in-process, file-backed under `.chroma/`. Default embedding model is `all-MiniLM-L6-v2` from `sentence-transformers` (~80MB, downloaded on first run). Two collections: `tender_chunks`, `bidder_chunks` (filterable by `bidder_id`).
101
-
102
- ### 4.6 Audit log
103
- **SQLite** single-file DB (`audit.db`) with one append-only table `audit_log`.
104
-
105
- ### 4.7 Things explicitly cut
106
- - **LayoutLM** — too heavy for the build window. Robustness comes from the 3-tier OCR (vision LLM tier handles documents LayoutLM would otherwise cover).
107
- - **easyocr** — would add ~1GB (PyTorch). Vision-LLM tier replaces it.
108
- - **PostgreSQL** — SQLite is sufficient.
109
- - **React / Next.js / FastAPI split** — Streamlit alone meets all UI needs.
110
- - **Authentication / multi-user** — single hardcoded officer identity.
111
- - **Test infrastructure beyond a smoke test** — explicit time-budget decision.
112
- - **Map-reduce LLM extraction** — mock tender is ~5 pages, fits comfortably in V4's 1M context window in a single call.
113
-
114
- ---
115
-
116
- ## 5. Project Structure
117
-
118
- ```
119
- TenderIQ/
120
- ├── app.py # Streamlit entry point, tabs router
121
- ├── requirements.txt # pinned pip deps (section 12)
122
- ├── packages.txt # apt packages for Streamlit Cloud
123
- ├── .env.example # DEEPSEEK_API_KEY=
124
- ├── .gitignore # .env, .chroma/, audit.db, __pycache__, .ocr_cache/
125
- ├── README.md # run instructions (local + cloud)
126
- ├── ARCHITECTURE.md # diagram + flow (used as Custom Attachment)
127
- ├── IMPLEMENTATION_PLAN.md # this file
128
-
129
- ├── specs/ # per-module specs (created during build)
130
- │ ├── 01_config_and_schemas.md
131
- │ ├── 02_llm_client.md
132
- │ ├── 03_pdf_utils.md
133
- │ ├── 04_ocr_pipeline.md
134
- │ ├── 05_chunker.md
135
- │ ├── 06_vectorstore.md
136
- │ ├── 07_criteria_extractor.md
137
- │ ├── 08_bidder_processor.md
138
- │ ├── 09_evaluator.md
139
- │ ├── 10_audit_and_fallback.md
140
- │ ├── 11_mock_data.md
141
- │ ├── 12_precompute.md
142
- │ └── 13_ui_tabs.md
143
-
144
- ├── core/
145
- │ ├── __init__.py
146
- │ ├── config.py # env loading, model name, thresholds, paths
147
- │ ├── schemas.py # pydantic: Criterion, Evidence, Verdict, AuditEntry
148
- │ ├── prompts.py # EXTRACT_CRITERIA_PROMPT, EVALUATE_CRITERION_PROMPT, VISION_OCR_PROMPT
149
- │ ├── llm_client.py # DeepSeek wrapper: chat_json, chat_vision, LLMUnavailable
150
- │ ├── pdf_utils.py # PyMuPDF: extract_pages, is_text_pdf, render_page_to_image
151
- │ ├── ocr_pipeline.py # 3-tier OCR orchestrator
152
- │ ├── chunker.py # tender + bidder docs → chunks with metadata
153
- │ ├── vectorstore.py # ChromaDB persistent client + helpers
154
- │ ├── criteria_extractor.py # Stage 1: tender PDF → List[Criterion]
155
- │ ├── bidder_processor.py # Stage 2: bidder docs → indexed chunks + evidence retrieval
156
- │ ├── evaluator.py # Stage 3: per-criterion verdict with combined confidence
157
- │ ├── audit.py # SQLite audit log writer/reader
158
- │ └── fallback.py # load pre-computed JSON when live LLM fails
159
-
160
- ├── ui/
161
- │ ├── __init__.py
162
- │ ├── tab_overview.py # hero, architecture image, KPIs
163
- │ ├── tab_tender.py # upload tender → show criteria
164
- │ ├── tab_bidders.py # bidder evaluation table with verdicts + sources
165
- │ ├── tab_review.py # human review queue (Approve / Edit / Reject)
166
- │ ├── tab_audit.py # audit log table + CSV export
167
- │ └── components.py # verdict pill, confidence bar, citation chip, OCR-tier badge
168
-
169
- ├── data/
170
- │ ├── tender/
171
- │ │ └── crpf_construction_tender.pdf
172
- │ ├── bidders/
173
- │ │ ├── bidder_a/ # all eligible — typed PDFs
174
- │ │ ├── bidder_b/ # ineligible — turnover too low
175
- │ │ └── bidder_c/ # needs review — scanned turnover cert
176
- │ │ └── turnover_certificate_scan.png
177
- │ └── precomputed/ # fallback if live API fails
178
- │ ├── criteria.json
179
- │ ├── eval_bidder_a.json
180
- │ ├── eval_bidder_b.json
181
- │ └── eval_bidder_c.json
182
-
183
- ├── scripts/
184
- │ ├── generate_mock_data.py # reportlab → PDFs + PIL/numpy → noisy scan
185
- │ ├── precompute_results.py # run pipeline once, save fallback JSON
186
- │ └── smoke_test.py # programmatic end-to-end check
187
-
188
- ├── assets/
189
- │ ├── logo.png
190
- │ ├── architecture.png # for deck + Custom Attachment
191
- │ └── screenshots/ # 3-5 PNGs for submission
192
-
193
- └── deck/
194
- └── TenderIQ_Pitch.pdf # 8-slide pitch deck
195
- ```
196
-
197
- Runtime artifacts (gitignored): `.env`, `.chroma/`, `audit.db`, `.ocr_cache/`, `__pycache__/`.
198
-
199
- ---
200
-
201
- ## 6. Module Responsibilities
202
-
203
- This is the contract surface for each module. Each one will get its own spec document; the descriptions here are the seed material for those specs.
204
-
205
- ### `core/config.py`
206
- - Load `DEEPSEEK_API_KEY` from `st.secrets` first, then `.env` via `python-dotenv`.
207
- - Constants:
208
- - `DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"`
209
- - `MODEL_NAME = "deepseek-v4-pro"`
210
- - `MODEL_VERSION = "deepseek-v4-pro@<build-date>"` — used for audit stamping
211
- - `CONFIDENCE_HIGH = 0.80`
212
- - `CONFIDENCE_REVIEW = 0.55`
213
- - `OCR_TESSERACT_MIN_CONF = 0.65`
214
- - Paths: `DATA_DIR`, `CHROMA_DIR = ".chroma"`, `AUDIT_DB = "audit.db"`, `PRECOMPUTED_DIR`, `OCR_CACHE_DIR = ".ocr_cache"`.
215
-
216
- ### `core/schemas.py`
217
- Pydantic models matching the JSON shapes in section 7. At minimum: `Criterion`, `Rule`, `Evidence`, `Source`, `Verdict`, `AuditEntry`.
218
-
219
- ### `core/prompts.py`
220
- Three string constants — see section 8.
221
-
222
- ### `core/llm_client.py`
223
- ```
224
- class LLMUnavailable(Exception): ...
225
-
226
- class LLM:
227
- def __init__(self, api_key: str | None = None): ...
228
- def chat_json(self, system: str, user: str, max_retries: int = 2) -> dict: ...
229
- def chat_vision(self, system: str, user_text: str, image: bytes | str | Path,
230
- max_retries: int = 2) -> str: ...
231
- ```
232
- - `chat_json` uses `response_format={"type": "json_object"}`, `temperature=0`, retries on JSON parse errors and 5xx with exponential backoff. Raises `LLMUnavailable` after `max_retries`.
233
- - `chat_vision` encodes the image as `data:image/png;base64,...` and sends a multimodal message in OpenAI-compatible format (`{"type": "image_url", "image_url": {"url": "..."}}`). Returns transcribed text. Raises `LLMUnavailable` on failure.
234
- - Every caller in `core/criteria_extractor.py`, `core/evaluator.py`, `core/ocr_pipeline.py` wraps calls in `try/except LLMUnavailable` and routes to `core/fallback.py` (or to a graceful low-confidence result for the OCR case).
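The retry behaviour described for `chat_json` can be sketched without the network layer. This is a minimal illustration, not the project's client: `call_with_retries`, `send`, `base_delay`, and the injectable `sleep` are hypothetical names, and the set of retried exceptions is an assumption standing in for "JSON parse errors and 5xx".

```python
import json
import time
from typing import Callable


class LLMUnavailable(Exception):
    """Raised when no usable JSON reply is obtained after all retries."""


def call_with_retries(send: Callable[[], str], max_retries: int = 2,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `send`, parse the reply as JSON, and retry with exponential backoff.

    `send` is a hypothetical zero-argument callable that performs one request
    and returns the raw reply text.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return json.loads(send())
        except (json.JSONDecodeError, ConnectionError, TimeoutError) as exc:
            last_error = exc
            if attempt < max_retries:
                # Delay doubles on each retry: base, 2*base, 4*base, ...
                sleep(base_delay * (2 ** attempt))
    raise LLMUnavailable(f"gave up after {max_retries + 1} attempts") from last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; callers would catch `LLMUnavailable` and route to the fallback path as the plan describes.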
235
-
236
- ### `core/pdf_utils.py`
237
- - `extract_pages(path: Path) -> list[dict]` — returns `[{"page": int, "text": str}]` via `fitz.open`.
238
- - `is_text_pdf(path: Path) -> bool` — heuristic on average chars per page.
239
- - `render_page_to_image(path: Path, page_no: int, dpi: int = 200) -> PIL.Image` — for OCR.
240
-
241
- ### `core/ocr_pipeline.py`
242
- The robustness centerpiece. Orchestrates the three tiers described in section 4.4.
243
-
244
- ```
245
- def extract_document(file_path: Path) -> list[ExtractedPage]: ...
246
- ```
247
-
248
- `ExtractedPage` shape: `{"page": int, "text": str, "source_type": "text_pdf" | "tesseract" | "vision_llm", "confidence": float, "raw_tier_results": {"tesseract_conf": float | None, "vision_used": bool}}`.
249
-
250
- Logic:
251
- 1. If file is image (PNG/JPG): treat as 1-page; go straight to tier 2.
252
- 2. If file is PDF and `is_text_pdf == True`: tier 1 (text_pdf, conf=1.0).
253
- 3. Else: for each page render to image, run tier 2 (Tesseract via `pytesseract.image_to_data`), compute mean confidence excluding `-1`s, divided by 100.
254
- 4. If `mean_conf < OCR_TESSERACT_MIN_CONF` or text length absurdly short relative to image size: invoke tier 3 (`llm_client.chat_vision(VISION_OCR_PROMPT, image)`), set `source_type="vision_llm"`, `confidence=0.95`. Log `vision_ocr_invoked` audit entry.
255
- 5. If tier 3 raises `LLMUnavailable`: keep the tier-2 result and leave its `confidence` below 0.65, so that it triggers `needs_review` downstream.
256
- 6. Cache per-file results in `.ocr_cache/<file_hash>.json` so reruns don't re-OCR.
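The tier-2 to tier-3 escalation rule in steps 3 and 4 can be sketched as two small functions. This assumes the word confidences have already been read from the `conf` column of `pytesseract.image_to_data` and converted to numbers; `needs_vision_tier` and its `min_chars` cutoff are hypothetical, since the plan does not define "absurdly short".

```python
OCR_TESSERACT_MIN_CONF = 0.65  # threshold named in core/config.py


def mean_confidence(word_confs: list[float]) -> float:
    """Mean Tesseract word confidence on a 0..1 scale, ignoring -1 entries.

    Tesseract reports -1 for non-word boxes, so those are excluded before
    averaging; the 0..100 result is then divided by 100.
    """
    valid = [c for c in word_confs if c >= 0]
    if not valid:
        return 0.0  # nothing recognised at all
    return sum(valid) / len(valid) / 100.0


def needs_vision_tier(word_confs: list[float], text: str,
                      min_chars: int = 20) -> bool:
    """True when the page should be re-read by the vision LLM (tier 3).

    `min_chars` is an assumed stand-in for the plan's "suspiciously short"
    check; the plan relates text length to image size instead.
    """
    too_short = len(text.strip()) < min_chars
    return mean_confidence(word_confs) < OCR_TESSERACT_MIN_CONF or too_short
```

For example, word confidences of 58 and 60 give a mean of 0.59, which is below the 0.65 threshold and would escalate the page.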
257
-
258
- ### `core/chunker.py`
259
- - `chunk_tender(pages: list[dict], tender_id: str) -> list[dict]` — ~500-token chunks per page, regex-detect clause headings (`^\d+(\.\d+)*\s+`).
260
- - `chunk_bidder(pages: list[ExtractedPage], bidder_id: str, doc_name: str) -> list[dict]` — page-level chunks (one per page; or per-doc if very short). Each chunk's metadata includes `bidder_id`, `doc_name`, `page`, `source_type`, `ocr_confidence`.
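As a rough illustration of the clause-heading detection, the sketch below splits one page of text wherever a line matches the regex quoted above. `split_on_clause_headings` is a hypothetical helper; the ~500-token size limit is left out.

```python
import re

# Pattern quoted in the plan: a dotted number followed by whitespace.
CLAUSE_HEADING = re.compile(r"^\d+(\.\d+)*\s+")


def split_on_clause_headings(page_text: str) -> list[str]:
    """Split page text into blocks, starting a new block at each heading line."""
    blocks: list[str] = []
    current: list[str] = []
    for line in page_text.splitlines():
        if CLAUSE_HEADING.match(line) and current:
            blocks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks
```

One thing this makes visible: the pattern matches headings such as `3.2 Financial` but not labels such as `3.2(a)`, because the character after the number must be whitespace.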
261
-
262
- ### `core/vectorstore.py`
263
- - `get_client()` cached with `@st.cache_resource`, returns `chromadb.PersistentClient(path=CHROMA_DIR)`.
264
- - `get_collection(name: str)` — creates if missing.
265
- - `add_chunks(collection, chunks: list[dict], metadatas: list[dict])` — ID = `hash(text)[:16]` to dedupe across reruns.
266
- - `query(collection, text: str, k: int = 4, where: dict | None = None) -> list[dict]` — returns `[{text, metadata, distance}, ...]`.
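The `hash(text)[:16]` shorthand needs a concrete hash to work as a stable ID: Python's built-in `hash()` returns an integer and is salted per process for strings. A minimal sketch, assuming SHA-256 (the plan does not name the hash function) and hypothetical helper names:

```python
import hashlib


def chunk_id(text: str) -> str:
    """Stable 16-character ID derived from the chunk text.

    SHA-256 is an assumption; any deterministic digest would serve.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def dedupe_chunks(chunks: list[str]) -> dict[str, str]:
    """Map ID -> text, keeping the first occurrence of each distinct text."""
    unique: dict[str, str] = {}
    for text in chunks:
        unique.setdefault(chunk_id(text), text)
    return unique
```

Because the ID depends only on the text, re-indexing the same document on a rerun produces the same IDs, which is what lets the store skip or overwrite duplicates.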
267
-
268
- ### `core/criteria_extractor.py`
269
- ```
270
- def extract_criteria(tender_pdf_path: Path) -> list[Criterion]: ...
271
- ```
272
- 1. `pdf_utils.extract_pages(tender_pdf_path)` → join all page text with `\n--- PAGE N ---\n` markers.
273
- 2. `llm.chat_json(EXTRACT_CRITERIA_PROMPT_SYSTEM, prompt + tender_text)`.
274
- 3. Parse JSON `{"criteria": [...]}`, validate via Pydantic, attach UUIDs if absent.
275
- 4. Index criteria text into the `tender_chunks` collection (for future retrieval / explainability features).
276
- 5. Return list. On `LLMUnavailable` → `fallback.load_criteria()` + audit `precomputed_fallback_used`.
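Step 1 above can be sketched in a few lines. `join_pages` is a hypothetical name; the input shape follows the `[{"page": int, "text": str}]` list returned by `extract_pages`.

```python
def join_pages(pages: list[dict]) -> str:
    """Concatenate page texts, prefixing each with a '--- PAGE N ---' marker."""
    return "".join(f"\n--- PAGE {p['page']} ---\n{p['text']}" for p in pages)
```

The markers give the model something to cite when it fills in `source_page` for each criterion.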
277
-
278
- ### `core/bidder_processor.py`
279
- ```
280
- def process_bidder(bidder_id: str, files: list[Path]) -> None:
281
- """Extract, chunk, and index every file for this bidder."""
282
-
283
- def gather_evidence(bidder_id: str, criterion: Criterion, k: int = 4) -> list[Evidence]:
284
- """Retrieve top-k bidder chunks relevant to this criterion."""
285
- ```
286
- - Process step: each file → `ocr_pipeline.extract_document` → `chunker.chunk_bidder` → `vectorstore.add_chunks(bidder_chunks, chunks, metadatas)`, with `bidder_id` carried in each chunk's metadata so queries can filter on it. Audit: `bidder_processed`.
287
- - Gather step: query string = `criterion.title + " " + " ".join(criterion.query_hints)`; `vectorstore.query(bidder_chunks, q, k=4, where={"bidder_id": bidder_id})`. Map results to `Evidence` objects.
288
-
289
- ### `core/evaluator.py`
290
- ```
291
- def evaluate(bidder_id: str, criterion: Criterion) -> Verdict: ...
292
- def evaluate_bidder(bidder_id: str, criteria: list[Criterion]) -> list[Verdict]: ...
293
- ```
294
-
295
- Algorithm for `evaluate`:
296
- 1. `evidence = bidder_processor.gather_evidence(bidder_id, criterion)`.
297
- 2. If `evidence` empty: return `Verdict(verdict="needs_review", reason="No matching evidence found in submitted documents.", llm_confidence=0, combined_confidence=0)` and audit. Done.
298
- 3. Call `llm.chat_json(EVALUATE_CRITERION_PROMPT_SYSTEM, render_user(criterion, evidence))`.
299
- 4. Parse: `{verdict, extracted_value, normalized_value, chosen_source, llm_confidence, reason}`.
300
- 5. Compute `combined_confidence` based on `chosen_source.source_type`:
301
- - `"text_pdf"`: `combined = llm_confidence`
302
- - `"vision_llm"`: `combined = 0.7 * llm_confidence + 0.3 * 0.95`
303
- - `"tesseract"`: `combined = 0.6 * llm_confidence + 0.4 * tesseract_conf`
304
- 6. Apply threshold rules (in order):
305
- - LLM verdict is `needs_review` → keep.
306
- - `combined >= 0.80` → keep LLM verdict.
307
- - `0.55 <= combined < 0.80` AND verdict is `not_eligible` → **downgrade to `needs_review`** (never silently disqualify).
308
- - `combined < 0.55` → force `needs_review`.
309
- 7. Build `Verdict` object, audit `criterion_evaluated`, return.
310
- 8. On `LLMUnavailable` → `fallback.load_evaluation(bidder_id, criterion.id)` + audit fallback.
311
-
312
- ### `core/audit.py`
313
- - SQLite single table:
314
- ```sql
315
- CREATE TABLE audit_log (
316
- id INTEGER PRIMARY KEY AUTOINCREMENT,
317
- ts TEXT NOT NULL,
318
- action TEXT NOT NULL,
319
- actor TEXT NOT NULL,
320
- model_version TEXT,
321
- bidder_id TEXT,
322
- criterion_id TEXT,
323
- payload_json TEXT
324
- );
325
- ```
326
- - `log(action: str, actor: str = "system", **fields) -> int` — inserts.
327
- - `query(filters: dict | None = None) -> list[dict]` — filterable by `bidder_id`, `action`, date range.
328
- - Action vocabulary: `criteria_extracted`, `bidder_processed`, `criterion_evaluated`, `human_review_action`, `precomputed_fallback_used`, `vision_ocr_invoked`.
329
- - Connection cached with `@st.cache_resource`.
330
-
331
- ### `core/fallback.py`
332
- - `load_criteria() -> list[Criterion]` — reads `data/precomputed/criteria.json`.
333
- - `load_evaluation(bidder_id: str, criterion_id: str) -> Verdict` — reads `data/precomputed/eval_bidder_<id>.json` and indexes into the `criterion_id` block.
334
- - Each fallback hit logs `precomputed_fallback_used` and sets `st.session_state["fallback_active"] = True` so the UI can render the banner.
335
-
336
- ---
337
-
338
- ## 7. Data Schemas
339
-
340
- All canonical, all serialized as JSON for storage and inter-module communication.
341
-
342
- ### `Criterion`
343
- ```json
344
- {
345
- "id": "C1",
346
- "title": "Minimum Annual Turnover",
347
- "category": "financial",
348
- "mandatory": true,
349
- "description": "Average annual turnover during the last three financial years shall not be less than INR 5 Crore.",
350
- "rule": {
351
- "type": "numeric_threshold",
352
- "field": "annual_turnover_inr",
353
- "operator": ">=",
354
- "value": 50000000,
355
- "unit": "INR"
356
- },
357
- "query_hints": ["annual turnover", "total revenue", "ITR", "audited financials"],
358
- "source_page": 3,
359
- "source_clause": "3.2(a)"
360
- }
361
- ```
362
- Fields:
363
- - `category`: `"financial" | "technical" | "compliance"`.
364
- - `rule.type`: `"numeric_threshold" | "count_threshold" | "certification_present" | "document_present"`.
365
- - `rule.operator`: `">=" | "<=" | "==" | "exists"`.
366
- - `query_hints`: 3–5 short noun phrases used to build retrieval queries.
367
-
368
- ### `Evidence` (one retrieved chunk during evaluation)
369
- ```json
370
- {
371
- "bidder_id": "bidder_a",
372
- "doc_name": "audited_financials.pdf",
373
- "page": 4,
374
- "text": "...annual turnover for FY 2024-25 was INR 6,20,00,000...",
375
- "source_type": "text_pdf",
376
- "ocr_confidence": null
377
- }
378
- ```
379
- - `source_type`: `"text_pdf" | "tesseract" | "vision_llm"`.
380
- - `ocr_confidence`: 0.0–1.0 if OCR was used; `null` for `text_pdf`.
381
-
382
- ### `Verdict`
383
- ```json
384
- {
385
- "verdict_id": "V-uuid",
386
- "bidder_id": "bidder_a",
387
- "criterion_id": "C1",
388
- "verdict": "eligible",
389
- "extracted_value": "INR 6.2 Cr",
390
- "normalized_value": 62000000,
391
- "source": {
392
- "doc_name": "audited_financials.pdf",
393
- "page": 4,
394
- "snippet": "...annual turnover... INR 6,20,00,000...",
395
- "source_type": "text_pdf"
396
- },
397
- "llm_confidence": 0.93,
398
- "ocr_confidence": null,
399
- "combined_confidence": 0.93,
400
- "reason": "Extracted turnover of INR 6.2 Cr exceeds the required threshold of INR 5 Cr.",
401
- "model_version": "deepseek-v4-pro@2026-05-07",
402
- "timestamp": "2026-05-07T12:34:56Z",
403
- "review_status": "pending"
404
- }
405
- ```
406
- - `verdict`: `"eligible" | "not_eligible" | "needs_review"`.
407
- - `review_status`: `"pending" | "approved" | "edited" | "rejected"`.
408
-
409
- ### `AuditEntry`
410
- Maps directly to the SQLite row (see `core/audit.py` description). The `payload_json` field carries the action-specific details (e.g., for `criterion_evaluated`: `{"verdict": "eligible", "combined_confidence": 0.93}`).
411
-
412
- ---
413
-
414
- ## 8. LLM Prompts
415
-
416
- All three prompts must demand strict JSON output where applicable, run at `temperature=0`, and rely on `response_format={"type": "json_object"}` for the JSON ones.
417
-
418
- ### `EXTRACT_CRITERIA_PROMPT`
419
- **System:**
420
- > You are an expert in Indian government tender analysis (CRPF context). Your job is to extract eligibility criteria from a tender document and return them as STRICT JSON. Never invent criteria not present in the text. Classify each criterion as mandatory or optional based on cue words: "shall", "must", "mandatory", "required", "minimum" → mandatory; "preferred", "desirable", "may", "optionally" → optional. For each criterion, generate 3–5 short noun-phrase query_hints that an evaluator would search for in bidder documents.
421
-
422
- **User template:** the full tender text + a JSON schema example + the instruction:
423
- > Return `{"criteria": [Criterion, ...]}`. Each Criterion must include id (C1, C2, ...), title, category (financial / technical / compliance), mandatory (bool), description (verbatim or close paraphrase), rule (typed per the schema), query_hints, source_page (int), source_clause (string).
424
-
425
- ### `EVALUATE_CRITERION_PROMPT`
426
- **System:**
427
- > You are a procurement evaluator. Given ONE criterion and a list of retrieved evidence chunks from a bidder's documents, decide eligible / not_eligible / needs_review. Always cite the strongest single source. NEVER guess values not present in the evidence. If evidence is missing or ambiguous, return needs_review with reason. Output STRICT JSON.
428
-
429
- **User template** (variables substituted):
430
- ```
431
- CRITERION:
432
- { ...criterion JSON... }
433
-
434
- RETRIEVED EVIDENCE (top-k chunks from this bidder, with source + OCR confidence):
435
- [
436
- { "doc_name": "...", "page": 4, "ocr_confidence": null, "source_type": "text_pdf",
437
- "text": "..." },
438
- ...
439
- ]
440
-
441
- Return JSON:
442
- {
443
- "verdict": "eligible" | "not_eligible" | "needs_review",
444
- "extracted_value": "<short string as found>",
445
- "normalized_value": <number or null>,
446
- "chosen_source": {"doc_name": "...", "page": <int>, "snippet": "<<= 200 chars>", "source_type": "..."},
447
- "llm_confidence": <0..1>,
448
- "reason": "<one or two sentences>"
449
- }
450
-
451
- Rules:
452
- - If evidence directly contains a value satisfying the rule, verdict=eligible with high llm_confidence.
453
- - If evidence directly contradicts the rule, verdict=not_eligible.
454
- - If no relevant evidence retrieved, verdict=needs_review, llm_confidence<=0.4.
455
- - If the source is OCR with low confidence and the value is borderline, lean to needs_review.
456
- ```
457
-
458
- ### `VISION_OCR_PROMPT`
459
- **System:**
460
- > You are an OCR engine for Indian government procurement documents. Transcribe the image text faithfully, preserving numeric values, dates, certificate IDs, and tabular structure (use markdown tables). Do NOT summarize, interpret, or omit anything. Output transcribed text only — no commentary.
461
-
462
- **User text:** "Transcribe this document page completely. Pay special attention to numeric values like turnover figures (INR / Crore / Lakh), dates, and registration numbers." (Image attached.)
463
-
464
- ---
465
-
466
- ## 9. Build Order
467
-
468
- The order is chosen so that the system is **demoable after every major step**. Each numbered item is also the spec sequence — write the spec, get it reviewed, then implement.
469
-
470
- ### Step 1 — Skeleton (≈ 15 min)
471
- Folder structure, `requirements.txt`, `packages.txt`, `.env.example`, `.gitignore`, stub `app.py` with 5 empty Streamlit tabs and sidebar.
472
- **Spec:** `specs/00_skeleton.md` (light — mostly file list and stub contents).
473
- **Checkpoint:** `streamlit run app.py` shows the empty shell.
474
-
475
- ### Step 2 — Mock data generation (≈ 25 min)
476
- `scripts/generate_mock_data.py` produces tender PDF, three bidders' PDFs, and the noisy scan PNG (per section 10).
477
- **Spec:** `specs/11_mock_data.md`.
478
- **Checkpoint:** `data/` directory populated; `turnover_certificate_scan.png` is a visibly noisy scan that Tesseract reads with low confidence.
479
-
480
- ### Step 3 — Config + schemas + prompts (≈ 25 min)
481
- `core/config.py`, `core/schemas.py`, `core/prompts.py`.
482
- **Spec:** `specs/01_config_and_schemas.md`.
483
-
484
- ### Step 4 — LLM client (≈ 25 min)
485
- `core/llm_client.py` with both `chat_json` and `chat_vision`. Smoke-test with a one-line script that calls each.
486
- **Spec:** `specs/02_llm_client.md`.
487
- **Checkpoint:** ad-hoc REPL call to `chat_json("hi", "respond with {\"ok\": true}")` returns `{"ok": True}`.
488
-
489
- ### Step 5 — PDF utils + chunker (≈ 15 min)
490
- `core/pdf_utils.py`, `core/chunker.py`.
491
- **Spec:** `specs/03_pdf_utils.md`, `specs/05_chunker.md` (can be combined).
492
-
493
- ### Step 6 — Criteria extractor + Tab 2 wiring (≈ 30 min)
494
- `core/criteria_extractor.py` + minimal `ui/tab_tender.py`.
495
- **Spec:** `specs/07_criteria_extractor.md`.
496
- **Checkpoint:** Tab 2 in the running app shows 5 criteria extracted from the mock tender.
497
-
498
- ### Step 7 — OCR pipeline (≈ 30 min)
499
- `core/ocr_pipeline.py`. Verify on `turnover_certificate_scan.png`.
500
- **Spec:** `specs/04_ocr_pipeline.md`.
501
- **Checkpoint:** running `extract_document(turnover_certificate_scan.png)` first attempts Tesseract (low conf), then falls through to vision-LLM, returns `source_type="vision_llm"` with the correct turnover figure.
502
-
503
- ### Step 8 — Vector store + bidder processor (≈ 25 min)
504
- `core/vectorstore.py`, `core/bidder_processor.py`.
505
- **Spec:** `specs/06_vectorstore.md`, `specs/08_bidder_processor.md`.
506
- **Checkpoint:** `process_bidder("bidder_a", ...)` indexes all five docs; `gather_evidence("bidder_a", turnover_criterion)` returns top-4 chunks, the strongest mentioning "INR 6,20,00,000".
507
-
508
- ### Step 9 — Evaluator + threshold logic (≈ 25 min)
509
- `core/evaluator.py`.
510
- **Spec:** `specs/09_evaluator.md`.
511
- **Checkpoint:** `evaluate("bidder_a", turnover_criterion)` returns verdict=eligible, combined_confidence ≥ 0.8; `evaluate("bidder_b", turnover_criterion)` returns verdict=not_eligible.
512
-
513
- ### Step 10 — Audit + fallback (≈ 20 min)
514
- `core/audit.py`, `core/fallback.py`.
515
- **Spec:** `specs/10_audit_and_fallback.md`.
516
-
517
- ### Step 11 — Pre-compute results (≈ 15 min)
518
- `scripts/precompute_results.py` runs the full pipeline, dumps `criteria.json` + `eval_bidder_*.json`. Commit results.
519
- **Spec:** `specs/12_precompute.md`.
520
- **Checkpoint:** four JSON files exist and validate against the schemas.
521
-
522
- ### Step 12 — UI tabs (≈ 80 min total)
523
- - Tab 3 — Bidder evaluation (35 min): rows with verdict pills, source chips, OCR-tier badges, confidence bars, expandable Reason and Source Snippet.
524
- - Tab 4 — Review queue (15 min): filtered list of `needs_review` rows with Approve/Edit/Reject.
525
- - Tab 5 — Audit log (15 min): sortable table + CSV export.
526
- - Tab 1 — Overview (15 min): hero, architecture image, KPIs, "Use Pre-loaded Demo" CTA.
527
-
528
- `ui/components.py` is built incrementally as Tabs 3 and 4 need it.
529
- **Spec:** `specs/13_ui_tabs.md` (covers all five tabs and `components.py`).
530
-
531
- ### Step 13 — Smoke test + README (≈ 15 min)
532
- `scripts/smoke_test.py` (programmatic full flow), `README.md`.
533
-
534
- ### Step 14 — Streamlit Cloud deploy (≈ 25 min)
535
- Push to GitHub, connect Streamlit Cloud, set `DEEPSEEK_API_KEY` in app secrets, then verify the deployed URL in an incognito window twice: once with the API key set (live mode) and once with the key removed (precomputed mode).
536
-
537
- ### Step 15 — Submission package (≈ 90 min)
538
- Architecture diagram, 8-slide deck, 4 screenshots, 2-min demo video (OBS / Win+G), zip source, fill submission form.
539
-
540
- ---
541
-
542
- ## 10. Mock Data Strategy
543
-
544
- Single deterministic script `scripts/generate_mock_data.py`, runs in <30 seconds.
545
-
546
- ### Tender PDF — `data/tender/crpf_construction_tender.pdf`
547
- `reportlab` SimpleDocTemplate, 5–6 pages with these sections: (1) Introduction, (2) Scope of Work, (3) Eligibility Criteria, (4) Submission Procedure, (5) Evaluation Methodology, (6) Annexures. Section 3 contains five criteria phrased in formal tender language (this is the theme's sample scenario verbatim, so judges will recognize it):
548
-
549
- | ID | Clause | Text | Mandatory? | Category |
550
- |---|---|---|---|---|
551
- | C1 | 3.2(a) | "...minimum average annual turnover of INR 5 Crore (Rupees Five Crore only) during the last three financial years..." | Yes | financial |
552
- | C2 | 3.2(b) | "...successfully completed at least three (3) similar construction projects in the last five (5) financial years..." | Yes | technical |
553
- | C3 | 3.2(c) | "...shall possess a valid Goods and Services Tax (GST) registration..." | Yes | compliance |
554
- | C4 | 3.2(d) | "...shall hold a valid ISO 9001:2015 Quality Management System certification..." | Yes | compliance |
555
- | C5 | 3.2(e) | "...preferably, the bidder may have prior experience with paramilitary infrastructure..." | **No** | technical |
556
-
557
- C5 tests the mandatory-vs-optional classification.
558
-
559
- ### Bidder A (clearly eligible) — typed PDFs only
560
- `company_profile.pdf`, `audited_financials.pdf` (FY 22-23: ₹5.8 Cr, 23-24: ₹6.2 Cr, 24-25: ₹7.1 Cr), `project_experience.pdf` (5 projects in 5 years), `gst_certificate.pdf` (GSTIN, valid 2027), `iso_9001.pdf` (valid 2027).
561
-
562
- ### Bidder B (clearly ineligible — turnover too low) — typed PDFs only
563
- Same docs as A but `audited_financials.pdf` shows ₹1.2 / ₹1.5 / ₹1.8 Cr (all below threshold). Other criteria pass.
564
-
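As a quick check on the mock figures for Bidders A and B, the three-year averages can be computed directly. This is only an illustrative sketch: `average_turnover_cr` and `THRESHOLD_CR` are hypothetical names, and the 5 Cr threshold is the one stated in criterion C1.

```python
from statistics import mean

THRESHOLD_CR = 5.0  # C1: minimum average annual turnover, in INR crore


def average_turnover_cr(yearly_cr):
    """Return the mean annual turnover in crore, rounded to 2 decimals."""
    return round(mean(yearly_cr), 2)


# Figures from the mock audited financials (FY 22-23, 23-24, 24-25).
bidder_a = average_turnover_cr([5.8, 6.2, 7.1])
bidder_b = average_turnover_cr([1.2, 1.5, 1.8])

print(bidder_a, bidder_a >= THRESHOLD_CR)
print(bidder_b, bidder_b >= THRESHOLD_CR)
```

Bidder A's average clears the threshold and Bidder B's does not, which is what the two "clearly eligible" and "clearly ineligible" fixtures are meant to exercise.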
565
- ### Bidder C (needs review — scanned turnover certificate) — typed + one scan
566
- Typed `company_profile.pdf`, `project_experience.pdf` (3 projects — borderline meets count), `gst_certificate.pdf`, `iso_9001.pdf`.
567
-
568
- **`turnover_certificate_scan.png`** generation:
569
- 1. Render a `reportlab` page with the CA's turnover statement.
570
- 2. Convert to `PIL.Image` via `pillow`.
571
- 3. Apply: `ImageFilter.GaussianBlur(radius=1.5)`, salt-and-pepper noise via `numpy`, `image.rotate(-2, fillcolor="white")`, JPEG-compress at quality=40, save as PNG.
572
- 4. Outcome: Tesseract reads it with mean confidence ~50–65% → triggers Tier-3 vision LLM. Vision LLM transcribes correctly; combined-confidence rule still routes Bidder C to `needs_review` (this is intended — it demonstrates the safety rule).
573
-
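The degradation steps listed for `turnover_certificate_scan.png` could look roughly like the sketch below. It assumes Pillow and NumPy are installed; `degrade_scan`, `noise_fraction`, and `seed` are hypothetical names, and the 2% noise level is an arbitrary choice rather than a value from the plan. The fill value 255 is white in greyscale mode.

```python
import io

import numpy as np
from PIL import Image, ImageFilter


def degrade_scan(page, noise_fraction=0.02, seed=0):
    """Make a clean page image look like a poor-quality scan."""
    img = page.convert("L").filter(ImageFilter.GaussianBlur(radius=1.5))

    # Salt-and-pepper noise: force a small fraction of pixels to black or white.
    pixels = np.array(img)  # np.array copies, so the result is writable
    rng = np.random.default_rng(seed)
    mask = rng.random(pixels.shape)
    pixels[mask < noise_fraction / 2] = 0
    pixels[mask > 1 - noise_fraction / 2] = 255
    img = Image.fromarray(pixels)

    # Slight skew; exposed corners are filled with white.
    img = img.rotate(-2, fillcolor=255)

    # Round-trip through a low-quality JPEG to add compression artifacts.
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG", quality=40)
    buffer.seek(0)
    return Image.open(buffer).convert("L")


# Stand-in for the rendered certificate page (a blank white image here).
scan = degrade_scan(Image.new("L", (200, 100), 255))
```

The returned image can then be written out with `scan.save("turnover_certificate_scan.png")`. A fixed seed keeps the mock data script deterministic.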
574
- ### Pre-computed fallback files — `data/precomputed/`
575
- After the pipeline modules are working, run `scripts/precompute_results.py` once to produce:
576
- - `criteria.json` — output of `extract_criteria(tender_pdf)`.
577
- - `eval_bidder_a.json`, `eval_bidder_b.json`, `eval_bidder_c.json` — per-bidder verdicts for all criteria.
578
-
579
- Commit these four files to the repo. They are the safety net for live demos.
580
-
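A minimal sketch of how the four committed files might be checked and loaded follows. The helper names (`missing_precomputed`, `load_precomputed`) are hypothetical; only the file names and the `data/precomputed/` directory come from the plan.

```python
import json
from pathlib import Path

EXPECTED_FILES = [
    "criteria.json",
    "eval_bidder_a.json",
    "eval_bidder_b.json",
    "eval_bidder_c.json",
]


def missing_precomputed(directory):
    """Return the expected file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


def load_precomputed(directory, name):
    """Load one precomputed JSON file as a Python object."""
    return json.loads((Path(directory) / name).read_text(encoding="utf-8"))
```

Calling `missing_precomputed("data/precomputed")` before a demo gives an early warning if the safety net is incomplete.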
581
- ---
582
-
583
- ## 11. Streamlit UI
584
-
585
- 5 tabs, left-to-right narrative order:
586
-
587
- ### Tab 1 — Overview
588
- Hero text ("TenderIQ — explainable AI for tender evaluation"), architecture image (`assets/architecture.png`), 4 KPI cards (criteria extracted, bidders evaluated, hours saved, audit entries). "Use Pre-loaded Demo Data" (default) and "Upload Your Own" CTA.
589
-
590
- ### Tab 2 — Tender Analysis
591
- File uploader (defaults to mock tender preview). Button **"Extract Criteria (Live LLM)"** runs `criteria_extractor`. Results render as cards with category badge (color-coded), mandatory pill, description, source-page chip. Cached to `st.session_state["criteria"]`.
592
-
593
- ### Tab 3 — Bidder Evaluation
594
- Bidder multi-select (defaults all 3). Button **"Run Evaluation"** processes each bidder × each criterion. Output: rows with verdict pill (green/red/amber), extracted value, source chip (doc + page + **OCR-tier badge** showing `text_pdf` / `tesseract` / `vision_llm`), confidence bar, expandable Reason and Source Snippet. Per-bidder summary header: "X / 4 mandatory criteria met — Overall: Eligible / Not Eligible / Needs Review".
595
-
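The per-bidder summary header could be derived as in the sketch below. The plan does not spell out how the overall status is aggregated, so the precedence used here (any mandatory `not_eligible` wins, then any `needs_review`, otherwise eligible) is an assumption, as is the `summarize_bidder` name and the dict shape.

```python
def summarize_bidder(results):
    """Summarize verdicts for one bidder.

    `results` is a list of dicts with a boolean "mandatory" key and a
    "verdict" key holding "eligible", "not_eligible", or "needs_review".
    Returns (criteria_met, mandatory_total, overall_label).
    """
    mandatory = [r for r in results if r["mandatory"]]
    met = sum(r["verdict"] == "eligible" for r in mandatory)
    verdicts = {r["verdict"] for r in mandatory}

    # Assumed precedence: a hard failure outranks a pending review.
    if "not_eligible" in verdicts:
        overall = "Not Eligible"
    elif "needs_review" in verdicts:
        overall = "Needs Review"
    else:
        overall = "Eligible"
    return met, len(mandatory), overall
```

Optional criteria such as C5 are ignored by this sketch, which matches the "X / 4 mandatory criteria" wording of the header.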
596
- ### Tab 4 — Human Review Queue
597
- Filtered to verdicts where `review_status == "pending"` AND `verdict == "needs_review"`. Each row: criterion, bidder, extracted value (editable), confidence, reason, source snippet, image preview if OCR'd. Buttons: Approve / Edit & Approve / Reject — each writes audit entry and updates `review_status`.
598
-
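The queue filter described for Tab 4 is a simple conjunction. A sketch, assuming each verdict is a dict with `review_status` and `verdict` keys (the `review_queue` name is hypothetical):

```python
def review_queue(verdicts):
    """Keep only verdicts that still await a human decision."""
    return [
        v
        for v in verdicts
        if v["review_status"] == "pending" and v["verdict"] == "needs_review"
    ]
```

Once a reviewer approves or rejects a row and `review_status` changes, the row drops out of the queue on the next render.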
599
- ### Tab 5 — Audit Log
600
- Sortable table from `audit.query()`. Filter by bidder, action type. CSV export.
601
-
602
- ### Sidebar (always visible)
603
- Logo, project name, **DeepSeek connection status dot**:
604
- - Green: live connection, no fallback fired this session.
605
- - Amber: fallback fired at least once this session.
606
- - Red: probe at startup failed.
607
- "Reset Session" button. If `st.session_state["fallback_active"]`, show banner: "⚠ Live API unavailable — showing pre-computed results."
608
-
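The three sidebar states reduce to a small decision function. This is a sketch only; `connection_status` and its two boolean inputs are hypothetical names for the startup probe result and the session's fallback flag.

```python
def connection_status(probe_ok, fallback_fired):
    """Map connection state to the sidebar dot colour."""
    if not probe_ok:
        return "red"    # startup probe failed
    if fallback_fired:
        return "amber"  # fallback used at least once this session
    return "green"      # live connection, no fallback so far
```

Checking the probe first means a failed startup always shows red, even if fallback results are also being served.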
609
- ---
610
-
611
- ## 12. requirements.txt and packages.txt
612
-
613
- `requirements.txt` (pinned):
614
- ```
615
- streamlit==1.39.0
616
- openai==1.51.0
617
- pymupdf==1.24.10
618
- pytesseract==0.3.13
619
- Pillow==10.4.0
620
- numpy==1.26.4
621
- chromadb==0.5.5
622
- sentence-transformers==3.1.1
623
- pydantic==2.9.2
624
- python-dotenv==1.0.1
625
- reportlab==4.2.5
626
- pandas==2.2.3
627
- ```
628
-
629
- `packages.txt` (apt packages for Streamlit Cloud):
630
- ```
631
- tesseract-ocr
632
- poppler-utils
633
- ```
634
-
635
- ---
636
-
637
- ## 13. Risks and Mitigations
638
-
639
- | Risk | Mitigation |
640
- |---|---|
641
- | **DeepSeek API down or rate-limited mid-demo.** | Live-first with silent fallback to `data/precomputed/*.json`. Sidebar dot turns amber. App keeps working. |
642
- | **Tesseract install on Streamlit Cloud.** | `packages.txt` with `tesseract-ocr`. If it still fails: Tier-3 vision LLM works on raw image input, and `data/precomputed/eval_bidder_c.json` is the final safety net. |
643
- | **DeepSeek vision call (Tier 3) fails.** | Tesseract result accepted with `confidence < 0.65` → flows to `needs_review`. Demo still works. |
644
- | **ChromaDB first-run sentence-transformers download (~80 MB).** | `@st.cache_resource` on the client. README warns "first cloud load may take ~30s". Pre-warm by visiting deployed URL once before submission. |
645
- | **LLM returns malformed JSON.** | `response_format={"type":"json_object"}` + 2 retries with stricter system prompt → fall back to precomputed for that item. |
646
- | **PyMuPDF licensing.** | AGPL but allowed for hackathon use; pin `pymupdf==1.24.10`; mention in README. |
647
- | **API key leak in repo.** | `.env` gitignored; `.env.example` ships with placeholder; Streamlit Cloud secrets used in deploy; pre-commit visual diff check. |
648
- | **Time overrun.** | Compression order: skip Tab 1 KPIs → skip optional 5th criterion → skip CSV export → keep core flow (Tabs 2–4) intact for the video. |
649
-
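The malformed-JSON mitigation in the table above (two retries with a stricter prompt, then precomputed fallback) could be structured as below. The LLM call is injected as a plain callable so the sketch stays independent of any client library; `parse_with_retries` and the `strict` flag are hypothetical.

```python
import json


def parse_with_retries(call_llm, fallback, max_retries=2):
    """Try to get a JSON object from the model, else use the fallback.

    `call_llm(strict=...)` must return the raw response text; `strict`
    is True on retries so the caller can switch to a stricter prompt.
    Returns (result, used_fallback).
    """
    for attempt in range(1 + max_retries):
        raw = call_llm(strict=attempt > 0)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry
        if isinstance(parsed, dict):
            return parsed, False
    return fallback, True
```

Returning the `used_fallback` flag lets the caller log a `precomputed_fallback_used` audit entry and turn the sidebar dot amber.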
650
- ---
651
-
652
- ## 14. Verification (run before recording the demo video)
653
-
654
- Treat this as the acceptance test. The demo video should walk through these steps in order.
655
-
656
- 1. **Cold start.** Delete `.chroma/`, `audit.db`. Run `streamlit run app.py`. App opens in <10s; Tab 1 renders.
657
- 2. **Live extraction.** Tab 2 → "Extract Criteria" → 5 criteria appear within 10–20s. Sidebar dot green.
658
- 3. **Live evaluation, Bidder A.** Tab 3 → select Bidder A → "Run Evaluation". All 4 mandatory criteria → `eligible` with combined confidence ≥ 0.80.
659
- 4. **Live evaluation, Bidder B.** Turnover criterion → `not_eligible` with reason citing low turnover figure and source page.
660
- 5. **Live evaluation, Bidder C — the OCR demo path.** Turnover criterion → triggers Tier 2 (Tesseract low conf) → triggers Tier 3 (DeepSeek Vision). UI shows "Read by Tesseract @ ~58% → Vision-LLM @ 95%". Final verdict: `needs_review`. Audit log gains a `vision_ocr_invoked` entry.
661
- 6. **Review action.** Tab 4 → click "Approve" on Bidder C's turnover row → audit log gains `human_review_action` entry within 1 second; `review_status` updates.
662
- 7. **Audit export.** Tab 5 → "Export CSV" → CSV downloads with all entries.
663
- 8. **No-API run.** Rename `.env` (or unset secret), restart app → the live-run buttons ("Extract Criteria", "Run Evaluation") silently fall back to precomputed, banner shown, sidebar dot amber, audit gets `precomputed_fallback_used` entries.
664
- 9. **Smoke test.** `python scripts/smoke_test.py` exits 0.
665
- 10. **Deployed URL.** Open Streamlit Cloud URL in incognito; repeat steps 1–6.
666
-
667
- ---
668
-
669
- ## 15. Submission Deliverables (Round 2 form fields)
670
-
671
- Mapping of submission requirements to artifacts:
672
-
673
- | Form field | Artifact |
674
- |---|---|
675
- | Title | "TenderIQ — Explainable AI for Tender Evaluation" |
676
- | Description | Adapted from `idea.md` |
677
- | Parent Submission | The shortlisted Round 1 idea |
678
- | Theme | Theme 3 |
679
- | Snapshots | `assets/screenshots/*.png` |
680
- | Video URL | YouTube unlisted link to 2-min demo |
681
- | Presentation | `deck/TenderIQ_Pitch.pdf` |
682
- | Demo Link | Streamlit Cloud URL |
683
- | Repository URL | GitHub URL |
684
- | Source Code | Zip of repo (excluding `.env`, `.chroma/`, `audit.db`) |
685
- | Instructions to Run | `README.md` quickstart |
686
- | Custom Attachment | `ARCHITECTURE.md` exported as PDF (with the architecture diagram embedded) |
687
-
688
- ---
689
-
690
- ## 16. Definition of Done
691
-
692
- The build is done when **all** of the following are true:
693
-
694
- - [ ] All 10 verification steps in section 14 pass.
695
- - [ ] Streamlit Cloud URL is live and reachable.
696
- - [ ] GitHub repo is public, with `.env` not committed.
697
- - [ ] `README.md` quickstart works on a fresh clone with no API key (precomputed mode).
698
- - [ ] Pitch deck, demo video, screenshots, and architecture PDF are produced.
699
- - [ ] Submission form is filled and submitted.
700
- - [ ] Memory note saved with deployment URL and submission timestamp.
 
idea.md DELETED
@@ -1,157 +0,0 @@
1
- # TenderIQ: Explainable AI Platform for Automated Tender Evaluation & Eligibility Analysis
2
-
3
- **Phase:** Idea Phase (Shortlisted)
4
- **Last updated:** Apr 30, 2026
5
- **Theme:** Theme 3 — AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement by CRPF
6
-
7
- ---
8
-
9
- ## Problem Understanding
10
-
11
- Government tender evaluation today is a manual, time-consuming, and error-prone process. Procurement officers must review large volumes of unstructured documents — including PDFs, scanned files, and images — to verify whether bidders meet eligibility criteria such as financial thresholds, technical experience, and compliance certifications.
12
-
13
- This results in:
14
- - Inconsistent evaluations across reviewers
15
- - High turnaround time (often days per tender)
16
- - Lack of transparency and auditability
17
- - Risk of oversight in critical compliance checks
18
-
19
- Our solution addresses these challenges by transforming unstructured tender and bidder data into structured, explainable, and auditable decisions.
20
-
21
- ---
22
-
23
- ## Proposed Solution: TenderIQ
24
-
25
- TenderIQ is an AI-powered platform designed to automate tender evaluation while ensuring human trust, explainability, and audit readiness. The system follows a four-stage pipeline:
26
-
27
- ### Stage 1: Tender Understanding (Criteria Extraction)
28
-
29
- The platform extracts eligibility criteria from tender documents using a hybrid approach combining LLMs and rule-based parsing. It identifies:
30
- - Financial conditions (e.g., turnover ≥ ₹5 Cr)
31
- - Technical requirements (e.g., project experience)
32
- - Compliance rules (e.g., GST registration, ISO certifications)
33
-
34
- Each criterion is:
35
- - Classified as mandatory or optional
36
- - Converted into a structured, machine-readable format
37
-
38
- ### Stage 2: Bidder Document Processing
39
-
40
- The system processes heterogeneous bidder submissions, including:
41
- - Typed PDFs
42
- - Scanned documents
43
- - Images
44
- - Word files
45
-
46
- The processing pipeline includes:
47
- - OCR for scanned documents and images
48
- - Layout-aware parsing for tables, forms, and certificates
49
- - Entity extraction for key values such as turnover, certifications, and project count
50
-
51
- All extracted information is stored along with:
52
- - Source reference (document and page number)
53
- - Confidence score
54
-
55
- ### Stage 3: Evaluation and Decision Engine
56
-
57
- Each bidder is evaluated on a criterion-by-criterion basis using:
58
- - Rule-based validation (e.g., threshold checks)
59
- - Confidence-aware scoring
60
-
61
- The system produces three possible outcomes:
62
- - **Eligible**
63
- - **Not Eligible**
64
- - **Needs Manual Review**
65
-
66
- Ambiguous or low-confidence cases are never automatically rejected. Instead, they are flagged for human review to ensure fairness and compliance.
67
-
68
- ### Stage 4: Explainability and Audit Layer (Key Differentiator)
69
-
70
- Every decision is fully explainable and traceable. Each evaluation includes:
71
- - The criterion being checked
72
- - The extracted value
73
- - Source document reference
74
- - Confidence score
75
- - Reason for the decision
76
-
77
- **Example:**
78
- ```
79
- Criterion: Minimum Turnover ≥ ₹5 Cr
80
- Extracted Value: ₹6.2 Cr
81
- Source: Financial Statement (Page 4)
82
- Confidence: 92%
83
- Verdict: Eligible
84
- ```
85
-
86
- All system actions are logged with:
87
- - Model version
88
- - Timestamp
89
- - Reviewer actions
90
-
91
- This ensures complete end-to-end auditability suitable for government procurement processes.
92
-
93
- ---
94
-
95
- ## Human-in-the-Loop Workflow
96
-
97
- The system incorporates a mandatory human review layer:
98
- - Low-confidence or conflicting cases are routed to reviewers
99
- - The interface highlights extracted data directly within documents
100
- - Reviewers can: Approve, Edit, or Reject decisions
101
- - All reviewer decisions are captured and used to improve system performance over time
102
-
103
- ---
104
-
105
- ## Key Features
106
-
107
- - Handles scanned and unstructured documents effectively
108
- - Provides criterion-level explainability for every decision
109
- - Ensures no silent disqualification of bidders
110
- - Maintains a fully auditable decision pipeline
111
- - Scales across departments and tender types
112
-
113
- ---
114
-
115
- ## Technology Stack
116
-
117
- | Layer | Technology |
118
- |---|---|
119
- | AI/ML | LLMs for extraction, OCR (Tesseract or PaddleOCR), LayoutLM for document understanding |
120
- | Backend | Python (FastAPI) with rule-based evaluation engine |
121
- | Storage | PostgreSQL and vector database for document retrieval |
122
- | Frontend | React-based dashboard |
123
-
124
- ---
125
-
126
- ## Risks and Mitigation
127
-
128
- | Risk | Mitigation |
129
- |---|---|
130
- | OCR inaccuracies | Confidence scoring and human review |
131
- | Legal language ambiguity | Hybrid LLM and rule-based parsing |
132
- | Data inconsistency across documents | Conflict detection and validation logic |
133
- | Over-automation risk | Human-in-the-loop validation |
134
-
135
- ---
136
-
137
- ## Why This Solution Stands Out
138
-
139
- - Balances automation with accountability
140
- - Designed specifically for government procurement constraints
141
- - Focuses on trust, explainability, and auditability
142
- - Works effectively with real-world, messy data formats
143
-
144
- ---
145
-
146
- ## Future Scope (Round 2)
147
-
148
- - Integration with existing procurement systems
149
- - Model improvement through feedback loops
150
- - Multi-language document support
151
- - Advanced fraud detection in bidder submissions
152
-
153
- ---
154
-
155
- ## Core Philosophy
156
-
157
- The system prioritizes **assistive intelligence over full automation**, ensuring that every decision is explainable, reviewable, and compliant with government procurement standards.
 
presentation_creation.md DELETED
@@ -1,689 +0,0 @@
1
- # TenderIQ — Presentation Creation Brief
2
-
3
- > **Purpose of this file:** Give a fresh Claude context everything it needs to generate
4
- > six distinct, high-quality presentations (PPTX and/or PDF) for the CRPF Hackathon
5
- > submission. The creator should produce all variants in one session so the user can
6
- > pick the best one.
7
-
8
- ---
9
-
10
- ## 0. How to use this file
11
-
12
- 1. Read sections 1–5 carefully — they contain all project context, slide content,
13
- and data.
14
- 2. Read section 6 — it defines exactly 6 visual styles to produce.
15
- 3. Read section 7 — it gives technical guidance for python-pptx and reportlab.
16
- 4. Produce all 6 variants, saving them to `deck/` as:
17
- - `deck/TenderIQ_v1_dark_professional.pptx`
18
- - `deck/TenderIQ_v2_clean_minimal.pptx`
19
- - `deck/TenderIQ_v3_government_official.pptx`
20
- - `deck/TenderIQ_v4_modern_gradient.pdf`
21
- - `deck/TenderIQ_v5_data_forward.pptx`
22
- - `deck/TenderIQ_v6_infographic.pdf`
23
- 5. Each variant uses the **same slide content** (section 4) but different visual
24
- treatment (section 6). Do not cut content between variants.
25
-
26
- **DO NOT** reuse the existing `deck/TenderIQ_Pitch.pdf` — it was generated by a
27
- previous low-quality script and should be ignored entirely.
28
-
29
- ---
30
-
31
- ## 1. Project Summary
32
-
33
- **Name:** TenderIQ
34
- **Tagline:** Explainable AI for Government Tender Evaluation
35
- **Event:** CRPF Hackathon — Theme 3: AI-Based Tender Evaluation and Eligibility
36
- Analysis for Government Procurement
37
- **Organisation:** Central Reserve Police Force, Ministry of Home Affairs,
38
- Government of India
39
-
40
- **One paragraph description:**
41
- TenderIQ automates the eligibility evaluation of bidders against government tender
42
- criteria. A procurement officer uploads a tender PDF; the system extracts each
43
- eligibility criterion using an LLM, processes bidder documents through a three-tier
44
- OCR pipeline (handling everything from typed PDFs to blurry scanned certificates),
45
- evaluates each bidder against each criterion with combined confidence scoring, and
46
- surfaces ambiguous cases for human review — all with a complete, exportable audit
47
- trail. The app is built on Streamlit and is deployable to a public URL in minutes.
48
-
49
- ---
50
-
51
- ## 2. The Problem (use these facts on the problem slide)
52
-
53
- - A procurement committee manually reading tender documents and bidder submissions
54
- can spend **3–5 days per tender**
55
- - Two evaluators reviewing the same bid **regularly reach different conclusions**
56
- - Documents arrive in **mixed formats**: typed PDFs, scanned certificates,
57
- photographs of documents taken on phones
58
- - There is **no consistent audit trail** — decisions cannot be traced to specific
59
- evidence
60
- - Government procurement is worth **₹50 lakh crore+ annually** in India
61
- - Manual evaluation is a bottleneck that **delays project execution**
62
-
63
- ---
64
-
65
- ## 3. Key Differentiators (highlight these)
66
-
67
- 1. **Three-tier OCR robustness** — most systems assume digital text; TenderIQ
68
- handles scanned and photographed documents via a progressive pipeline:
69
- PyMuPDF (typed PDF, instant) → Tesseract OCR (scans) → DeepSeek Vision LLM
70
- (low-confidence scans, ~95% accuracy). Every page records which tier read it.
71
-
72
- 2. **Never silent disqualification** — the safety rule: if combined confidence is
73
- between 0.55 and 0.80 and the verdict is `not_eligible`, it is automatically
74
- downgraded to `needs_review`. A bidder is never automatically disqualified at
75
- medium confidence.
76
-
77
- 3. **Criterion-level explainability** — every verdict is traceable to a specific
78
- document, page number, OCR tier, extracted value, and plain-English reason.
79
- Not just "pass/fail" — the officer can see exactly why.
80
-
81
- 4. **Complete audit trail** — every action (extraction, OCR invocation, evaluation,
82
- human review) is logged with timestamp, model version, actor, and payload to
83
- SQLite. Exportable as CSV.
84
-
85
- 5. **Works without internet** — pre-computed fallback JSON is shipped with the
86
- repo. If the API goes down during a demo, the app continues seamlessly. Sidebar
87
- turns amber to indicate fallback mode.
88
-
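The safety rule in differentiator 2 can be expressed as a small post-processing step on each verdict. This is a sketch under stated assumptions: `apply_safety_rule` is a hypothetical name, and treating both band edges (0.55 and 0.80) as inclusive is a guess, since the brief only says "between".

```python
def apply_safety_rule(verdict, combined_confidence, low=0.55, high=0.80):
    """Downgrade medium-confidence rejections to human review."""
    if verdict == "not_eligible" and low <= combined_confidence <= high:
        return "needs_review"
    return verdict
```

A rejection at 0.95 confidence passes through unchanged, while the same rejection at 0.70 is routed to the review queue; `eligible` verdicts are never altered by this rule.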
89
- ---
90
-
91
- ## 4. Slide Content (use exactly this for all 6 variants)
92
-
93
- ### SLIDE 1 — Title Slide
94
- - **Main title:** TenderIQ
95
- - **Subtitle:** Explainable AI for Government Tender Evaluation
96
- - **Event line:** CRPF Hackathon · Theme 3
97
- - **Tagline (small):** From days to minutes. Every decision traceable.
98
- - **Visual suggestion:** The ⚖️ scales emoji large, or an abstract representation
99
- of documents flowing through a pipeline
100
-
101
- ---
102
-
103
- ### SLIDE 2 — The Problem
104
- - **Title:** The Problem with Manual Tender Evaluation
105
- - **Three pain points (use as large visual callouts or icon+text cards):**
106
- 1. **3–5 Days** per tender evaluation by committee
107
- 2. **Inconsistent** — two evaluators, two different conclusions
108
- 3. **No audit trail** — decisions cannot be traced back to evidence
109
- - **Supporting points (smaller text or bullets):**
110
- - Mixed document formats: typed PDFs, scans, phone photographs
111
- - Government procurement worth ₹50 lakh crore+ annually in India
112
- - Project delays traced directly to procurement bottlenecks
113
-
114
- ---
115
-
116
- ### SLIDE 3 — Our Solution
117
- - **Title:** TenderIQ — Four Stages, End to End
118
- - **Four stage cards (equal weight, horizontal or 2×2 layout):**
119
-
120
- **Stage 1 — Extract**
121
- DeepSeek LLM reads the tender PDF and returns each criterion as structured JSON:
122
- category, mandatory flag, threshold rule, source clause, query hints.
123
-
124
- **Stage 2 — OCR & Index**
125
- Three-tier pipeline handles any document format.
126
- All text chunked and indexed for semantic retrieval.
127
-
128
- **Stage 3 — Evaluate**
129
- Vector search finds relevant evidence. LLM produces a verdict with confidence.
130
- Safety rule prevents silent disqualification.
131
-
132
- **Stage 4 — Review & Audit**
133
- Borderline cases go to a human review queue. Every action logged.
134
- Full audit trail exportable as CSV.
135
-
136
- - **Bottom line (callout box):**
137
- "Minutes, not days. Every verdict traceable to a document and page."
138
-
139
- ---
140
-
141
- ### SLIDE 4 — Architecture
142
- - **Title:** System Architecture
143
- - **Diagram description (reproduce this as a visual flow diagram):**
144
-
145
- ```
146
- Tender PDF Bidder Documents
147
- │ (PDFs · scans · photos)
148
- ▼ │
149
- DeepSeek LLM 3-Tier OCR Pipeline
150
- (Extract Criteria) ① PyMuPDF (typed PDF)
151
- │ ② Tesseract (scans)
152
- ▼ ③ Vision LLM (low conf.)
153
- Criteria JSON │
154
- (C1–C5 structured) Vector Index (in-memory)
155
- │ all-MiniLM-L6-v2 embeddings
156
- └──────────────────────────────────┘
157
-
158
- DeepSeek LLM
159
- (Evaluate each criterion)
160
- combined confidence score
161
-
162
- ┌─────────────┴──────────────┐
163
- eligible / needs_review
164
- not_eligible Human Review Queue
165
- │ │
166
- └───────── SQLite Audit Log ────────┘
167
- ```
168
-
169
- - **Key technical facts (sidebar or footnotes on slide):**
170
- - Single-process Streamlit app — no separate backend
171
- - Deployable to Streamlit Cloud or HuggingFace Spaces
172
- - All storage is local: SQLite + in-memory vector index
173
- - Only external dependency: DeepSeek API
174
-
175
- ---
176
-
177
- ### SLIDE 5 — The OCR Demo (the centrepiece)
178
- - **Title:** Three-Tier OCR — Handling Any Document Format
179
- - **Three tier cards with visual progression:**
180
-
181
- **Tier 1 — PyMuPDF**
182
- - Trigger: Document is a typed/digital PDF
183
- - Cost: Free, instant
184
- - Confidence: 1.0 (lossless text extraction)
185
- - Source label in UI: 📄 Typed PDF
186
-
187
- **Tier 2 — Tesseract**
188
- - Trigger: Scanned PDF or image file
189
- - Cost: Free, local, fast
190
- - Confidence: Mean of per-word OCR scores
191
- - Source label in UI: 🔍 Tesseract
192
-
193
- **Tier 3 — DeepSeek Vision LLM**
194
- - Trigger: Tesseract confidence < 65%
195
- - Cost: One API call
196
- - Confidence: 0.95
197
- - Source label in UI: 👁 Vision LLM
198
- - Action: `vision_ocr_invoked` logged to audit
199
-
200
- - **Demo scenario callout (use a highlighted box):**
201
- > **Bidder C submits a blurry, rotated CA certificate scan.**
202
- > Tesseract reads it at ~55% confidence.
203
- > Vision LLM transcribes the turnover figure correctly.
204
- > Combined confidence = 0.58 → routed to human review.
205
- > This is intentional — borderline evidence requires a human.
206
-
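The tier selection on this slide can be sketched as a pure function over Tesseract's per-word confidences. The names `mean_word_confidence` and `choose_source` are hypothetical, and the sketch assumes word confidences arrive on a 0 to 100 scale with -1 marking boxes that contain no recognised word, as Tesseract's word-level output typically does.

```python
TESSERACT_THRESHOLD = 0.65  # below this, escalate to the vision LLM
VISION_CONFIDENCE = 0.95    # fixed confidence assigned to Tier 3


def mean_word_confidence(word_confs):
    """Average the per-word scores, ignoring -1 placeholders."""
    valid = [float(c) for c in word_confs if float(c) >= 0]
    return sum(valid) / len(valid) / 100 if valid else 0.0


def choose_source(is_typed_pdf, word_confs):
    """Return (source_type, confidence) for one page."""
    if is_typed_pdf:
        return "text_pdf", 1.0
    confidence = mean_word_confidence(word_confs)
    if confidence < TESSERACT_THRESHOLD:
        return "vision_llm", VISION_CONFIDENCE
    return "tesseract", confidence
```

For a Bidder C style scan with word scores averaging about 55, `choose_source` selects `vision_llm`; the actual Tesseract and vision calls are left out of the sketch.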
207
- ---
208
-
209
- ### SLIDE 6 — Explainability & Compliance
210
- - **Title:** Every Decision is Explainable and Auditable
211
- - **Two columns:**
212
-
213
- **Left — Criterion-Level Verdicts**
214
- Each (bidder × criterion) pair shows:
215
- - Which criterion was checked
216
- - Which document and page provided the evidence
217
- - What value was extracted (e.g. "INR 6.2 Cr")
218
- - Which OCR tier read the document
219
- - Combined confidence score (0–100%)
220
- - Plain-English reason
221
-
222
- **Right — Audit Trail**
223
- Every action logged with:
224
- - UTC timestamp
225
- - Action type (criteria_extracted / bidder_processed / criterion_evaluated /
226
- human_review_action / vision_ocr_invoked / precomputed_fallback_used)
227
- - Model version
228
- - Actor (system / officer)
229
- - Full payload JSON
230
- - Exportable as CSV
231
-
232
- - **Safety rule callout (prominent, in a coloured box):**
233
- > **The Safety Rule:**
234
- > If combined confidence is 0.55–0.80 AND verdict is `not_eligible`,
235
- > the verdict is automatically downgraded to `needs_review`.
236
- > A bidder is **never silently disqualified** at medium confidence.
237
-
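The audit trail fields listed on this slide map naturally onto a single SQLite table. The schema and helper names below (`open_audit_log`, `log_action`, the `audit` table and its columns) are assumptions for illustration, not the project's actual `core/audit.py`; the model version string is a placeholder.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_audit_log(path=":memory:"):
    """Open (or create) the audit database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "id INTEGER PRIMARY KEY, ts TEXT NOT NULL, action TEXT NOT NULL, "
        "model_version TEXT, actor TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def log_action(conn, action, actor, payload, model_version=None):
    """Append one audit entry with a UTC timestamp and a JSON payload."""
    conn.execute(
        "INSERT INTO audit (ts, action, model_version, actor, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), action, model_version,
         actor, json.dumps(payload)),
    )
    conn.commit()


conn = open_audit_log()
log_action(conn, "vision_ocr_invoked", "system", {"bidder": "bidder_c"},
           model_version="example-model")
rows = conn.execute("SELECT action, actor, payload FROM audit").fetchall()
```

Storing the payload as JSON text keeps the table schema stable as new action types are added, and a CSV export is then a straightforward `SELECT *`.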
238
- ---
239
-
240
- ### SLIDE 7 — Demo Results
241
- - **Title:** Demo: Three Bidders, Three Outcomes
242
- - **Three bidder cards side by side:**
243
-
244
- **Bidder A — Apex Constructions Pvt. Ltd.**
245
- Result: ✅ ELIGIBLE
246
- - C1 Turnover: INR 6.37 Cr avg (threshold: 5 Cr) — PASS
247
- - C2 Projects: 5 completed including CRPF barracks — PASS
248
- - C3 GST: GSTIN 27AABCA1234F1Z5, Active — PASS
249
- - C4 ISO 9001:2015: Valid June 2027 — PASS
250
- - All typed PDFs, confidence ≥ 93% on all criteria
251
-
252
- **Bidder B — BuildRight Enterprises**
253
- Result: ❌ NOT ELIGIBLE
254
- - C1 Turnover: INR 1.5 Cr avg (threshold: 5 Cr) — FAIL
255
- "Average annual turnover of INR 1.5 Cr is below the required
256
- minimum of INR 5 Cr."
257
- - C2–C4: All pass
258
- - Automatically disqualified with high confidence (95%)
259
-
260
- **Bidder C — Shree Constructions & Services**
261
- Result: ⚠️ NEEDS REVIEW
262
- - C1 Turnover: Submitted as blurry scan
263
- Tesseract ~55% → Vision LLM transcribes INR 5.4 Cr
264
- Combined confidence 0.58 → needs review (safety rule)
265
- - C2: Exactly 3 projects (borderline)
266
- - C3–C4: Pass
267
-
268
- - **Bottom metric strip:**
269
- | Metric | Value |
270
- |--------|-------|
271
- | Criteria extracted | 5 |
272
- | Bidder documents processed | 15 |
273
- | LLM evaluation calls | 15 |
274
- | Vision OCR invocations | 1 |
275
- | Human review items | 1 |
276
- | Total audit entries | 20+ |
277
-
278
- ---
279
-
280
- ### SLIDE 8 — Tech Stack & Future Work
281
- - **Title:** Stack, Impact & What's Next
282
- - **Left side — Tech Stack (as a clean table):**
283
-
284
- | Component | Technology |
285
- |-----------|------------|
286
- | UI & orchestration | Streamlit 1.39 |
287
- | LLM | DeepSeek API (OpenAI-compatible) |
288
- | OCR Tier 1 | PyMuPDF 1.24 |
289
- | OCR Tier 2 | Tesseract |
290
- | OCR Tier 3 | DeepSeek Vision LLM |
291
- | Semantic retrieval | sentence-transformers all-MiniLM-L6-v2 |
292
- | Data validation | Pydantic v2 |
293
- | Audit log | SQLite |
294
- | Deployment | Streamlit Cloud / HuggingFace Spaces |
295
-
296
- - **Right side — Future Work (as bullets):**
297
- - Multi-tender workspace — same bidder pool, multiple tenders
298
- - GeM portal API integration — live tender ingestion
299
- - Automated bidder ranking with weighted scoring
300
- - LayoutLM for complex financial tables in scanned statements
301
- - Multi-evaluator workflow with role-based approval
302
- - Review queue email/SMS notifications
303
- - Audit PDF export for procurement oversight submissions
304
-
305
- - **Bottom — Impact callout:**
306
- > **3–5 days → minutes.**
307
- > Every verdict traceable to a document, page, and model version.
308
- > Built in one hackathon session. Deployable today.
309
-
310
- ---
311
-
312
- ## 5. Narrative Arc (how the slides tell a story)
313
-
314
- The deck should flow as:
315
- 1. **Hook** (Slide 1) — big, confident title
316
- 2. **Pain** (Slide 2) — make the problem visceral with the 3 numbers
317
- 3. **Solution** (Slide 3) — 4 clean stages, not overwhelming
318
- 4. **Credibility** (Slide 4) — architecture shows it's real engineering
319
- 5. **Differentiator** (Slide 5) — the OCR story is unique and concrete
320
- 6. **Trust** (Slide 6) — explainability + audit builds confidence with judges
321
- 7. **Proof** (Slide 7) — real outcomes, real numbers, real bidder scenarios
322
- 8. **Vision** (Slide 8) — grounded stack + forward-looking future work
323
-
324
- Every slide should have **one dominant visual** and **limited text**. Judges skim.
325
- The most important information should be readable in 3 seconds.
326
-
327
- ---
328
-
329
- ## 6. The Six Visual Styles
330
-
331
- Produce one presentation per style. All use the same slide content from section 4.
332
-
333
- ---
334
-
335
- ### Style 1 — Dark Professional (PPTX)
336
- **File:** `deck/TenderIQ_v1_dark_professional.pptx`
337
-
338
- **Palette:**
339
- - Slide background: `#0D1B2A` (deep navy)
340
- - Primary text: `#F1F5F9` (near white)
341
- - Secondary text: `#94A3B8` (muted blue-grey)
342
- - Accent / headings: `#F0A500` (gold)
343
- - Eligible green: `#22C55E`
344
- - Not eligible red: `#EF4444`
345
- - Needs review amber: `#F59E0B`
346
- - Card backgrounds: `#1E3A5F` (lighter navy)
347
- - Borders: `#2D4A6B`
348
-
349
- **Typography:**
350
- - Headings: Calibri Bold or Arial Bold, 28–32pt, gold
351
- - Body: Calibri or Arial, 16–18pt, near-white
352
- - Captions/labels: 12–13pt, muted blue-grey
353
-
354
- **Style rules:**
355
- - Dark background on every slide
356
- - Title slide: large gold ⚖️ emoji, gold title text on navy
357
- - Section headings have a thin gold left border or underline
358
- - Cards/boxes: slightly lighter navy background (#1E3A5F) with gold border
359
- - Verdict chips: coloured filled rectangles (green/red/amber) with white text
360
- - Progress/confidence: horizontal bar in gold on dark track
361
- - The OCR tier cards (Slide 5): three columns, each with a different accent colour
362
- (blue for Tier 1, purple for Tier 2, orange for Tier 3)
363
- - Architecture diagram (Slide 4): use white-on-dark text boxes connected with
364
- gold arrows
365
-
366
- ---
367
-
368
- ### Style 2 — Clean Minimal (PPTX)
369
- **File:** `deck/TenderIQ_v2_clean_minimal.pptx`
370
-
371
- **Palette:**
372
- - Slide background: `#FFFFFF`
373
- - Primary text: `#111827`
374
- - Secondary text: `#6B7280`
375
- - Accent: `#2563EB` (blue)
376
- - Light accent background: `#EFF6FF`
377
- - Eligible: `#059669`
378
- - Not eligible: `#DC2626`
379
- - Needs review: `#D97706`
380
- - Dividers/borders: `#E5E7EB`
381
-
382
- **Typography:**
383
- - Headings: Inter or Calibri Light Bold, 28–32pt, #111827
384
- - Body: Inter or Calibri, 15–16pt, #374151
385
- - Captions: 11–12pt, #9CA3AF
386
-
387
- **Style rules:**
388
- - White background throughout
389
- - Large amounts of whitespace — never fill the slide
390
- - Title slide: small ⚖️ followed by large "TenderIQ" in #111827, subtitle in grey
391
- - Section headings: simple left-aligned text with a 3px blue left border
392
- - Cards: white with a 1px #E5E7EB border and very subtle shadow (simulate with
393
- slightly off-white fill)
394
- - No gradients, no heavy fills — colour used sparingly as accent only
395
- - Verdict chips: light fill (green/red/amber at 15% opacity) with bold coloured text
396
- - Numbers/stats (Slide 2): very large (80–96pt), blue accent colour, minimal
397
- surrounding text
398
- - Architecture diagram (Slide 4): use grey boxes with blue connector arrows,
399
- clean and uncluttered
400
-
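The "15% opacity" chip fill in Style 2 can be approximated with a solid colour when the drawing library offers no convenient opacity setting. The sketch below blends a verdict colour over the white slide background; the helper names `hex_to_rgb` and `tint_on_white` are illustrative and not part of the plan.

```python
def hex_to_rgb(hex_colour: str) -> tuple:
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    h = hex_colour.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def tint_on_white(hex_colour: str, opacity: float = 0.15) -> str:
    """Blend hex_colour over a white background at the given opacity.

    opacity=1.0 returns the colour unchanged; opacity=0.0 returns white.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    blended = [round(255 + (c - 255) * opacity) for c in hex_to_rgb(hex_colour)]
    return "#{:02X}{:02X}{:02X}".format(*blended)


# Light chip fills for the Style 2 verdict colours.
CHIP_FILLS = {
    "eligible": tint_on_white("#059669"),
    "not_eligible": tint_on_white("#DC2626"),
    "needs_review": tint_on_white("#D97706"),
}
```

The resulting hex strings can be used as ordinary solid fills, with the original verdict colour kept for the bold chip text.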
401
- ---
402
-
403
- ### Style 3 — Government Official (PPTX)
404
- **File:** `deck/TenderIQ_v3_government_official.pptx`
405
-
406
- **Palette:**
407
- - Primary: `#003580` (deep government blue, similar to NIC / India.gov.in)
408
- - Secondary: `#FFFFFF`
409
- - Accent: `#FF9933` (saffron, from the Indian tricolour)
410
- - Third accent: `#138808` (India green)
411
- - Background: `#F5F5F0` (off-white, like a government document)
412
- - Text: `#1A1A1A`
413
- - Borders: `#003580`
414
-
415
- **Typography:**
416
- - Headings: Times New Roman Bold or Cambria Bold, 26–30pt, #003580
417
- - Body: Arial or Calibri, 14–15pt, #1A1A1A
418
- - Official labels: small caps, 11pt, #003580
419
-
420
- **Style rules:**
421
- - Header bar on every slide: deep blue (#003580) band at top with white text for
422
- slide title; thin saffron line below the header band
423
- - Footer on every slide: "TenderIQ · CRPF Hackathon · Theme 3" in small text
424
- on the blue header colour
425
- - Title slide: formal layout — emblem/logo area top left, large title centred,
426
- "Ministry of Home Affairs" sub-line
427
- - Slide content area: off-white background, clean margins
428
- - Tables: blue header row (#003580, white text), alternating white/#F0F4FF rows
429
- - Callout boxes: thin blue border, very light blue fill (#EBF0FF)
430
- - Verdict indicators: use formal language labels ("ELIGIBLE", "NOT ELIGIBLE",
431
- "UNDER REVIEW") in coloured text, no emoji
432
- - This style should feel like an official government presentation, not a startup deck
433
-
434
- ---
435
-
436
- ### Style 4 — Modern Gradient (PDF via reportlab)
437
- **File:** `deck/TenderIQ_v4_modern_gradient.pdf`
438
-
439
- **Palette:**
440
- - Gradient 1 (title slide): `#667EEA` → `#764BA2` (purple-blue)
441
- - Gradient 2 (content slides background strip): `#0EA5E9` → `#2563EB`
442
- - Card fills: `#FFFFFF` with coloured top accent border
443
- - Text on gradient: `#FFFFFF`
444
- - Text on white: `#0F172A`
445
- - Eligible: `#10B981`
446
- - Not eligible: `#F43F5E`
447
- - Needs review: `#FBBF24`
448
-
449
- **Typography:**
450
- - Headings on gradient: white, bold, 24–28pt
451
- - Body on white cards: dark, 12–14pt
452
- - Stat numbers: 48–56pt, gradient-coloured
453
-
454
- **Style rules (reportlab specific):**
455
- - Title slide: full-page gradient background (use `canvas.linearGradient` if
456
- available, or approximate with filled rectangles stepping from #667EEA to #764BA2)
457
- - Content slides: white background with a gradient-filled header band (top 20% of
458
- slide) for the slide title
459
- - Cards: white rectangles with a 4px top border in a theme colour, subtle grey
460
- border on other sides
461
- - Stat numbers on Slide 2: very large, rendered in the gradient colours
462
- - OCR tiers on Slide 5: three cards with top borders in blue, purple, orange
463
- - Arrows in architecture diagram: use curved lines in gradient blue
464
- - Avoid heavy outlines — use fill and spacing instead
465
- - Page numbers bottom right in muted colour
466
-
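The fallback mentioned in the Style 4 rules (approximating the title gradient with filled rectangles) amounts to linear interpolation between the two palette colours. A minimal sketch follows, assuming vertical strips of equal width; `gradient_steps` is a hypothetical helper name.

```python
def hex_to_rgb(hex_colour: str) -> tuple:
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    h = hex_colour.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def gradient_steps(start_hex: str, end_hex: str, steps: int) -> list:
    """Return `steps` hex colours interpolated linearly from start to end."""
    if steps < 2:
        raise ValueError("need at least two steps")
    start, end = hex_to_rgb(start_hex), hex_to_rgb(end_hex)
    colours = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first strip, 1.0 at the last
        rgb = [round(a + (b - a) * t) for a, b in zip(start, end)]
        colours.append("#{:02X}{:02X}{:02X}".format(*rgb))
    return colours


# Title-slide gradient from the Style 4 palette, split into 64 strips.
TITLE_STRIPS = gradient_steps("#667EEA", "#764BA2", 64)
```

Each colour would then fill one strip, for example with `c.setFillColor(colors.HexColor(h))` followed by `c.rect(x, 0, strip_width, H, fill=1, stroke=0)`. More strips give a smoother result at the cost of a slightly larger PDF.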
467
- ---
468
-
469
- ### Style 5 — Data Forward (PPTX)
470
- **File:** `deck/TenderIQ_v5_data_forward.pptx`
471
-
472
- **Palette:**
473
- - Background: `#FAFAFA`
474
- - Primary: `#1E293B`
475
- - Accent: `#6366F1` (indigo)
476
- - Chart colours: `#22C55E`, `#EF4444`, `#F59E0B`, `#3B82F6`, `#8B5CF6`
477
- - Grid lines: `#E2E8F0`
478
- - Text: `#334155`
479
-
480
- **Typography:**
481
- - Data labels: 14–16pt bold, #1E293B
482
- - Axis labels / captions: 10–11pt, #64748B
483
- - Slide titles: 24pt bold, indigo
484
-
485
- **Style rules:**
486
- - This variant leads with data visualisation on every slide where possible
487
- - Slide 2 (Problem): use a simple bar chart showing "Days per tender" comparison
488
- (manual: 3–5 days, TenderIQ: minutes represented as <0.1 days)
489
- - Slide 3 (Solution): use a horizontal process flow with numbered circles
490
- - Slide 5 (OCR): use a stacked bar or table showing accuracy by tier
491
- - Slide 7 (Demo results): use a verdicts breakdown chart — 3 bidders × 5 criteria
492
- as a colour-coded matrix (green/red/amber cells)
493
- - Slide 8 (Stack): use a visual table with technology icons (text-based approximation)
494
- - Charts should be built with python-pptx chart objects (not images) where possible,
495
- or use matplotlib to embed PNG charts
496
-
497
- **Key chart specs:**
498
- - Demo results matrix (Slide 7): 3 rows (bidders) × 5 columns (criteria), each cell
499
- filled green/red/amber with a 1-letter code (E/N/R)
500
- - OCR confidence comparison (Slide 5): simple bar chart showing
501
- Tier 1: 100%, Tier 2: ~55–65%, Tier 3: ~95%
502
- - Problem scale (Slide 2): two-bar chart, Manual vs TenderIQ, logarithmic scale
503
- or just text-anchored bars
504
-
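The demo results matrix can be prepared as plain data before any shapes are drawn. The sketch below maps each verdict to the one-letter code and a cell colour from the Style 5 chart palette. The bidder and criterion identifiers are placeholders, and treating a missing verdict as "needs review" is an assumption rather than something the plan specifies.

```python
# (code, fill colour) per verdict; colours taken from the Style 5 chart palette.
VERDICT_CELL = {
    "eligible": ("E", "#22C55E"),
    "not_eligible": ("N", "#EF4444"),
    "needs_review": ("R", "#F59E0B"),
}


def build_matrix(verdicts: dict, bidders: list, criteria: list) -> list:
    """Return one row per bidder, one (code, colour) cell per criterion.

    `verdicts` maps (bidder_id, criterion_id) to a verdict string.
    Missing entries fall back to "needs_review" (an assumption).
    """
    rows = []
    for bidder in bidders:
        row = []
        for criterion in criteria:
            verdict = verdicts.get((bidder, criterion), "needs_review")
            row.append(VERDICT_CELL[verdict])
        rows.append(row)
    return rows


# Hypothetical example data: 3 bidders x 5 criteria.
bidders = ["bidder_a", "bidder_b", "bidder_c"]
criteria = ["C1", "C2", "C3", "C4", "C5"]
verdicts = {("bidder_a", c): "eligible" for c in criteria}
verdicts[("bidder_b", "C2")] = "not_eligible"
matrix = build_matrix(verdicts, bidders, criteria)
```

Each cell tuple can then drive one filled rectangle plus a centred letter, which keeps the drawing loop independent of how the verdicts were produced.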
505
- ---
506
-
507
- ### Style 6 — Infographic (PDF via reportlab)
508
- **File:** `deck/TenderIQ_v6_infographic.pdf`
509
-
510
- **Palette:**
511
- - Background: `#FFFFFF`
512
- - Section stripe: `#F8FAFC`
513
- - Primary icon colour: `#2563EB`
514
- - Icon accents: `#22C55E`, `#EF4444`, `#F59E0B`, `#8B5CF6`
515
- - Text: `#0F172A`
516
- - Subtext: `#64748B`
517
-
518
- **Typography:**
519
- - Large numbers: 48–60pt, bold, primary colour
520
- - Section labels: 10pt, all-caps, letter-spaced, muted
521
- - Body: 12pt, dark
522
-
523
- **Style rules:**
524
- - Every slide is built around a large central icon or number
525
- - Slide 2 (Problem): three large numbers (3–5, ✗, ?) each with a one-line label
526
- - Slide 3 (Solution): four large icons (📄 → 🔍 → ⚖️ → 📋) with stage labels
527
- - Slide 4 (Architecture): a vertical flow infographic, not a box diagram —
528
- icon per stage, connecting lines, short labels
529
- - Slide 5 (OCR): three large tier icons stacked with an arrow between them,
530
- confidence % shown as a circular progress indicator (drawn with arc)
531
- - Slide 7 (Demo): three large outcome icons (✅ ❌ ⚠️) each with 3 bullet points
532
- - Slide 8 (Future): icon grid of 6 future directions, each with a 1-line label
533
- - No heavy borders — whitespace is the separator
534
- - Use reportlab's `canvas.drawString`, arcs for circular indicators, and
535
- filled rectangles for bars
536
-
537
- ---
538
-
539
- ## 7. Technical Implementation Notes
540
-
541
- ### python-pptx (for Styles 1, 2, 3, 5)
542
-
543
- ```python
- from pptx import Presentation
- from pptx.dml.color import RGBColor
- from pptx.enum.shapes import MSO_SHAPE
- from pptx.enum.text import PP_ALIGN
- from pptx.util import Inches, Pt
-
- # Slide size: widescreen 16:9
- prs = Presentation()
- prs.slide_width = Inches(13.33)
- prs.slide_height = Inches(7.5)
-
- # Add a blank slide (layout 6 is "Blank" in the default template)
- slide_layout = prs.slide_layouts[6]
- slide = prs.slides.add_slide(slide_layout)
-
- # Add a filled rectangle covering the whole slide as a background
- shape = slide.shapes.add_shape(
-     MSO_SHAPE.RECTANGLE,  # autoshape type, not MSO_SHAPE_TYPE
-     Inches(0), Inches(0), Inches(13.33), Inches(7.5)
- )
- shape.fill.solid()
- shape.fill.fore_color.rgb = RGBColor(0x0D, 0x1B, 0x2A)
- shape.line.fill.background()  # no border
-
- # Add a text box; assigning p.text creates the first run
- txBox = slide.shapes.add_textbox(Inches(1), Inches(2), Inches(11), Inches(2))
- tf = txBox.text_frame
- tf.word_wrap = True
- p = tf.paragraphs[0]
- p.text = "TenderIQ"
- p.alignment = PP_ALIGN.CENTER
- run = p.runs[0]
- run.font.size = Pt(54)
- run.font.bold = True
- run.font.color.rgb = RGBColor(0xF0, 0xA5, 0x00)  # gold
-
- # Save (the deck/ directory must already exist)
- prs.save("deck/TenderIQ_v1_dark_professional.pptx")
- ```
585
-
586
- **Key python-pptx patterns to use:**
587
- - `slide.shapes.add_shape(MSO_SHAPE.RECTANGLE, ...)` — adds a rectangle (`MSO_SHAPE.RECTANGLE`, imported from `pptx.enum.shapes`, has the value 1)
588
- - `shape.fill.solid()` + `shape.fill.fore_color.rgb = RGBColor(r, g, b)` — fill colour
589
- - `shape.line.fill.background()` — remove border
590
- - `slide.shapes.add_textbox(left, top, width, height)` — text box
591
- - `tf.paragraphs[0].runs[0].font.color.rgb` — font colour
592
- - `slide.shapes.add_picture(image_path, left, top, width, height)` — embed image
593
- - For tables: `slide.shapes.add_table(rows, cols, left, top, width, height)`
594
- - All measurements: use `Inches()` or `Pt()` or raw `Emu` (1 inch = 914400 Emu)
595
-
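The unit relationship quoted above (1 inch = 914400 EMU) is enough to convert layout measurements by hand when raw integers are needed. The sketch below also uses the standard 72 points per inch, which gives 12700 EMU per point. The helper names are illustrative, and these functions round to the nearest EMU, which may differ by one unit from the library's own `Inches()` and `Pt()` helpers.

```python
EMU_PER_INCH = 914400
EMU_PER_POINT = EMU_PER_INCH // 72  # 12700


def inches_to_emu(inches: float) -> int:
    """Convert inches to English Metric Units, rounded to the nearest EMU."""
    return round(inches * EMU_PER_INCH)


def points_to_emu(points: float) -> int:
    """Convert typographic points to EMU."""
    return round(points * EMU_PER_POINT)


def emu_to_inches(emu: int) -> float:
    """Convert EMU back to inches, e.g. for logging shape positions."""
    return emu / EMU_PER_INCH


# Height of the 16:9 slide used in this plan (7.5 inches).
SLIDE_HEIGHT_EMU = inches_to_emu(7.5)
```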
596
- **Avoid:**
597
- - `pptx.chart` (complex, often renders poorly) — use coloured shapes instead
598
- - Embedded images from URLs — use only local files or draw shapes
599
-
600
- ---
601
-
602
- ### reportlab (for Styles 4 and 6)
603
-
604
- ```python
605
- from reportlab.pdfgen.canvas import Canvas
606
- from reportlab.lib.pagesizes import A4, landscape
607
- from reportlab.lib import colors
608
- from reportlab.lib.units import cm, mm
609
-
610
- W, H = landscape(A4) # 841.89 x 595.28 points (29.7 x 21 cm landscape)
611
-
612
- c = Canvas("deck/TenderIQ_v4_modern_gradient.pdf", pagesize=landscape(A4))
613
-
614
- # Filled rectangle
615
- c.setFillColor(colors.HexColor("#0D1B2A"))
616
- c.rect(0, 0, W, H, fill=1, stroke=0)
617
-
618
- # Text
619
- c.setFillColor(colors.white)
620
- c.setFont("Helvetica-Bold", 48)
621
- c.drawCentredString(W/2, H/2, "TenderIQ")
622
-
623
- # Line
624
- c.setStrokeColor(colors.HexColor("#F0A500"))
625
- c.setLineWidth(3)
626
- c.line(2*cm, H - 3*cm, W - 2*cm, H - 3*cm)
627
-
628
- # New page
629
- c.showPage()
630
-
631
- # Save
632
- c.save()
633
- ```
634
-
635
- **Key reportlab patterns:**
636
- - `c.rect(x, y, w, h, fill=1, stroke=0)` — filled rectangle, no border
637
- - `c.roundRect(x, y, w, h, radius, fill=1, stroke=0)` — rounded rectangle
638
- - `c.drawCentredString(x, y, text)` — centred text at point
639
- - `c.drawString(x, y, text)` — left-aligned text
640
- - `c.drawRightString(x, y, text)` — right-aligned text
641
- - `c.setFont("Helvetica-Bold", size)` — font (built-in: Helvetica, Times-Roman, Courier)
642
- - `c.arc(x1, y1, x2, y2, startAng, extent)` — arc (for circular indicators)
643
- - `c.line(x1, y1, x2, y2)` — line
644
- - `c.showPage()` — new slide/page
645
- - **Coordinate system:** origin is bottom-left; y increases upward
646
- - To position from top: use `H - y_from_top`
647
-
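Because the origin is bottom-left, slide layouts written in "distance from the top" terms need a small conversion before drawing. The following sketch shows one way to do it; `y_from_top` and `box_from_top` are hypothetical helper names, and the page size is the landscape A4 value quoted in the snippet above.

```python
PAGE_W, PAGE_H = 841.89, 595.28  # landscape A4 in points


def y_from_top(offset_from_top: float, page_height: float = PAGE_H) -> float:
    """Convert a distance measured down from the top edge into a canvas y."""
    return page_height - offset_from_top


def box_from_top(left: float, top: float, width: float, height: float,
                 page_height: float = PAGE_H) -> tuple:
    """Return (x, y, w, h) suitable for a rect call.

    (left, top) is the box's top-left corner measured from the page's
    top-left; the returned y is the box's bottom edge, since rectangles
    are anchored at their lower-left corner.
    """
    return left, page_height - top - height, width, height


# A header band occupying the top 20% of the page (as in Style 4).
HEADER_BAND = box_from_top(0, 0, PAGE_W, PAGE_H * 0.2)
```

The tuple can be unpacked straight into a call such as `c.rect(*HEADER_BAND, fill=1, stroke=0)`.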
648
- **Text wrapping in reportlab:**
649
- ```python
650
- from reportlab.platypus import Paragraph
651
- from reportlab.lib.styles import ParagraphStyle
652
-
653
- style = ParagraphStyle('body', fontSize=12, leading=16, textColor=colors.white)
654
- p = Paragraph("Long text that wraps automatically.", style)
655
- p.wrapOn(c, width, height)
656
- p.drawOn(c, x, y)
657
- ```
658
-
659
- ---
660
-
661
- ## 8. Quality Checklist
662
-
663
- Before saving each variant, verify:
664
-
665
- - [ ] All 8 slides are present with content from section 4
666
- - [ ] Title is readable in 2 seconds on slide 1
667
- - [ ] The three pain point numbers are prominent on slide 2
668
- - [ ] The safety rule callout is visually distinct on slide 6
669
- - [ ] The three bidder outcomes are clearly colour-coded on slide 7
670
- - [ ] No slide has walls of text — maximum ~6 bullet points per slide
671
- - [ ] Font sizes: headings 24–32pt, body 14–16pt minimum (readable when projected)
672
- - [ ] Consistent margin — at least 1 inch (reportlab: 2.5cm) from all edges
673
- - [ ] Consistent colour palette within each variant (no accidental colour mixing)
674
- - [ ] File saves without error and opens cleanly
675
-
676
- ---
677
-
678
- ## 9. Output Summary
679
-
680
- | File | Format | Style | Tool |
681
- |------|--------|-------|------|
682
- | `deck/TenderIQ_v1_dark_professional.pptx` | PPTX | Dark navy + gold | python-pptx |
683
- | `deck/TenderIQ_v2_clean_minimal.pptx` | PPTX | White + blue, minimal | python-pptx |
684
- | `deck/TenderIQ_v3_government_official.pptx` | PPTX | Government blue + saffron | python-pptx |
685
- | `deck/TenderIQ_v4_modern_gradient.pdf` | PDF | Purple-blue gradient | reportlab |
686
- | `deck/TenderIQ_v5_data_forward.pptx` | PPTX | Charts + data viz | python-pptx |
687
- | `deck/TenderIQ_v6_infographic.pdf` | PDF | Large icons + numbers | reportlab |
688
-
689
- All output to `deck/`. Delete `TenderIQ_Pitch.pdf` (the old bad one) after creating the new files.
 
specs/00_skeleton.md DELETED
@@ -1,594 +0,0 @@
1
- # Spec 00 — Project Skeleton
2
-
3
- **Step:** 1 of 15
4
- **Time budget:** ~15 min
5
- **Checkpoint:** `streamlit run app.py` opens in the browser showing 5 named tabs and a sidebar with logo placeholder, project name, and connection status dot. No errors in the terminal.
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- Create every file and directory that Step 2 onward will write into. All Python modules are stubs (importable but empty of logic). The running app must render without crashing.
12
-
13
- ---
14
-
15
- ## Files to Create
16
-
17
- ### Root-level files
18
-
19
- #### `requirements.txt`
20
- ```
21
- streamlit==1.39.0
22
- openai==1.51.0
23
- pymupdf==1.24.10
24
- pytesseract==0.3.13
25
- Pillow==10.4.0
26
- numpy==1.26.4
27
- chromadb==0.5.5
28
- sentence-transformers==3.1.1
29
- pydantic==2.9.2
30
- python-dotenv==1.0.1
31
- reportlab==4.2.5
32
- pandas==2.2.3
33
- ```
34
-
35
- #### `packages.txt`
36
- ```
37
- tesseract-ocr
38
- poppler-utils
39
- ```
40
-
41
- #### `.env.example`
42
- ```
43
- DEEPSEEK_API_KEY=your_key_here
44
- ```
45
-
46
- #### `.gitignore`
47
- ```
48
- .env
49
- .chroma/
50
- audit.db
51
- __pycache__/
52
- *.pyc
53
- .ocr_cache/
54
- *.egg-info/
55
- dist/
56
- build/
57
- .DS_Store
58
- Thumbs.db
59
- ```
60
-
61
- #### `app.py` — Streamlit entry point (stub)
62
-
63
- Exact stub content:
64
-
65
- ```python
66
- import streamlit as st
67
-
68
- from ui.tab_overview import render as render_overview
69
- from ui.tab_tender import render as render_tender
70
- from ui.tab_bidders import render as render_bidders
71
- from ui.tab_review import render as render_review
72
- from ui.tab_audit import render as render_audit
73
-
74
- st.set_page_config(
75
- page_title="TenderIQ",
76
- page_icon="⚖️",
77
- layout="wide",
78
- )
79
-
80
- # ── Sidebar ──────────────────────────────────────────────────────────────────
81
- with st.sidebar:
82
- st.markdown("## ⚖️ TenderIQ")
83
- st.caption("Explainable AI for Tender Evaluation")
84
- st.divider()
85
- # Connection status — placeholder until core/llm_client.py is wired
86
- st.markdown("🔴 **DeepSeek:** not connected")
87
- st.divider()
88
- if st.button("Reset Session", use_container_width=True):
89
- for key in list(st.session_state.keys()):
90
- del st.session_state[key]
91
- st.rerun()
92
-
93
- # ── Tabs ─────────────────────────────────────────────────────────────────────
94
- tab1, tab2, tab3, tab4, tab5 = st.tabs([
95
- "Overview",
96
- "Tender Analysis",
97
- "Bidder Evaluation",
98
- "Human Review",
99
- "Audit Log",
100
- ])
101
-
102
- with tab1:
103
- render_overview()
104
-
105
- with tab2:
106
- render_tender()
107
-
108
- with tab3:
109
- render_bidders()
110
-
111
- with tab4:
112
- render_review()
113
-
114
- with tab5:
115
- render_audit()
116
- ```
117
-
118
- ---
119
-
120
- ### `core/` package — all stubs
121
-
122
- Every file in `core/` must be importable and expose the names that `app.py` or other modules reference at import time. No logic yet — just `pass` stubs and placeholder class/function signatures.
123
-
124
- #### `core/__init__.py`
125
- Empty.
126
-
127
- #### `core/config.py`
128
- ```python
129
- import os
130
- from pathlib import Path
131
- from dotenv import load_dotenv
132
-
133
- load_dotenv()
134
-
135
- DEEPSEEK_API_KEY: str | None = os.getenv("DEEPSEEK_API_KEY")
136
- DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"
137
- MODEL_NAME = "deepseek-chat"
138
- MODEL_VERSION = f"{MODEL_NAME}@2026-05-07"
139
-
140
- CONFIDENCE_HIGH = 0.80
141
- CONFIDENCE_REVIEW = 0.55
142
- OCR_TESSERACT_MIN_CONF = 0.65
143
-
144
- BASE_DIR = Path(__file__).resolve().parent.parent
145
- DATA_DIR = BASE_DIR / "data"
146
- CHROMA_DIR = str(BASE_DIR / ".chroma")
147
- AUDIT_DB = str(BASE_DIR / "audit.db")
148
- PRECOMPUTED_DIR = DATA_DIR / "precomputed"
149
- OCR_CACHE_DIR = BASE_DIR / ".ocr_cache"
150
- ```
151
-
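The two confidence thresholds in `core/config.py` suggest three bands. The spec does not say how `combined_confidence` is derived or what each band triggers, so the sketch below is only one possible reading: it assumes the combined score is the weaker of the LLM and OCR confidences, and the band names are placeholders.

```python
CONFIDENCE_HIGH = 0.80    # value from core/config.py
CONFIDENCE_REVIEW = 0.55  # value from core/config.py


def combine_confidence(llm_confidence: float, ocr_confidence=None) -> float:
    """Assumed rule: the combined score is limited by the weaker signal.

    Text PDFs may carry no OCR confidence, in which case only the LLM
    confidence is used.
    """
    if ocr_confidence is None:
        return llm_confidence
    return min(llm_confidence, ocr_confidence)


def confidence_band(combined: float) -> str:
    """Map a combined confidence to a band name (names are hypothetical)."""
    if combined >= CONFIDENCE_HIGH:
        return "high"
    if combined >= CONFIDENCE_REVIEW:
        return "medium"
    return "low"
```

Under this reading, a confident LLM answer built on a poor scan still lands in a lower band, which matches the plan's emphasis on routing uncertain verdicts to human review.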
152
- #### `core/schemas.py`
153
- ```python
154
- from __future__ import annotations
155
- from typing import Literal, Optional
156
- from pydantic import BaseModel, Field
157
- import uuid
158
-
159
-
160
- class Rule(BaseModel):
161
- type: Literal["numeric_threshold", "count_threshold", "certification_present", "document_present"]
162
- field: str
163
- operator: Literal[">=", "<=", "==", "exists"]
164
- value: float | int | None = None
165
- unit: str | None = None
166
-
167
-
168
- class Criterion(BaseModel):
169
- id: str
170
- title: str
171
- category: Literal["financial", "technical", "compliance"]
172
- mandatory: bool
173
- description: str
174
- rule: Rule
175
- query_hints: list[str]
176
- source_page: int
177
- source_clause: str
178
-
179
-
180
- class Evidence(BaseModel):
181
- bidder_id: str
182
- doc_name: str
183
- page: int
184
- text: str
185
- source_type: Literal["text_pdf", "tesseract", "vision_llm"]
186
- ocr_confidence: float | None = None
187
-
188
-
189
- class Source(BaseModel):
190
- doc_name: str
191
- page: int
192
- snippet: str
193
- source_type: Literal["text_pdf", "tesseract", "vision_llm"]
194
-
195
-
196
- class Verdict(BaseModel):
197
- verdict_id: str = Field(default_factory=lambda: f"V-{uuid.uuid4().hex[:8]}")
198
- bidder_id: str
199
- criterion_id: str
200
- verdict: Literal["eligible", "not_eligible", "needs_review"]
201
- extracted_value: str | None = None
202
- normalized_value: float | int | None = None
203
- source: Source | None = None
204
- llm_confidence: float = 0.0
205
- ocr_confidence: float | None = None
206
- combined_confidence: float = 0.0
207
- reason: str = ""
208
- model_version: str = ""
209
- timestamp: str = ""
210
- review_status: Literal["pending", "approved", "edited", "rejected"] = "pending"
211
-
212
-
213
- class AuditEntry(BaseModel):
214
- id: int | None = None
215
- ts: str
216
- action: str
217
- actor: str
218
- model_version: str | None = None
219
- bidder_id: str | None = None
220
- criterion_id: str | None = None
221
- payload_json: str | None = None
222
- ```
223
-
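To illustrate how a `Rule` might be applied to a value extracted from bidder documents, the sketch below uses a plain dataclass as a stand-in for the Pydantic model so that it runs without third-party packages. `apply_rule` is a hypothetical helper, the field name in the example is invented, and returning `None` when no value was found is an assumption about how undecidable cases would be surfaced.

```python
from __future__ import annotations

import operator as op
from dataclasses import dataclass


@dataclass
class Rule:
    """Stand-in for core.schemas.Rule with the same field names."""
    type: str
    field: str
    operator: str
    value: float | None = None
    unit: str | None = None


_COMPARATORS = {">=": op.ge, "<=": op.le, "==": op.eq}


def apply_rule(rule: Rule, extracted_value: float | None) -> bool | None:
    """Return True/False when the rule can be decided, else None.

    "exists" only checks that something was extracted. The comparison
    operators need both a threshold and an extracted value.
    """
    if rule.operator == "exists":
        return extracted_value is not None
    if extracted_value is None or rule.value is None:
        return None  # undecidable: a caller could route this to review
    return _COMPARATORS[rule.operator](extracted_value, rule.value)


# Hypothetical criterion: annual turnover of at least 5 crore.
turnover_rule = Rule("numeric_threshold", "annual_turnover", ">=", 5.0, "crore")
```

Keeping the comparison deterministic and outside the LLM call makes the pass/fail step easy to audit; the model would then only be responsible for extracting the value and citing its source.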
224
- #### `core/prompts.py`
225
- ```python
226
- EXTRACT_CRITERIA_PROMPT_SYSTEM = """\
227
- You are an expert in Indian government tender analysis (CRPF context). Your job is to extract \
228
- eligibility criteria from a tender document and return them as STRICT JSON. Never invent criteria \
229
- not present in the text. Classify each criterion as mandatory or optional based on cue words: \
230
- "shall", "must", "mandatory", "required", "minimum" → mandatory; "preferred", "desirable", \
231
- "may", "optionally" → optional. For each criterion, generate 3–5 short noun-phrase query_hints \
232
- that an evaluator would search for in bidder documents.\
233
- """
234
-
235
- EVALUATE_CRITERION_PROMPT_SYSTEM = """\
236
- You are a procurement evaluator. Given ONE criterion and a list of retrieved evidence chunks from \
237
- a bidder's documents, decide eligible / not_eligible / needs_review. Always cite the strongest \
238
- single source. NEVER guess values not present in the evidence. If evidence is missing or \
239
- ambiguous, return needs_review with reason. Output STRICT JSON.\
240
- """
241
-
242
- VISION_OCR_PROMPT_SYSTEM = """\
243
- You are an OCR engine for Indian government procurement documents. Transcribe the image text \
244
- faithfully, preserving numeric values, dates, certificate IDs, and tabular structure (use \
245
- markdown tables). Do NOT summarize, interpret, or omit anything. Output transcribed text only — \
246
- no commentary.\
247
- """
248
-
249
- VISION_OCR_USER = (
250
- "Transcribe this document page completely. Pay special attention to numeric values like "
251
- "turnover figures (INR / Crore / Lakh), dates, and registration numbers."
252
- )
253
- ```
254
-
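The cue words listed in the criteria-extraction prompt can also be checked with a simple keyword pass, for instance as a cross-check on the model's `mandatory` flag. This is a rough sketch, not part of the spec: `cue_based_label` is a hypothetical helper, and letting mandatory cues win when both kinds appear is an assumption.

```python
import re

MANDATORY_CUES = {"shall", "must", "mandatory", "required", "minimum"}
OPTIONAL_CUES = {"preferred", "desirable", "may", "optionally"}


def cue_based_label(clause_text: str):
    """Return "mandatory", "optional", or None when no cue word appears."""
    words = set(re.findall(r"[a-z]+", clause_text.lower()))
    if words & MANDATORY_CUES:
        return "mandatory"  # assumed precedence over optional cues
    if words & OPTIONAL_CUES:
        return "optional"
    return None
```

A disagreement between this label and the LLM output would not prove the model wrong, but it could be a cheap signal for flagging a criterion for a second look.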
255
- #### `core/llm_client.py`
256
- ```python
257
- from pathlib import Path
258
-
259
-
260
- class LLMUnavailable(Exception):
261
- pass
262
-
263
-
264
- class LLM:
265
- def __init__(self, api_key: str | None = None):
266
- pass
267
-
268
- def chat_json(self, system: str, user: str, max_retries: int = 2) -> dict:
269
- raise NotImplementedError
270
-
271
- def chat_vision(
272
- self,
273
- system: str,
274
- user_text: str,
275
- image: bytes | str | Path,
276
- max_retries: int = 2,
277
- ) -> str:
278
- raise NotImplementedError
279
- ```
280
-
281
- #### `core/pdf_utils.py`
282
- ```python
283
- from pathlib import Path
284
- import PIL.Image
285
-
286
-
287
- def extract_pages(path: Path) -> list[dict]:
288
- raise NotImplementedError
289
-
290
-
291
- def is_text_pdf(path: Path) -> bool:
292
- raise NotImplementedError
293
-
294
-
295
- def render_page_to_image(path: Path, page_no: int, dpi: int = 200) -> PIL.Image.Image:
296
- raise NotImplementedError
297
- ```
298
-
299
- #### `core/ocr_pipeline.py`
300
- ```python
301
- from pathlib import Path
302
-
303
-
304
- class ExtractedPage:
305
- page: int
306
- text: str
307
- source_type: str # "text_pdf" | "tesseract" | "vision_llm"
308
- confidence: float
309
- raw_tier_results: dict
310
-
311
-
312
- def extract_document(file_path: Path) -> list[ExtractedPage]:
313
- raise NotImplementedError
314
- ```
315
-
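The three-tier OCR flow behind `extract_document` could be organised per page roughly as follows. This is a sketch under stated assumptions: the escalation order (embedded text, then Tesseract, then the vision LLM) follows the tier numbering used elsewhere in the plan, the OCR calls are passed in as callables so the example stays runnable without any OCR engine, and `extract_page_text` is a hypothetical name.

```python
from __future__ import annotations

from typing import Callable, Optional

OCR_TESSERACT_MIN_CONF = 0.65  # value from core/config.py


def extract_page_text(
    embedded_text: Optional[str],
    run_tesseract: Callable[[], tuple],
    run_vision: Callable[[], str],
) -> tuple:
    """Return (text, source_type, confidence) for one page.

    Tier 1: use embedded PDF text when the page has any.
    Tier 2: run Tesseract and accept it above the configured threshold.
    Tier 3: fall back to the vision LLM; no confidence is reported here
            because the spec does not define one for that tier.
    """
    if embedded_text and embedded_text.strip():
        return embedded_text, "text_pdf", 1.0
    text, confidence = run_tesseract()  # expected to return (text, 0..1)
    if confidence >= OCR_TESSERACT_MIN_CONF:
        return text, "tesseract", confidence
    return run_vision(), "vision_llm", None
```

Passing the tier functions in as arguments also makes it straightforward to record every tier's raw result for the audit trail, or to stub out the vision call in tests.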
316
- #### `core/chunker.py`
317
- ```python
318
- from core.ocr_pipeline import ExtractedPage
319
-
320
-
321
- def chunk_tender(pages: list[dict], tender_id: str) -> list[dict]:
322
- raise NotImplementedError
323
-
324
-
325
- def chunk_bidder(
326
- pages: list[ExtractedPage], bidder_id: str, doc_name: str
327
- ) -> list[dict]:
328
- raise NotImplementedError
329
- ```
330
-
331
- #### `core/vectorstore.py`
332
- ```python
333
- def get_client():
334
- raise NotImplementedError
335
-
336
-
337
- def get_collection(name: str):
338
- raise NotImplementedError
339
-
340
-
341
- def add_chunks(collection, chunks: list[dict], metadatas: list[dict]) -> None:
342
- raise NotImplementedError
343
-
344
-
345
- def query(
346
- collection, text: str, k: int = 4, where: dict | None = None
347
- ) -> list[dict]:
348
- raise NotImplementedError
349
- ```
350
-
351
- #### `core/criteria_extractor.py`
352
- ```python
353
- from pathlib import Path
354
- from core.schemas import Criterion
355
-
356
-
357
- def extract_criteria(tender_pdf_path: Path) -> list[Criterion]:
358
- raise NotImplementedError
359
- ```
360
-
361
- #### `core/bidder_processor.py`
362
- ```python
363
- from pathlib import Path
364
- from core.schemas import Criterion, Evidence
365
-
366
-
367
- def process_bidder(bidder_id: str, files: list[Path]) -> None:
368
- raise NotImplementedError
369
-
370
-
371
- def gather_evidence(bidder_id: str, criterion: Criterion, k: int = 4) -> list[Evidence]:
372
- raise NotImplementedError
373
- ```
374
-
375
- #### `core/evaluator.py`
376
- ```python
377
- from core.schemas import Criterion, Verdict
378
-
379
-
380
- def evaluate(bidder_id: str, criterion: Criterion) -> Verdict:
381
- raise NotImplementedError
382
-
383
-
384
- def evaluate_bidder(bidder_id: str, criteria: list[Criterion]) -> list[Verdict]:
385
- raise NotImplementedError
386
- ```
387
-
388
- #### `core/audit.py`
389
- ```python
390
- def log(action: str, actor: str = "system", **fields) -> int:
391
- raise NotImplementedError
392
-
393
-
394
- def query(filters: dict | None = None) -> list[dict]:
395
- raise NotImplementedError
396
- ```
397
-
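One way the `log` stub could be backed by SQLite is sketched below. The table layout simply mirrors the `AuditEntry` fields and is an assumption, as are the table name `audit_log` and the explicit `conn` parameter, which the stub signature does not have and which is used here so the example can run against an in-memory database.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts TEXT NOT NULL,
    action TEXT NOT NULL,
    actor TEXT NOT NULL,
    model_version TEXT,
    bidder_id TEXT,
    criterion_id TEXT,
    payload_json TEXT
)
"""

KNOWN_COLUMNS = ("model_version", "bidder_id", "criterion_id")


def log(conn: sqlite3.Connection, action: str, actor: str = "system", **fields) -> int:
    """Insert one audit row and return its id.

    Keyword arguments matching KNOWN_COLUMNS go into their own columns;
    anything else is serialised into payload_json.
    """
    conn.execute(SCHEMA)
    columns = [fields.pop(name, None) for name in KNOWN_COLUMNS]
    payload = json.dumps(fields, sort_keys=True) if fields else None
    cursor = conn.execute(
        "INSERT INTO audit_log "
        "(ts, action, actor, model_version, bidder_id, criterion_id, payload_json) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), action, actor, *columns, payload),
    )
    conn.commit()
    return cursor.lastrowid
```

Usage would look like `log(conn, "verdict_created", bidder_id="bidder_a", criterion_id="C1", verdict="eligible")`, where `conn` comes from `sqlite3.connect(...)`.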
398
- #### `core/fallback.py`
399
- ```python
400
- from core.schemas import Criterion, Verdict
401
-
402
-
403
- def load_criteria() -> list[Criterion]:
404
- raise NotImplementedError
405
-
406
-
407
- def load_evaluation(bidder_id: str, criterion_id: str) -> Verdict:
408
- raise NotImplementedError
409
- ```
410
-
411
- ---
412
-
413
- ### `ui/` package — all stubs
414
-
415
- Each tab module exports a single `render()` function that renders a placeholder heading. No logic.
416
-
417
- #### `ui/__init__.py`
418
- Empty.
419
-
420
- #### `ui/tab_overview.py`
421
- ```python
422
- import streamlit as st
423
-
424
- def render() -> None:
425
- st.header("Overview")
426
- st.info("Coming soon — architecture diagram, KPIs, and demo CTA.")
427
- ```
428
-
429
- #### `ui/tab_tender.py`
430
- ```python
431
- import streamlit as st
432
-
433
- def render() -> None:
434
- st.header("Tender Analysis")
435
- st.info("Coming soon — upload tender and extract eligibility criteria.")
436
- ```
437
-
438
- #### `ui/tab_bidders.py`
439
- ```python
440
- import streamlit as st
441
-
442
- def render() -> None:
443
- st.header("Bidder Evaluation")
444
- st.info("Coming soon — per-bidder, per-criterion verdict table.")
445
- ```
446
-
447
- #### `ui/tab_review.py`
448
- ```python
449
- import streamlit as st
450
-
451
- def render() -> None:
452
- st.header("Human Review Queue")
453
- st.info("Coming soon — approve / edit / reject flagged verdicts.")
454
- ```
455
-
456
- #### `ui/tab_audit.py`
457
- ```python
458
- import streamlit as st
459
-
460
- def render() -> None:
461
- st.header("Audit Log")
462
- st.info("Coming soon — sortable audit log with CSV export.")
463
- ```
464
-
465
- #### `ui/components.py`
466
- ```python
467
- # Shared UI widgets — implemented incrementally as Tab 3 and Tab 4 need them.
468
- ```
469
-
470
- ---
471
-
472
- ### `data/` directory structure (empty folders only)
473
-
474
- ```
475
- data/
476
- tender/
477
- bidders/
478
- bidder_a/
479
- bidder_b/
480
- bidder_c/
481
- precomputed/
482
- ```
483
-
484
- No files yet — Step 2 (mock data generation) populates these.
485
-
486
- ---
487
-
488
- ### `scripts/` directory (empty stubs)
489
-
490
- #### `scripts/generate_mock_data.py`
491
- ```python
492
- """Step 2 — generates mock tender and bidder PDFs + noisy scan PNG."""
493
- ```
494
-
495
- #### `scripts/precompute_results.py`
496
- ```python
497
- """Step 11 — runs the full pipeline and writes data/precomputed/*.json."""
498
- ```
499
-
500
- #### `scripts/smoke_test.py`
501
- ```python
502
- """Step 13 — programmatic end-to-end check; exits 0 on success."""
503
- ```
504
-
505
- ---
506
-
507
- ### `assets/` directory (empty, for later)
508
-
509
- ```
510
- assets/
511
- screenshots/
512
- ```
513
-
514
- ---
515
-
516
- ### `deck/` directory (empty, for later)
517
-
518
- ```
519
- deck/
520
- ```
521
-
522
- ---
523
-
524
- ## Directory Tree After This Step
525
-
526
- ```
527
- TenderIQ/
528
- ├── app.py
529
- ├── requirements.txt
530
- ├── packages.txt
531
- ├── .env.example
532
- ├── .gitignore
533
- ├── specs/
534
- │   └── 00_skeleton.md          ← this file
535
- ├── core/
536
- │   ├── __init__.py
537
- │   ├── config.py
538
- │   ├── schemas.py
539
- │   ├── prompts.py
540
- │   ├── llm_client.py
541
- │   ├── pdf_utils.py
542
- │   ├── ocr_pipeline.py
543
- │   ├── chunker.py
544
- │   ├── vectorstore.py
545
- │   ├── criteria_extractor.py
546
- │   ├── bidder_processor.py
547
- │   ├── evaluator.py
548
- │   ├── audit.py
549
- │   └── fallback.py
550
- ├── ui/
551
- │   ├── __init__.py
552
- │   ├── tab_overview.py
553
- │   ├── tab_tender.py
554
- │   ├── tab_bidders.py
555
- │   ├── tab_review.py
556
- │   ├── tab_audit.py
557
- │   └── components.py
558
- ├── data/
559
- │   ├── tender/
560
- │   ├── bidders/
561
- │   │   ├── bidder_a/
562
- │   │   ├── bidder_b/
563
- │   │   └── bidder_c/
564
- │   └── precomputed/
565
- ├── scripts/
566
- │   ├── generate_mock_data.py
567
- │   ├── precompute_results.py
568
- │   └── smoke_test.py
569
- ├── assets/
570
- │   └── screenshots/
571
- └── deck/
572
- ```
573
-
574
- Runtime artifacts (gitignored, not created here): `.env`, `.chroma/`, `audit.db`, `.ocr_cache/`.
575
-
576
- ---
577
-
578
- ## Acceptance Criteria
579
-
580
- 1. `python -c "import app"` executes without `ImportError` (all stubs importable).
581
- 2. `streamlit run app.py` opens in the browser without a Python traceback.
582
- 3. Five tabs are visible: Overview, Tender Analysis, Bidder Evaluation, Human Review, Audit Log.
583
- 4. Sidebar shows "⚖️ TenderIQ", a caption, a red connection dot placeholder, and a "Reset Session" button.
584
- 5. Each tab body shows an `st.info(...)` placeholder — no blank white screens.
585
- 6. `python -c "from core import config, schemas, prompts"` runs without error.
586
-
587
- ---
588
-
589
- ## What This Step Does NOT Do
590
-
591
- - No logic implemented in any `core/` module.
592
- - No Streamlit secrets or `.env` required to pass the checkpoint.
593
- - No data files generated (Step 2 does that).
594
- - No pip install triggered (assumed the environment is set up separately).
specs/01_config_and_schemas.md DELETED
@@ -1,145 +0,0 @@
1
- # Spec 01 — Config, Schemas, and Prompts
2
-
3
- **Step:** 3 of 15
4
- **Time budget:** ~25 min
5
- **Checkpoint:** `python -c "from core import config, schemas, prompts"` runs without error. All Pydantic models validate sample JSON correctly.
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- Finalize `core/config.py`, `core/schemas.py`, and `core/prompts.py` with full working implementations (the skeleton stubs already have the correct content — this step validates and documents them).
12
-
13
- ---
14
-
15
- ## `core/config.py`
16
-
17
- Loads environment variables. All values are module-level constants.
18
-
19
- | Constant | Type | Value / Source |
20
- |---|---|---|
21
- | `DEEPSEEK_API_KEY` | `str \| None` | `os.getenv("DEEPSEEK_API_KEY")` |
22
- | `DEEPSEEK_BASE_URL` | `str` | `"https://api.deepseek.com/v1"` |
23
- | `MODEL_NAME` | `str` | `"deepseek-chat"` |
24
- | `MODEL_VERSION` | `str` | `f"{MODEL_NAME}@2026-05-07"` |
25
- | `CONFIDENCE_HIGH` | `float` | `0.80` |
26
- | `CONFIDENCE_REVIEW` | `float` | `0.55` |
27
- | `OCR_TESSERACT_MIN_CONF` | `float` | `0.65` |
28
- | `BASE_DIR` | `Path` | parent of `core/` |
29
- | `DATA_DIR` | `Path` | `BASE_DIR / "data"` |
30
- | `CHROMA_DIR` | `str` | `str(BASE_DIR / ".chroma")` |
31
- | `AUDIT_DB` | `str` | `str(BASE_DIR / "audit.db")` |
32
- | `PRECOMPUTED_DIR` | `Path` | `DATA_DIR / "precomputed"` |
33
- | `OCR_CACHE_DIR` | `Path` | `BASE_DIR / ".ocr_cache"` |
34
-
35
- `load_dotenv()` is called at module level so `.env` is sourced before `os.getenv`.
36
-
37
- ---
38
-
39
- ## `core/schemas.py`
40
-
41
- Pydantic v2 models. All fields have type annotations. Use `from __future__ import annotations`.
42
-
43
- ### `Rule`
44
- ```python
45
- class Rule(BaseModel):
46
-     type: Literal["numeric_threshold", "count_threshold", "certification_present", "document_present"]
47
-     field: str
48
-     operator: Literal[">=", "<=", "==", "exists"]
49
-     value: float | int | None = None
50
-     unit: str | None = None
51
- ```
52
-
53
- ### `Criterion`
54
- ```python
55
- class Criterion(BaseModel):
56
-     id: str
57
-     title: str
58
-     category: Literal["financial", "technical", "compliance"]
59
-     mandatory: bool
60
-     description: str
61
-     rule: Rule
62
-     query_hints: list[str]
63
-     source_page: int
64
-     source_clause: str
65
- ```
66
-
67
- ### `Evidence`
68
- ```python
69
- class Evidence(BaseModel):
70
-     bidder_id: str
71
-     doc_name: str
72
-     page: int
73
-     text: str
74
-     source_type: Literal["text_pdf", "tesseract", "vision_llm"]
75
-     ocr_confidence: float | None = None
76
- ```
77
-
78
- ### `Source`
79
- ```python
80
- class Source(BaseModel):
81
-     doc_name: str
82
-     page: int
83
-     snippet: str
84
-     source_type: Literal["text_pdf", "tesseract", "vision_llm"]
85
- ```
86
-
87
- ### `Verdict`
88
- ```python
89
- class Verdict(BaseModel):
90
-     verdict_id: str = Field(default_factory=lambda: f"V-{uuid.uuid4().hex[:8]}")
91
-     bidder_id: str
92
-     criterion_id: str
93
-     verdict: Literal["eligible", "not_eligible", "needs_review"]
94
-     extracted_value: str | None = None
95
-     normalized_value: float | int | None = None
96
-     source: Source | None = None
97
-     llm_confidence: float = 0.0
98
-     ocr_confidence: float | None = None
99
-     combined_confidence: float = 0.0
100
-     reason: str = ""
101
-     model_version: str = ""
102
-     timestamp: str = ""
103
-     review_status: Literal["pending", "approved", "edited", "rejected"] = "pending"
104
- ```
105
-
106
- ### `AuditEntry`
107
- ```python
108
- class AuditEntry(BaseModel):
109
-     id: int | None = None
110
-     ts: str
111
-     action: str
112
-     actor: str
113
-     model_version: str | None = None
114
-     bidder_id: str | None = None
115
-     criterion_id: str | None = None
116
-     payload_json: str | None = None
117
- ```
118
-
119
- ---
120
-
121
- ## `core/prompts.py`
122
-
123
- Four string constants are already defined in the skeleton; no changes are needed.
124
-
125
- - `EXTRACT_CRITERIA_PROMPT_SYSTEM`
126
- - `EVALUATE_CRITERION_PROMPT_SYSTEM`
127
- - `VISION_OCR_PROMPT_SYSTEM`
128
- - `VISION_OCR_USER`
129
-
130
- ---
131
-
132
- ## Acceptance Criteria
133
-
134
- 1. `python -c "from core import config, schemas, prompts"` exits 0.
135
- 2. `python -c "from core.schemas import Criterion, Verdict, Evidence, AuditEntry; print('OK')"` prints OK.
136
- 3. Sample Criterion JSON validates without error:
137
- ```python
138
- from core.schemas import Criterion
139
- c = Criterion(**{"id":"C1","title":"Turnover","category":"financial",
140
- "mandatory":True,"description":"INR 5Cr","rule":{"type":"numeric_threshold",
141
- "field":"turnover","operator":">=","value":50000000,"unit":"INR"},
142
- "query_hints":["turnover"],"source_page":3,"source_clause":"3.2(a)"})
143
- assert c.mandatory is True
144
- ```
145
- 4. `config.MODEL_VERSION` contains `"deepseek-chat@2026-05-07"`.
specs/02_llm_client.md DELETED
@@ -1,101 +0,0 @@
1
- # Spec 02 — LLM Client
2
-
3
- **Step:** 4 of 15
4
- **Time budget:** ~25 min
5
- **Checkpoint:** `LLM().chat_json(system, user)` returns a dict when the API key is valid; raises `LLMUnavailable` when the key is missing.
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- Implement `core/llm_client.py` — a thin wrapper around the OpenAI Python SDK pointed at the DeepSeek API. Provides `chat_json` (JSON-mode responses) and `chat_vision` (multimodal image input). Both methods retry on transient failures and raise `LLMUnavailable` after `max_retries`.
12
-
13
- ---
14
-
15
- ## Dependencies
16
-
17
- - `openai` Python SDK (OpenAI-compatible, pointed at DeepSeek base URL)
18
- - `core.config` for `DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, `MODEL_NAME`, `MODEL_VERSION`
19
- - `core.prompts` for prompt constants (used by callers, not by this module directly)
20
-
21
- ---
22
-
23
- ## Class: `LLMUnavailable`
24
-
25
- ```python
26
- class LLMUnavailable(Exception):
27
-     pass
28
- ```
29
-
30
- Raised whenever the LLM call cannot be completed after all retries. Callers should catch this and route to `fallback.py`.
31
-
32
- ---
33
-
34
- ## Class: `LLM`
35
-
36
- ### `__init__(self, api_key: str | None = None)`
37
-
38
- - If `api_key` is `None`, use `config.DEEPSEEK_API_KEY`.
39
- - If the resolved key is `None` or empty: do NOT raise immediately — defer to call time so the app can start without a key (precomputed mode).
40
- - Create an `openai.OpenAI(api_key=key, base_url=DEEPSEEK_BASE_URL)` client and store as `self._client`.
41
-
42
- ### `chat_json(self, system: str, user: str, max_retries: int = 2) -> dict`
43
-
44
- Calls the chat completions API with `response_format={"type": "json_object"}`, `temperature=0`.
45
-
46
- Messages: `[{"role": "system", "content": system}, {"role": "user", "content": user}]`
47
-
48
- Retry logic:
49
- 1. Try the API call.
50
- 2. On success: parse `response.choices[0].message.content` as JSON. If `json.loads` fails, retry once with a stricter system postscript `" Respond ONLY with valid JSON, no prose."`. If it fails again, raise `LLMUnavailable("Malformed JSON after retries")`.
51
- 3. On `openai.APIStatusError` (5xx) or `openai.APIConnectionError`: exponential backoff (`2 ** attempt` seconds, max 2 attempts), then raise `LLMUnavailable`.
52
- 4. On `openai.AuthenticationError` (401): raise `LLMUnavailable("Invalid API key")` immediately (no retry).
53
- 5. If `api_key` is None/empty at call time: raise `LLMUnavailable("No API key configured")`.
54
-
55
- Returns `dict`.
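
The retry rules above can be sketched without the real SDK. This is a minimal illustration, not the project's implementation: `TransientError` and `AuthError` are hypothetical stand-ins for the `openai` exception classes, and `call(strict)` is an assumed callable that performs one request and returns the raw response text (with the stricter postscript appended when `strict` is true). The key check runs first and the 401 case is handled before the transient case, since an authentication failure should never be retried.

```python
import json
import time


class LLMUnavailable(Exception):
    """Raised when no usable response is obtained."""


class TransientError(Exception):
    """Hypothetical stand-in for a 5xx or connection failure."""


class AuthError(Exception):
    """Hypothetical stand-in for a 401 response."""


def chat_json_with_retry(call, api_key, max_retries=2, sleep=time.sleep):
    """Run `call(strict)` under the retry rules and return the parsed dict."""
    if not api_key:
        raise LLMUnavailable("No API key configured")
    strict = False
    for attempt in range(max_retries + 1):
        try:
            raw = call(strict)
        except AuthError as exc:
            # Never retried: a bad key will not fix itself.
            raise LLMUnavailable("Invalid API key") from exc
        except TransientError:
            if attempt == max_retries:
                break
            sleep(2 ** attempt)  # exponential backoff: 1 s, then 2 s
            continue
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if strict:
                raise LLMUnavailable("Malformed JSON after retries")
            strict = True  # one more try with the stricter postscript
    raise LLMUnavailable("LLM call failed after retries")
```

Passing `sleep` as a parameter is a choice made for this sketch so that the backoff can be observed in a test without actually waiting.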
56
-
57
- ### `chat_vision(self, system: str, user_text: str, image: bytes | str | Path, max_retries: int = 2) -> str`
58
-
59
- Sends a multimodal message using the OpenAI vision format.
60
-
61
- Image encoding:
62
- - If `image` is `bytes`: base64-encode directly.
63
- - If `image` is `Path` or `str`: read the file as bytes, then base64-encode.
64
- - Build data URI: `f"data:image/png;base64,{b64_str}"`.
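
As a minimal sketch of the encoding steps above (the helper name `to_png_data_uri` is an assumption, not part of the spec), using only the standard library:

```python
import base64
from pathlib import Path


def to_png_data_uri(image):
    """Accept raw bytes or a file path and return a base64 PNG data URI."""
    if isinstance(image, (bytes, bytearray)):
        raw = bytes(image)
    else:
        # str or Path: read the file contents first
        raw = Path(image).read_bytes()
    b64_str = base64.b64encode(raw).decode("ascii")
    return f"data:image/png;base64,{b64_str}"
```

Note that the spec hard-codes `image/png` in the URI, so this sketch does the same even if the caller passes JPEG bytes.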
65
-
66
- Message format:
67
- ```python
68
- [
69
- {"role": "system", "content": system},
70
- {"role": "user", "content": [
71
- {"type": "text", "text": user_text},
72
- {"type": "image_url", "image_url": {"url": data_uri}},
73
- ]},
74
- ]
75
- ```
76
-
77
- Call at `temperature=0`, no `response_format` (vision endpoint returns plain text).
78
-
79
- Retry logic: same as `chat_json`, except that a content error is retried with the same prompt (no stricter postscript). Returns `response.choices[0].message.content` as a string.
80
-
81
- On any failure after retries: raise `LLMUnavailable`.
82
-
83
- ---
84
-
85
- ## Error handling summary
86
-
87
- | Condition | Behaviour |
88
- |---|---|
89
- | Missing/empty API key | `LLMUnavailable("No API key configured")` |
90
- | 401 AuthenticationError | `LLMUnavailable("Invalid API key")` |
91
- | 5xx / ConnectionError | Retry with backoff, then `LLMUnavailable` |
92
- | Malformed JSON (chat_json) | Retry once with stricter prompt, then `LLMUnavailable` |
93
-
94
- ---
95
-
96
- ## Acceptance Criteria
97
-
98
- 1. `from core.llm_client import LLM, LLMUnavailable` imports cleanly.
99
- 2. `LLM(api_key=None)` with no `.env` → calling `chat_json(...)` raises `LLMUnavailable` (not an unhandled exception).
100
- 3. With a valid key: `LLM().chat_json("respond with valid json", '{"ok": true}')` returns `{"ok": True}` (or similar).
101
- 4. `LLMUnavailable` is a subclass of `Exception`.
specs/03_pdf_utils_and_chunker.md DELETED
@@ -1,80 +0,0 @@
1
- # Spec 03 — PDF Utils and Chunker
2
-
3
- **Step:** 5 of 15
4
- **Time budget:** ~15 min
5
-
6
- ---
7
-
8
- ## Goal
9
-
10
- Implement `core/pdf_utils.py` (PyMuPDF text extraction and page rendering) and `core/chunker.py` (text → chunks with metadata).
11
-
12
- ---
13
-
14
- ## `core/pdf_utils.py`
15
-
16
- ### `extract_pages(path: Path) -> list[dict]`
17
-
18
- - Opens the PDF with `fitz.open(str(path))`.
19
- - For each page `i`: extracts text via `page.get_text("text")`.
20
- - Returns `[{"page": i+1, "text": text}, ...]` (1-indexed pages).
21
-
22
- ### `is_text_pdf(path: Path) -> bool`
23
-
24
- - Opens the PDF.
25
- - Computes average characters per page across all pages.
26
- - Returns `True` if average ≥ 50 characters per page (heuristic for typed PDF vs scanned blank pages).
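
The heuristic can be illustrated on the page dicts that `extract_pages` returns, without opening a PDF. The function name and the decision to strip whitespace before counting are assumptions of this sketch:

```python
def looks_like_text_pdf(pages, min_avg_chars=50):
    """Return True when the average extracted text per page meets the threshold.

    `pages` is a list of {"page": int, "text": str} dicts.
    """
    if not pages:
        return False
    total = sum(len(p["text"].strip()) for p in pages)
    return total / len(pages) >= min_avg_chars
```

Because the check averages over the whole file, a document that mixes typed pages with scanned pages may still be classified as a text PDF; a per-page check would be needed to catch that case.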
27
-
28
- ### `render_page_to_image(path: Path, page_no: int, dpi: int = 200) -> PIL.Image.Image`
29
-
30
- - Opens the PDF.
31
- - Gets page at index `page_no - 1` (0-indexed).
32
- - Creates `fitz.Matrix(dpi/72, dpi/72)` and renders via `page.get_pixmap(matrix=mat, colorspace=fitz.csRGB)`.
33
- - Converts pixmap to PIL Image via `Image.frombytes("RGB", [pix.width, pix.height], pix.samples)`.
34
- - Returns the PIL Image.
35
-
36
- ---
37
-
38
- ## `core/chunker.py`
39
-
40
- ### `chunk_tender(pages: list[dict], tender_id: str) -> list[dict]`
41
-
42
- Input: list of `{"page": int, "text": str}` dicts.
43
-
44
- Strategy:
45
- - Join page text. Split on clause headings detected by regex `r'^\d+(\.\d+)*\s+'` (multiline).
46
- - Each chunk: up to ~500 tokens (~2000 chars). If a section is longer, split on `\n\n` boundaries.
47
- - Each chunk dict: `{"text": str, "tender_id": str, "page": int, "chunk_id": str}`.
48
- - `chunk_id` = `f"{tender_id}_p{page}_c{i}"`.
49
-
50
- Simpler implementation (sufficient for 5-page mock tender):
51
- - One chunk per page section: for each page, if text > 2000 chars split into ~2000-char pieces; else one chunk.
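
A minimal sketch of the simpler per-page strategy, assuming fixed-width slicing at `max_chars` and that empty pages are skipped (neither detail is stated in the spec):

```python
def chunk_tender(pages, tender_id, max_chars=2000):
    """Split each page into pieces of at most `max_chars` characters."""
    chunks = []
    for p in pages:
        text = p["text"].strip()
        if not text:
            continue  # assumption: empty pages produce no chunk
        pieces = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
        for i, piece in enumerate(pieces):
            chunks.append({
                "text": piece,
                "tender_id": tender_id,
                "page": p["page"],
                "chunk_id": f"{tender_id}_p{p['page']}_c{i}",
            })
    return chunks
```

Fixed-width slicing can cut a clause in the middle of a sentence, which is the trade-off the clause-heading strategy above is meant to avoid on longer tenders.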
52
-
53
- ### `chunk_bidder(pages: list[ExtractedPage], bidder_id: str, doc_name: str) -> list[dict]`
54
-
55
- Input: list of `ExtractedPage` objects.
56
-
57
- Strategy: one chunk per page.
58
-
59
- Each chunk dict:
60
- ```python
61
- {
62
- "text": page.text,
63
- "bidder_id": bidder_id,
64
- "doc_name": doc_name,
65
- "page": page.page,
66
- "source_type": page.source_type,
67
- "ocr_confidence": page.confidence,
68
- "chunk_id": f"{bidder_id}_{doc_name}_p{page.page}",
69
- }
70
- ```
71
-
72
- ---
73
-
74
- ## Acceptance Criteria
75
-
76
- 1. `extract_pages(Path("data/tender/crpf_construction_tender.pdf"))` returns a list of dicts with non-empty text on most pages.
77
- 2. `is_text_pdf(Path("data/tender/crpf_construction_tender.pdf"))` returns `True`.
78
- 3. `render_page_to_image(Path("data/tender/crpf_construction_tender.pdf"), 1)` returns a PIL Image with width > 0.
79
- 4. `chunk_tender(pages, "tender_001")` returns a non-empty list of dicts each having a "text" key.
80
- 5. Each bidder chunk has all required metadata keys.
specs/04_ocr_pipeline.md DELETED
@@ -1,97 +0,0 @@
1
- # Spec 04 — OCR Pipeline
2
-
3
- **Step:** 7 of 15
4
- **Time budget:** ~30 min
5
- **Checkpoint:** `extract_document(Path("data/bidders/bidder_c/turnover_certificate_scan.png"))` returns a list with `source_type` reflecting the OCR tier used.
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- Implement `core/ocr_pipeline.py` — the three-tier OCR orchestrator. For each document/image, determines the best extraction method: PyMuPDF text (Tier 1), Tesseract (Tier 2), or DeepSeek Vision LLM (Tier 3). Caches results per file to avoid re-OCR on re-runs.
12
-
13
- ---
14
-
15
- ## `ExtractedPage` dataclass
16
-
17
- ```python
18
- @dataclasses.dataclass
19
- class ExtractedPage:
20
-     page: int
21
-     text: str
22
-     source_type: str  # "text_pdf" | "tesseract" | "vision_llm"
23
-     confidence: float
24
-     raw_tier_results: dict
25
- ```
26
-
27
- ---
28
-
29
- ## `extract_document(file_path: Path) -> list[ExtractedPage]`
30
-
31
- ### Cache check
32
-
33
- - Compute `file_hash = hashlib.md5(file_path.read_bytes()).hexdigest()`.
34
- - Cache path: `OCR_CACHE_DIR / f"{file_hash}.json"`.
35
- - If cache exists: deserialize and return `list[ExtractedPage]`.
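
One way to sketch the content-hash cache, covering both the lookup here and the write described later in this spec. The helper names (`cache_path_for`, `load_cached`, `save_cached`) and the explicit `cache_dir` argument are assumptions for illustration; `ExtractedPage` is repeated so the block is self-contained.

```python
import dataclasses
import hashlib
import json
from pathlib import Path


@dataclasses.dataclass
class ExtractedPage:
    page: int
    text: str
    source_type: str
    confidence: float
    raw_tier_results: dict


def cache_path_for(file_path, cache_dir):
    """Key the cache on file contents, so a renamed file still hits."""
    file_hash = hashlib.md5(Path(file_path).read_bytes()).hexdigest()
    return Path(cache_dir) / f"{file_hash}.json"


def load_cached(file_path, cache_dir):
    """Return the cached pages, or None on a cache miss."""
    path = cache_path_for(file_path, cache_dir)
    if not path.exists():
        return None
    rows = json.loads(path.read_text(encoding="utf-8"))
    return [ExtractedPage(**row) for row in rows]


def save_cached(file_path, cache_dir, pages):
    """Serialize the pages as a JSON list of dicts."""
    path = cache_path_for(file_path, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = [dataclasses.asdict(p) for p in pages]
    path.write_text(json.dumps(payload), encoding="utf-8")
```

MD5 is used here only as a cheap content fingerprint, as in the spec, not for any security purpose.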
36
-
37
- ### Routing
38
-
39
- **Case A — Image file (PNG/JPG/JPEG/BMP/TIFF):**
40
- - Treat as single page (page=1).
41
- - Go directly to Tier 2 (Tesseract).
42
- - If Tier 2 confidence < `OCR_TESSERACT_MIN_CONF`: try Tier 3.
43
-
44
- **Case B — PDF file:**
45
- - Call `pdf_utils.is_text_pdf(file_path)`.
46
- - If `True`: Tier 1 — call `pdf_utils.extract_pages(file_path)`, set `source_type="text_pdf"`, `confidence=1.0`.
47
- - If `False`: for each page, render to image via `pdf_utils.render_page_to_image`, then Tier 2.
48
-
49
- ### Tier 2 — Tesseract
50
-
51
- ```python
52
- import pytesseract
53
- data = pytesseract.image_to_data(pil_image, output_type=pytesseract.Output.DATAFRAME)
54
- # Filter rows with conf != -1
55
- valid = data[data["conf"] != -1]
56
- mean_conf = float(valid["conf"].mean()) / 100 if len(valid) > 0 else 0.0
57
- text = " ".join(str(w) for w in valid["text"].dropna() if str(w).strip())  # dropna: skip NaN cells, which str() would turn into "nan"
58
- ```
59
-
60
- If `mean_conf < OCR_TESSERACT_MIN_CONF` OR `len(text.strip()) < 20`: attempt Tier 3.
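
The same confidence arithmetic can be shown without pandas or Tesseract. In this sketch, `words` and `confs` are assumed to be parallel lists such as the `text` and `conf` columns of the Tesseract output; both function names are hypothetical.

```python
def summarize_tesseract(words, confs):
    """Return (text, mean_conf) with confidence scaled to the 0..1 range."""
    # Rows with conf == -1 are layout rows, not recognized words.
    valid = [(w, c) for w, c in zip(words, confs) if c != -1]
    mean_conf = sum(c for _, c in valid) / len(valid) / 100 if valid else 0.0
    text = " ".join(w.strip() for w, _ in valid if w and w.strip())
    return text, mean_conf


def needs_vision_fallback(mean_conf, text, min_conf=0.65, min_chars=20):
    """Tier 3 is attempted when confidence is low or almost no text was read."""
    return mean_conf < min_conf or len(text.strip()) < min_chars
```

For example, three words with confidences 90, 95 and 40 average to 0.75, which clears the 0.65 threshold, but a page that yields only a dozen characters would still be escalated by the length check.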
61
-
62
- ### Tier 3 — DeepSeek Vision LLM
63
-
64
- - Convert PIL Image to PNG bytes via `io.BytesIO`.
65
- - Call `LLM().chat_vision(VISION_OCR_PROMPT_SYSTEM, VISION_OCR_USER, image_bytes)`.
66
- - On success: `source_type="vision_llm"`, `confidence=0.95`.
67
- - Log `vision_ocr_invoked` audit entry.
68
- - On `LLMUnavailable`: keep Tier 2 result with its `confidence` (will trigger `needs_review` downstream).
69
-
70
- ### Cache write
71
-
72
- After processing all pages, serialize to JSON and save to cache file.
73
-
74
- ---
75
-
76
- ## Serialization format for cache
77
-
78
- ```json
79
- [
80
- {
81
- "page": 1,
82
- "text": "...",
83
- "source_type": "text_pdf",
84
- "confidence": 1.0,
85
- "raw_tier_results": {"tesseract_conf": null, "vision_used": false}
86
- }
87
- ]
88
- ```
89
-
90
- ---
91
-
92
- ## Acceptance Criteria
93
-
94
- 1. `extract_document(Path("data/bidders/bidder_a/audited_financials.pdf"))` returns pages with `source_type="text_pdf"`.
95
- 2. `extract_document(Path("data/bidders/bidder_c/turnover_certificate_scan.png"))` — if Tesseract is available and confidence < 0.65, attempts vision LLM (or returns tesseract result with low confidence when LLM unavailable).
96
- 3. Second call to `extract_document` on same file returns cached result (no re-processing).
97
- 4. Each returned `ExtractedPage` has non-empty `text`.
specs/06_vectorstore_and_bidder_processor.md DELETED
@@ -1,97 +0,0 @@
1
- # Spec 06 — Vector Store and Bidder Processor
2
-
3
- **Step:** 8 of 15
4
- **Time budget:** ~25 min
5
- **Checkpoint:** `process_bidder("bidder_a", ...)` indexes all docs; `gather_evidence("bidder_a", turnover_criterion)` returns chunks mentioning the turnover figure.
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- Implement `core/vectorstore.py` (ChromaDB persistent client helpers) and `core/bidder_processor.py` (document ingestion + evidence retrieval per criterion).
12
-
13
- ---
14
-
15
- ## `core/vectorstore.py`
16
-
17
- Uses ChromaDB persistent client with `sentence-transformers/all-MiniLM-L6-v2` embeddings.
18
-
19
- ### `get_client()`
20
-
21
- ```python
22
- @st.cache_resource
23
- def get_client():
24
-     import chromadb
25
-     from core.config import CHROMA_DIR
26
-     return chromadb.PersistentClient(path=CHROMA_DIR)
27
- ```
28
-
29
- ### `get_collection(name: str)`
30
-
31
- ```python
32
- def get_collection(name: str):
33
-     client = get_client()
34
-     return client.get_or_create_collection(
35
-         name=name,
36
-         metadata={"hnsw:space": "cosine"},
37
-     )
38
- ```
39
-
40
- Note: ChromaDB default embedding function uses `all-MiniLM-L6-v2` (~80 MB, downloaded on first run).
41
-
42
- ### `add_chunks(collection, chunks: list[dict], metadatas: list[dict]) -> None`
43
-
44
- - IDs: `hashlib.sha256(chunk["text"].encode()).hexdigest()[:16]` — deduplicates across reruns.
45
- - Calls `collection.upsert(documents=[c["text"] for c in chunks], ids=ids, metadatas=metadatas)`.
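
The ID scheme can be sketched on its own (the helper name `chunk_ids` is an assumption):

```python
import hashlib


def chunk_ids(chunks):
    """Derive a stable 16-hex-character ID from each chunk's text."""
    return [hashlib.sha256(c["text"].encode()).hexdigest()[:16] for c in chunks]
```

Because the ID depends only on the text, re-indexing the same file yields the same IDs and the upsert overwrites rather than duplicates. The same property means that two bidders submitting a page with identical text would share one ID, so the later upsert would replace the earlier chunk's metadata. Hashing `bidder_id` together with the text would be one way to avoid that, though the spec does not call for it.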
46
-
47
- ### `query(collection, text: str, k: int = 4, where: dict | None = None) -> list[dict]`
48
-
49
- - Calls `collection.query(query_texts=[text], n_results=k, where=where)` (omit `where` if None).
50
- - Returns `[{"text": doc, "metadata": meta, "distance": dist}, ...]` from the first result set.
51
- - Handle the case where the collection holds fewer than `k` documents (some ChromaDB versions raise when `n_results` exceeds the collection size), for example by capping `k` at `collection.count()`.
52
-
53
- ---
54
-
55
- ## `core/bidder_processor.py`
56
-
57
- ### `process_bidder(bidder_id: str, files: list[Path]) -> None`
58
-
59
- For each file in `files`:
60
- 1. `pages = ocr_pipeline.extract_document(file)`.
61
- 2. `chunks = chunker.chunk_bidder(pages, bidder_id, file.name)`.
62
- 3. Build metadatas list — one per chunk:
63
- ```python
64
- {"bidder_id": bidder_id, "doc_name": file.name,
65
- "page": chunk["page"], "source_type": chunk["source_type"],
66
- "ocr_confidence": chunk["ocr_confidence"]}
67
- ```
68
- 4. `collection = vectorstore.get_collection("bidder_chunks")`.
69
- 5. `vectorstore.add_chunks(collection, chunks, metadatas)`.
70
- 6. `audit.log("bidder_processed", bidder_id=bidder_id, doc_name=file.name, chunk_count=len(chunks))`.
71
-
72
- ### `gather_evidence(bidder_id: str, criterion: Criterion, k: int = 4) -> list[Evidence]`
73
-
74
- 1. Build query string: `f"{criterion.title} {' '.join(criterion.query_hints)}"`.
75
- 2. `collection = vectorstore.get_collection("bidder_chunks")`.
76
- 3. `results = vectorstore.query(collection, query, k=k, where={"bidder_id": bidder_id})`.
77
- 4. Map each result to `Evidence`:
78
- ```python
79
- Evidence(
80
- bidder_id=bidder_id,
81
- doc_name=meta["doc_name"],
82
- page=meta["page"],
83
- text=result["text"],
84
- source_type=meta["source_type"],
85
- ocr_confidence=meta.get("ocr_confidence"),
86
- )
87
- ```
88
- 5. Return list.
89
-
90
- ---
91
-
92
- ## Acceptance Criteria
93
-
94
- 1. `process_bidder("bidder_a", [path1, path2, ...])` completes without error and logs audit entries.
95
- 2. `gather_evidence("bidder_a", c1_criterion)` returns at least 1 `Evidence` object.
96
- 3. The strongest evidence for Bidder A's turnover mentions "6,20,00,000" or "INR".
97
- 4. Calling `process_bidder` twice on the same files does not duplicate chunks (upsert).
specs/07_criteria_extractor.md DELETED
@@ -1,79 +0,0 @@
1
- # Spec 07 — Criteria Extractor
2
-
3
- **Step:** 6 of 15
4
- **Time budget:** ~30 min
5
- **Checkpoint:** Tab 2 in the running app shows 5 criteria extracted from the mock tender.
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- Implement `core/criteria_extractor.py` and wire up `ui/tab_tender.py` to call it. On `LLMUnavailable`, fall back to `fallback.load_criteria()`. Cache result in `st.session_state["criteria"]`.
12
-
13
- ---
14
-
15
- ## `core/criteria_extractor.py`
16
-
17
- ### `extract_criteria(tender_pdf_path: Path) -> list[Criterion]`
18
-
19
- 1. Call `pdf_utils.extract_pages(tender_pdf_path)` → list of `{"page": int, "text": str}`.
20
- 2. Join pages with a per-page marker: `tender_text = "\n\n".join(f"--- PAGE {p['page']} ---\n\n{p['text']}" for p in pages)`.
21
- 3. Build user prompt:
22
- ```
23
- {tender_text}
24
-
25
- ---
26
- Return JSON in this exact format:
27
- {"criteria": [
28
- {"id": "C1", "title": "...", "category": "financial|technical|compliance",
29
- "mandatory": true|false, "description": "...",
30
- "rule": {"type": "numeric_threshold|count_threshold|certification_present|document_present",
31
- "field": "...", "operator": ">=|<=|==|exists", "value": null_or_number, "unit": null_or_string},
32
- "query_hints": ["...", "..."],
33
- "source_page": <int>, "source_clause": "..."},
34
- ...
35
- ]}
36
- ```
37
- 4. Call `llm.chat_json(EXTRACT_CRITERIA_PROMPT_SYSTEM, user_prompt)`.
38
- 5. Parse `result["criteria"]` → validate each item as `Criterion(**item)`.
39
- 6. Log `criteria_extracted` to audit with `payload_json=json.dumps({"count": len(criteria)})`.
40
- 7. Return `list[Criterion]`.
41
-
42
- On `LLMUnavailable`:
43
- - Log `precomputed_fallback_used` to audit.
44
- - Set `st.session_state["fallback_active"] = True`.
45
- - Return `fallback.load_criteria()`.
46
-
47
- LLM singleton: use `@st.cache_resource` on a getter `_get_llm()` so the client is created once per server process and reused across reruns and sessions.
48
-
49
- ---
50
-
51
- ## `ui/tab_tender.py`
52
-
53
- Renders the Tender Analysis tab. Replaces the stub.
54
-
55
- Layout:
56
- 1. `st.header("Tender Analysis")`
57
- 2. File uploader: `uploaded = st.file_uploader("Upload tender PDF", type=["pdf"])`. If nothing uploaded, use the preloaded mock: `data/tender/crpf_construction_tender.pdf`.
58
- 3. Show the filename being used.
59
- 4. Button **"Extract Criteria (Live LLM)"**:
60
- - Save uploaded bytes to a temp file (or use the mock path directly).
61
- - Call `criteria_extractor.extract_criteria(path)`.
62
- - Store in `st.session_state["criteria"]`.
63
- 5. If `st.session_state.get("criteria")`:
64
- - Show `st.success(f"Extracted {len(criteria)} criteria")`.
65
- - For each criterion, render a card using `st.expander`:
66
- - Title + mandatory/optional badge (🔴 Mandatory / 🟡 Optional).
67
- - Category badge (color-coded: financial=blue, technical=green, compliance=orange).
68
- - Description text.
69
- - Source: page + clause.
70
- - Rule details (type, operator, value, unit).
71
-
72
- ---
73
-
74
- ## Acceptance Criteria
75
-
76
- 1. `extract_criteria(Path("data/tender/crpf_construction_tender.pdf"))` returns a list of 5 `Criterion` objects (when LLM is available) or the precomputed fallback (when not).
77
- 2. Tab 2 renders without error in both modes.
78
- 3. Each extracted criterion shows title, mandatory status, category, and source clause.
79
- 4. `st.session_state["criteria"]` is populated after the button is clicked.
specs/09_evaluator.md DELETED
@@ -1,134 +0,0 @@
1
- # Spec 09 — Evaluator
2
-
3
- **Step:** 9 of 15
4
- **Time budget:** ~25 min
5
- **Checkpoint:** `evaluate("bidder_a", c1)` returns eligible with high confidence; `evaluate("bidder_b", c1)` returns not_eligible.
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- Implement `core/evaluator.py` — per-criterion verdict generation with combined confidence scoring and threshold-based safety rules.
12
-
13
- ---
14
-
15
- ## `evaluate(bidder_id: str, criterion: Criterion) -> Verdict`
16
-
17
- ### Step 1 — Gather evidence
18
-
19
- `evidence = bidder_processor.gather_evidence(bidder_id, criterion)`
20
-
21
- If empty: return immediately:
22
- ```python
23
- Verdict(
24
- bidder_id=bidder_id,
25
- criterion_id=criterion.id,
26
- verdict="needs_review",
27
- reason="No matching evidence found in submitted documents.",
28
- llm_confidence=0.0,
29
- combined_confidence=0.0,
30
- model_version=MODEL_VERSION,
31
- timestamp=now_iso(),
32
- )
33
- ```
34
- Log `criterion_evaluated` with verdict=needs_review.
35
-
36
- ### Step 2 — Build LLM prompt
37
-
38
- User message template:
39
- ```
40
- CRITERION:
41
- {criterion.model_dump_json(indent=2)}
42
-
43
- RETRIEVED EVIDENCE (top-k chunks from bidder {bidder_id}):
44
- {json list of evidence dicts with doc_name, page, ocr_confidence, source_type, text}
45
-
46
- Return JSON:
47
- {
48
- "verdict": "eligible" | "not_eligible" | "needs_review",
49
- "extracted_value": "<short string as found in evidence>",
50
- "normalized_value": <number or null>,
51
- "chosen_source": {"doc_name": "...", "page": <int>, "snippet": "<= 200 chars", "source_type": "..."},
52
- "llm_confidence": <0.0 to 1.0>,
53
- "reason": "<one or two sentences>"
54
- }
55
-
56
- Rules:
57
- - If evidence directly contains a value satisfying the rule, verdict=eligible with high llm_confidence.
58
- - If evidence directly contradicts the rule, verdict=not_eligible.
59
- - If no relevant evidence retrieved, verdict=needs_review, llm_confidence<=0.4.
60
- - If the source is OCR with low confidence and the value is borderline, lean to needs_review.
61
- ```
62
-
63
- ### Step 3 — Call LLM
64
-
65
- `result = llm.chat_json(EVALUATE_CRITERION_PROMPT_SYSTEM, user_prompt)`
66
-
67
- On `LLMUnavailable`: return `fallback.load_evaluation(bidder_id, criterion.id)`.
68
-
69
- ### Step 4 — Parse result
70
-
71
- Extract: `verdict`, `extracted_value`, `normalized_value`, `chosen_source`, `llm_confidence`, `reason`.
72
-
73
- Build `Source` object from `chosen_source`.
74
-
75
- ### Step 5 — Combined confidence
76
-
77
- Find the evidence chunk matching `chosen_source` to get `ocr_confidence` and `source_type`:
78
-
79
- ```python
80
- if source_type == "text_pdf":
81
-     combined = llm_confidence
82
- elif source_type == "vision_llm":
83
-     combined = 0.7 * llm_confidence + 0.3 * 0.95
84
- elif source_type == "tesseract":
85
-     tc = ocr_confidence if ocr_confidence is not None and ocr_confidence >= 0 else 0.3  # only a missing or negative score falls back to 0.3
86
-     combined = 0.6 * llm_confidence + 0.4 * tc
87
- else:
88
-     combined = llm_confidence
89
- ```
90
-
91
- ### Step 6 — Apply threshold safety rules (in order)
92
-
93
- 1. If LLM verdict is `needs_review` → keep.
94
- 2. If `combined >= CONFIDENCE_HIGH` → keep LLM verdict.
95
- 3. If `CONFIDENCE_REVIEW <= combined < CONFIDENCE_HIGH` AND verdict is `not_eligible` → downgrade to `needs_review` (NEVER silently disqualify at medium confidence).
96
- 4. If `combined < CONFIDENCE_REVIEW` → force `needs_review`.
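
Steps 5 and 6 can be sketched together as two small pure functions. The thresholds are the `CONFIDENCE_HIGH` and `CONFIDENCE_REVIEW` values from the config spec. Two points are assumptions of this sketch rather than statements in the spec: a reported OCR score of 0.0 is used as-is (only a missing or negative score falls back to 0.3), and an `eligible` verdict at medium confidence is kept, since the rules only name `not_eligible` for downgrade.

```python
CONFIDENCE_HIGH = 0.80
CONFIDENCE_REVIEW = 0.55


def combined_confidence(source_type, llm_confidence, ocr_confidence=None):
    """Blend LLM confidence with OCR confidence according to the source tier."""
    if source_type == "vision_llm":
        return 0.7 * llm_confidence + 0.3 * 0.95
    if source_type == "tesseract":
        usable = ocr_confidence is not None and ocr_confidence >= 0
        tc = ocr_confidence if usable else 0.3
        return 0.6 * llm_confidence + 0.4 * tc
    # text_pdf and any unknown source type: LLM confidence alone
    return llm_confidence


def final_verdict(llm_verdict, combined):
    """Apply the threshold safety rules in the order listed above."""
    if llm_verdict == "needs_review":
        return "needs_review"
    if combined >= CONFIDENCE_HIGH:
        return llm_verdict
    if combined >= CONFIDENCE_REVIEW:
        # Medium confidence: never disqualify without a human look.
        return "needs_review" if llm_verdict == "not_eligible" else llm_verdict
    return "needs_review"
```

As a worked case, a Tesseract source with `llm_confidence=0.9` and `ocr_confidence=0.5` gives 0.6 × 0.9 + 0.4 × 0.5 = 0.74, which falls in the medium band, so a `not_eligible` verdict would be downgraded to `needs_review`.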
97
-
98
- ### Step 7 — Build and return Verdict
99
-
100
- ```python
101
- Verdict(
102
- bidder_id=bidder_id,
103
- criterion_id=criterion.id,
104
- verdict=final_verdict,
105
- extracted_value=extracted_value,
106
- normalized_value=normalized_value,
107
- source=source,
108
- llm_confidence=llm_confidence,
109
- ocr_confidence=ocr_confidence_from_best_evidence,
110
- combined_confidence=combined,
111
- reason=reason,
112
- model_version=MODEL_VERSION,
113
- timestamp=now_iso(),
114
- review_status="pending",
115
- )
116
- ```
117
-
118
- Log `criterion_evaluated` to audit.
119
-
120
- ---
121
-
122
- ## `evaluate_bidder(bidder_id: str, criteria: list[Criterion]) -> list[Verdict]`
123
-
124
- Calls `evaluate(bidder_id, c)` for each criterion in sequence. Returns list.
125
-
126
- ---
127
-
128
- ## Acceptance Criteria
129
-
130
- 1. `evaluate("bidder_a", c1)` → `verdict="eligible"`, `combined_confidence >= 0.8` (or fallback eligible).
131
- 2. `evaluate("bidder_b", c1)` → `verdict="not_eligible"` or `"needs_review"` (never silently eligible when turnover is below threshold).
132
- 3. `evaluate_bidder("bidder_a", criteria)` returns 5 verdicts.
133
- 4. All verdicts are `Verdict` instances with valid `review_status="pending"`.
134
- 5. Audit log gains `criterion_evaluated` entries.
specs/10_audit_and_fallback.md DELETED
@@ -1,83 +0,0 @@
1
- # Spec 10 — Audit and Fallback
2
-
3
- **Step:** 10 of 15
4
- **Time budget:** ~20 min
5
-
6
- ---
7
-
8
- ## Goal
9
-
10
- Document and finalize `core/audit.py` and `core/fallback.py`. Both were implemented early (Step 6) to unblock the criteria extractor. This spec records their contracts.
11
-
12
- ---
13
-
14
- ## `core/audit.py`
15
-
16
- ### SQLite schema
17
-
18
- ```sql
19
- CREATE TABLE IF NOT EXISTS audit_log (
20
- id INTEGER PRIMARY KEY AUTOINCREMENT,
21
- ts TEXT NOT NULL,
22
- action TEXT NOT NULL,
23
- actor TEXT NOT NULL,
24
- model_version TEXT,
25
- bidder_id TEXT,
26
- criterion_id TEXT,
27
- payload_json TEXT
28
- );
29
- ```
30
-
31
- Single file: `AUDIT_DB = str(BASE_DIR / "audit.db")`.
32
-
33
- ### `log(action: str, actor: str = "system", **fields) -> int`
34
-
35
- - Writes one row. Returns the inserted `rowid`.
36
- - `ts`: UTC ISO timestamp.
37
- - `model_version`: from `fields` if present, else `config.MODEL_VERSION`.
38
- - `bidder_id`, `criterion_id`: extracted from `fields` if present.
39
- - Remaining `fields` → `payload_json = json.dumps(fields)`.
40
-
41
- ### `query(filters: dict | None = None) -> list[dict]`
42
-
43
- - Returns rows from `audit_log` ordered by `id DESC`.
44
- - Supports filters: `bidder_id`, `action`, `date_from` (ts >=), `date_to` (ts <=).
45
-
46
- ### Action vocabulary
47
-
48
- | Action | When logged |
49
- |---|---|
50
- | `criteria_extracted` | After successful LLM criteria extraction |
51
- | `bidder_processed` | After each document is indexed |
52
- | `criterion_evaluated` | After each (bidder, criterion) verdict |
53
- | `human_review_action` | When evaluator approves/edits/rejects a verdict |
54
- | `precomputed_fallback_used` | When LLM is unavailable and fallback fires |
55
- | `vision_ocr_invoked` | When Tier 3 vision LLM is called |
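The `log` and `query` contracts above could look roughly like the following. This is a sketch under stated assumptions, not the contents of `core/audit.py`: it uses an in-memory database instead of `audit.db`, `MODEL_VERSION` is a placeholder for `config.MODEL_VERSION`, and it interprets "remaining fields" as everything left after `model_version`, `bidder_id`, and `criterion_id` are removed.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Optional

MODEL_VERSION = "demo-model"  # hypothetical stand-in for config.MODEL_VERSION

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts TEXT NOT NULL,
    action TEXT NOT NULL,
    actor TEXT NOT NULL,
    model_version TEXT,
    bidder_id TEXT,
    criterion_id TEXT,
    payload_json TEXT
);
"""

# In-memory database for the sketch; the spec uses a single audit.db file.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(SCHEMA)


def log(action: str, actor: str = "system", **fields) -> int:
    """Insert one audit row and return its rowid."""
    model_version = fields.pop("model_version", MODEL_VERSION)
    bidder_id = fields.pop("bidder_id", None)
    criterion_id = fields.pop("criterion_id", None)
    cur = conn.execute(
        "INSERT INTO audit_log (ts, action, actor, model_version, bidder_id,"
        " criterion_id, payload_json) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), action, actor, model_version,
         bidder_id, criterion_id, json.dumps(fields)),
    )
    conn.commit()
    return cur.lastrowid


def query(filters: Optional[dict] = None) -> list:
    """Return audit rows as dicts, newest first, optionally filtered."""
    # Each supported filter maps to a fixed SQL fragment; values are bound
    # as parameters, so no caller-supplied text ends up in the SQL string.
    clauses = {
        "bidder_id": "bidder_id = ?",
        "action": "action = ?",
        "date_from": "ts >= ?",
        "date_to": "ts <= ?",
    }
    where, params = [], []
    for key, value in (filters or {}).items():
        if key in clauses:
            where.append(clauses[key])
            params.append(value)
    sql = "SELECT * FROM audit_log"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY id DESC"
    return [dict(row) for row in conn.execute(sql, params)]
```

For example, `log("criterion_evaluated", bidder_id="bidder_a", criterion_id="C1", verdict="eligible")` stores `{"verdict": "eligible"}` in `payload_json`, and `query({"bidder_id": "bidder_a"})` returns that row.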
56
-
57
- ---
58
-
59
- ## `core/fallback.py`
60
-
61
- ### `load_criteria() -> list[Criterion]`
62
-
63
- - Reads `data/precomputed/criteria.json` if it exists, parses `{"criteria": [...]}`.
64
- - Falls back to `_HARDCODED_CRITERIA` (5 hardcoded criteria matching the mock tender exactly) if file is missing.
65
-
66
- ### `load_evaluation(bidder_id: str, criterion_id: str) -> Verdict`
67
-
68
- - Reads `data/precomputed/eval_{bidder_id}.json` if it exists.
69
- - Finds the dict where `criterion_id` matches.
70
- - Falls back to a `needs_review` Verdict with reason "Pre-computed evaluation not available."
71
-
72
- ### `_HARDCODED_CRITERIA`
73
-
74
- Five criteria matching the mock tender (C1–C5), with correct rules and query_hints. These are the ultimate safety net if `precompute_results.py` has not been run.
75
-
76
- ---
77
-
78
- ## Acceptance Criteria
79
-
80
- 1. `audit.log("test")` inserts a row; `audit.query()` returns it.
81
- 2. `audit.query({"action": "criteria_extracted"})` filters correctly.
82
- 3. `fallback.load_criteria()` returns 5 criteria even with no precomputed file.
83
- 4. `fallback.load_evaluation("bidder_a", "C1")` returns a `Verdict` with `verdict_id` set.
specs/11_mock_data.md DELETED
@@ -1,211 +0,0 @@
1
- # Spec 11 — Mock Data Generation
2
-
3
- **Step:** 2 of 15
4
- **Time budget:** ~25 min
5
- **Checkpoint:** `data/` directory populated; `turnover_certificate_scan.png` is a visibly noisy scan that Tesseract reads with low confidence (~50–65%).
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- `scripts/generate_mock_data.py` is a single deterministic script that produces:
12
- 1. One tender PDF (`data/tender/crpf_construction_tender.pdf`)
13
- 2. Five PDFs for Bidder A (clearly eligible)
14
- 3. Five PDFs for Bidder B (clearly ineligible — turnover too low)
15
- 4. Four PDFs + one noisy scan PNG for Bidder C (needs review)
16
-
17
- All files are entirely synthetic and self-contained — no external assets required. The script must run in under 30 seconds.
18
-
19
- ---
20
-
21
- ## Dependencies
22
-
23
- - `reportlab` — PDF generation
24
- - `Pillow` — image manipulation
25
- - `numpy` — salt-and-pepper noise
26
-
27
- ---
28
-
29
- ## Output Files
30
-
31
- ```
32
- data/
33
- tender/
34
- crpf_construction_tender.pdf
35
- bidders/
36
- bidder_a/
37
- company_profile.pdf
38
- audited_financials.pdf
39
- project_experience.pdf
40
- gst_certificate.pdf
41
- iso_9001.pdf
42
- bidder_b/
43
- company_profile.pdf
44
- audited_financials.pdf
45
- project_experience.pdf
46
- gst_certificate.pdf
47
- iso_9001.pdf
48
- bidder_c/
49
- company_profile.pdf
50
- project_experience.pdf
51
- gst_certificate.pdf
52
- iso_9001.pdf
53
- turnover_certificate_scan.png
54
- ```
55
-
56
- ---
57
-
58
- ## Tender PDF — `crpf_construction_tender.pdf`
59
-
60
- `reportlab` SimpleDocTemplate, 5–6 pages with formal government tender language.
61
-
62
- ### Sections
63
-
64
- 1. **Introduction** — "Central Reserve Police Force, Ministry of Home Affairs, Government of India. Tender for Construction of Residential Quarters."
65
- 2. **Scope of Work** — brief description of construction project.
66
- 3. **Eligibility Criteria** — Section 3.2, contains five criteria (see table below).
67
- 4. **Submission Procedure** — dates, contact details.
68
- 5. **Evaluation Methodology** — how bids will be scored.
69
- 6. **Annexures** — supporting forms.
70
-
71
- ### Five Criteria (exact text in Section 3.2)
72
-
73
- | ID | Clause | Verbatim Text | Mandatory | Category |
74
- |---|---|---|---|---|
75
- | C1 | 3.2(a) | "The bidder shall have a minimum average annual turnover of INR 5 Crore (Rupees Five Crore only) during the last three financial years (2022-23, 2023-24, 2024-25), as certified by a Chartered Accountant." | Yes | financial |
76
- | C2 | 3.2(b) | "The bidder must have successfully completed at least three (3) similar construction projects of value not less than INR 1 Crore each in the last five (5) financial years. Completion certificates from clients shall be submitted." | Yes | technical |
77
- | C3 | 3.2(c) | "The bidder shall possess a valid Goods and Services Tax (GST) registration certificate. The GSTIN must be active as on the date of submission." | Yes | compliance |
78
- | C4 | 3.2(d) | "The bidder shall hold a valid ISO 9001:2015 Quality Management System certification issued by an accredited certification body, valid as on the date of bid submission." | Yes | compliance |
79
- | C5 | 3.2(e) | "Preferably, the bidder may have prior experience with construction or maintenance of paramilitary or defence infrastructure. This is a desirable criterion and shall not affect mandatory eligibility." | No | technical |
80
-
81
- C5 uses "preferably" and "desirable" → tests the mandatory-vs-optional classifier.
82
-
83
- ---
84
-
85
- ## Bidder A — Clearly Eligible
86
-
87
- ### `company_profile.pdf`
88
- - Company: "Apex Constructions Pvt. Ltd."
89
- - GSTIN: 27AABCA1234F1Z5
90
- - Registered: 2010
91
- - ISO 9001:2015 certified: Yes
92
-
93
- ### `audited_financials.pdf`
94
- - FY 2022-23: Annual Turnover INR 5,80,00,000 (Rupees Five Crore Eighty Lakh)
95
- - FY 2023-24: Annual Turnover INR 6,20,00,000 (Rupees Six Crore Twenty Lakh)
96
- - FY 2024-25: Annual Turnover INR 7,10,00,000 (Rupees Seven Crore Ten Lakh)
97
- - Average: INR 6,36,66,667 — exceeds INR 5 Crore threshold
98
- - Certified by: CA Ramesh Kumar, M. No. 123456
99
-
100
- ### `project_experience.pdf`
101
- - 5 projects listed (2020–2025), each ≥ INR 1 Crore
102
- - Includes one CRPF project (2023): "Construction of barracks, CRPF Camp, Pune, INR 3.5 Crore"
103
-
104
- ### `gst_certificate.pdf`
105
- - GSTIN: 27AABCA1234F1Z5
106
- - Valid through: 31-03-2027
107
- - Status: Active
108
-
109
- ### `iso_9001.pdf`
110
- - Certificate No: ISO-2021-9001-APEX
111
- - Valid through: 15-06-2027
112
- - Issued by: Bureau Veritas
113
-
114
- ---
115
-
116
- ## Bidder B — Clearly Ineligible (turnover too low)
117
-
118
- Same structure as Bidder A, but financials are below threshold.
119
-
120
- ### `company_profile.pdf`
121
- - Company: "BuildRight Enterprises"
122
- - GSTIN: 29AABCB5678G1Z3
123
-
124
- ### `audited_financials.pdf`
125
- - FY 2022-23: Annual Turnover INR 1,20,00,000 (Rupees One Crore Twenty Lakh)
126
- - FY 2023-24: Annual Turnover INR 1,50,00,000 (Rupees One Crore Fifty Lakh)
127
- - FY 2024-25: Annual Turnover INR 1,80,00,000 (Rupees One Crore Eighty Lakh)
128
- - Average: INR 1,50,00,000 — **below** INR 5 Crore threshold
129
- - Certified by: CA Suresh Patel, M. No. 654321
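The averages quoted for Bidder A and Bidder B can be checked with a few lines of arithmetic. The helper names `format_inr` and `average_turnover` are hypothetical and only serve this check; rounding to the nearest rupee is an assumption that matches the quoted figure for Bidder A.

```python
THRESHOLD_INR = 5_00_00_000  # INR 5 Crore, from criterion C1


def format_inr(amount: int) -> str:
    """Format an integer with Indian digit grouping (last 3 digits, then pairs)."""
    digits = str(amount)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups + [tail])


def average_turnover(yearly: list) -> int:
    """Arithmetic mean of yearly turnover, rounded to the nearest rupee."""
    return round(sum(yearly) / len(yearly))


# Yearly figures for FY 2022-23, 2023-24, 2024-25 as listed in the spec.
bidder_a = [5_80_00_000, 6_20_00_000, 7_10_00_000]
bidder_b = [1_20_00_000, 1_50_00_000, 1_80_00_000]

for name, yearly in (("Bidder A", bidder_a), ("Bidder B", bidder_b)):
    avg = average_turnover(yearly)
    status = "meets" if avg >= THRESHOLD_INR else "is below"
    print(f"{name}: average INR {format_inr(avg)} {status} the INR 5 Crore threshold")
```

Bidder A's total is INR 19,10,00,000 over three years, which gives the quoted INR 6,36,66,667; Bidder B's total of INR 4,50,00,000 gives INR 1,50,00,000.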
130
-
131
- ### `project_experience.pdf`
132
- - 4 projects listed (2021–2025), each ≥ INR 1 Crore — passes C2
133
-
134
- ### `gst_certificate.pdf`
135
- - GSTIN: 29AABCB5678G1Z3, valid through 2027, Active
136
-
137
- ### `iso_9001.pdf`
138
- - Certificate No: ISO-2022-9001-BR
139
- - Valid through: 20-08-2027
140
-
141
- ---
142
-
143
- ## Bidder C — Needs Review (scanned turnover certificate)
144
-
145
- No typed `audited_financials.pdf`. Instead: a deliberately noisy scan PNG.
146
-
147
- ### `company_profile.pdf`
148
- - Company: "Shree Constructions & Services"
149
- - GSTIN: 24AABCC9012H1Z1
150
-
151
- ### `project_experience.pdf`
152
- - Exactly 3 projects (just meets the count threshold for C2)
153
- - Values: INR 1.2 Cr, INR 1.5 Cr, INR 2.1 Cr
154
-
155
- ### `gst_certificate.pdf`
156
- - GSTIN: 24AABCC9012H1Z1, valid through 2027, Active
157
-
158
- ### `iso_9001.pdf`
159
- - Certificate No: ISO-2023-9001-SCS
160
- - Valid through: 10-09-2027
161
-
162
- ### `turnover_certificate_scan.png` — noisy scan generation
163
-
164
- This is the OCR demo centerpiece. Steps:
165
-
166
- 1. Render a `reportlab` page to an in-memory PDF with a CA's turnover certificate:
167
- - "This is to certify that M/s Shree Constructions & Services ... average annual turnover of INR 5,40,00,000 (Rupees Five Crore Forty Lakh only) for the financial years 2022-23, 2023-24, and 2024-25."
168
- - Include year-wise breakdown table.
169
- 2. Convert that PDF page to a PIL Image at 150 DPI using `fitz` (PyMuPDF; note that it is not listed under Dependencies above).
170
- 3. Apply degradation:
171
- - `ImageFilter.GaussianBlur(radius=1.5)`
172
- - Salt-and-pepper noise via numpy: randomly set ~5% of pixels to 0 or 255
173
- - `image.rotate(-2, expand=True, fillcolor=(255,255,255))`
174
- - Re-save with JPEG compression at quality=40 then reload as PNG
175
- 4. Save as `data/bidders/bidder_c/turnover_certificate_scan.png`
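The degradation in step 3 could be implemented along these lines with Pillow and NumPy. This is a sketch, not the script's actual `make_noisy_scan`: the function name `degrade`, its parameters, and the fixed random seed are assumptions, and the reportlab rendering and PyMuPDF rasterization from steps 1 and 2 are left out.

```python
import io

import numpy as np
from PIL import Image, ImageFilter


def degrade(image: Image.Image, noise_fraction: float = 0.05, seed: int = 0) -> Image.Image:
    """Blur, add salt-and-pepper noise, skew, and JPEG-compress an image."""
    img = image.convert("RGB").filter(ImageFilter.GaussianBlur(radius=1.5))

    # Salt-and-pepper: set roughly noise_fraction of pixels to black or white.
    rng = np.random.default_rng(seed)  # fixed seed keeps the output deterministic
    pixels = np.array(img)
    mask = rng.random(pixels.shape[:2])
    pixels[mask < noise_fraction / 2] = 0
    pixels[mask > 1 - noise_fraction / 2] = 255
    img = Image.fromarray(pixels)

    # Slight skew, filling the exposed corners with white.
    img = img.rotate(-2, expand=True, fillcolor=(255, 255, 255))

    # Round-trip through low-quality JPEG to introduce compression artifacts.
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG", quality=40)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")
```

The result can then be written with `degrade(page).save(out_path, format="PNG")`. Seeding the generator is one way to keep the script deterministic, which the Goal section requires.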
176
-
177
- **Expected outcome:** Tesseract reads this at mean confidence ~50–65% → triggers Tier-3 vision LLM. The turnover figure (INR 5,40,00,000) is present but partially degraded, making it a realistic "needs human review" case given combined-confidence rules.
178
-
179
- ---
180
-
181
- ## Script Design
182
-
183
- ```python
184
- # scripts/generate_mock_data.py
185
-
186
- def make_tender_pdf(out_path: Path) -> None: ...
187
- def make_company_profile(out_path: Path, name: str, gstin: str, year: int) -> None: ...
188
- def make_financials(out_path: Path, rows: list[tuple[str, str, int]]) -> None: ...
189
- def make_project_experience(out_path: Path, projects: list[dict]) -> None: ...
190
- def make_gst_certificate(out_path: Path, gstin: str, valid_through: str) -> None: ...
191
- def make_iso_certificate(out_path: Path, cert_no: str, valid_through: str, company: str) -> None: ...
192
- def make_noisy_scan(out_path: Path) -> None: ...
193
-
194
- if __name__ == "__main__":
195
- # Ensure output dirs exist
196
- # Generate all files
197
- print("Mock data generated successfully.")
198
- ```
199
-
200
- Each helper creates one PDF/PNG. The script is idempotent (re-running overwrites files). No command-line arguments needed.
201
-
202
- ---
203
-
204
- ## Acceptance Criteria
205
-
206
- 1. Running `python scripts/generate_mock_data.py` exits 0 and prints "Mock data generated successfully."
207
- 2. All 16 files listed above exist after the run.
208
- 3. Each PDF opens in a viewer without errors and contains the text described.
209
- 4. `turnover_certificate_scan.png` is visibly degraded (blurry, rotated, noisy).
210
- 5. Running `pytesseract.image_to_data(Image.open("data/bidders/bidder_c/turnover_certificate_scan.png"), output_type=pytesseract.Output.DATAFRAME)` returns a dataframe where the filtered mean confidence is between 30 and 70 (i.e., low enough to trigger Tier 3).
211
- 6. Script completes in under 30 seconds on any modern machine.
specs/12_precompute.md DELETED
@@ -1,73 +0,0 @@
1
- # Spec 12 — Pre-compute Results
2
-
3
- **Step:** 11 of 15
4
- **Time budget:** ~15 min
5
- **Checkpoint:** Four JSON files exist in `data/precomputed/` and validate against the schemas.
6
-
7
- ---
8
-
9
- ## Goal
10
-
11
- `scripts/precompute_results.py` runs the full pipeline once (requires a valid API key) and saves the results as JSON fallback files, which are then committed to the repo. When the API is unavailable during a demo, `fallback.py` reads these files instead.
12
-
13
- ---
14
-
15
- ## Script: `scripts/precompute_results.py`
16
-
17
- ```python
18
- """Step 11 — runs the full pipeline and writes data/precomputed/*.json."""
19
- ```
20
-
21
- ### Steps
22
-
23
- 1. Ensure `data/precomputed/` exists.
24
- 2. Extract criteria from mock tender → save `data/precomputed/criteria.json`:
25
- ```json
26
- {"criteria": [<Criterion.model_dump()>, ...]}
27
- ```
28
- 3. For each bidder (`bidder_a`, `bidder_b`, `bidder_c`):
29
- a. Process all bidder docs (`process_bidder`).
30
- b. Evaluate all criteria (`evaluate_bidder`).
31
- c. Save `data/precomputed/eval_{bidder_id}.json`:
32
- ```json
33
- [<Verdict.model_dump()>, ...]
34
- ```
35
- 4. Print summary and exit 0.
36
-
37
- ### Error handling
38
-
39
- If the LLM fails for any criterion: catch `LLMUnavailable`, log a warning, and skip that criterion (don't crash). A criteria file plus partial evaluations is better than nothing.
40
-
41
- If no API key: print instructions and exit 1.
42
-
43
- ---
44
-
45
- ## Fallback file format
46
-
47
- ### `criteria.json`
48
- ```json
49
- {
50
- "criteria": [
51
- {"id": "C1", "title": "...", ...},
52
- ...
53
- ]
54
- }
55
- ```
56
-
57
- ### `eval_bidder_a.json`
58
- ```json
59
- [
60
- {"verdict_id": "V-abc123", "bidder_id": "bidder_a", "criterion_id": "C1", "verdict": "eligible", ...},
61
- ...
62
- ]
63
- ```
64
-
65
- ---
66
-
67
- ## Acceptance Criteria
68
-
69
- 1. Running `python scripts/precompute_results.py` exits 0 when API key is set.
70
- 2. `data/precomputed/criteria.json` exists and contains `{"criteria": [...]}` with 5 items.
71
- 3. Each `eval_bidder_*.json` contains a list of 5 `Verdict` dicts.
72
- 4. `from core.fallback import load_criteria` returns 5 `Criterion` objects from the file.
73
- 5. `from core.fallback import load_evaluation` returns the correct `Verdict` for bidder_a, C1.
specs/13_ui_tabs.md DELETED
@@ -1,121 +0,0 @@
1
- # Spec 13 — UI Tabs
2
-
3
- **Step:** 12 of 15
4
- **Time budget:** ~80 min total
5
-
6
- ---
7
-
8
- ## Goal
9
-
10
- Implement all five Streamlit tabs and `ui/components.py`. The app must render the full demo flow without an API key (using precomputed data), and with one (calling the live LLM).
11
-
12
- ---
13
-
14
- ## `ui/components.py` — Shared widgets
15
-
16
- ### `verdict_pill(verdict: str) -> str`
17
- Returns a markdown-formatted colored badge string:
18
- - `eligible` → `":green[✅ Eligible]"`
19
- - `not_eligible` → `":red[❌ Not Eligible]"`
20
- - `needs_review` → `":orange[⚠ Needs Review]"`
21
-
22
- ### `confidence_bar(value: float, label: str = "Confidence") -> None`
23
- Renders `st.progress(value, text=f"{label}: {value:.0%}")`.
24
-
25
- ### `ocr_tier_badge(source_type: str) -> str`
26
- Returns a short badge string:
27
- - `text_pdf` → "`📄 text_pdf`"
28
- - `tesseract` → "`🔍 tesseract`"
29
- - `vision_llm` → "`👁 vision_llm`"
30
-
31
- ### `category_badge(category: str) -> str`
32
- Returns `":blue[financial]"`, `":green[technical]"`, or `":orange[compliance]"`.
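The two string-returning helpers can be written as plain dictionary lookups that need no Streamlit import. This is a sketch: the fallback behaviour for unknown inputs is an assumption, since the spec only lists the known values.

```python
VERDICT_PILLS = {
    "eligible": ":green[✅ Eligible]",
    "not_eligible": ":red[❌ Not Eligible]",
    "needs_review": ":orange[⚠ Needs Review]",
}

CATEGORY_COLORS = {"financial": "blue", "technical": "green", "compliance": "orange"}


def verdict_pill(verdict: str) -> str:
    """Return Streamlit colored-text markdown for a verdict."""
    # Unknown verdicts fall back to the review pill (an assumption, not in the spec).
    return VERDICT_PILLS.get(verdict, VERDICT_PILLS["needs_review"])


def category_badge(category: str) -> str:
    """Return a colored badge such as ':blue[financial]'."""
    color = CATEGORY_COLORS.get(category, "gray")  # gray fallback is an assumption
    return f":{color}[{category}]"
```

The returned strings are meant to be passed to `st.markdown`, which renders the `:color[text]` syntax.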
33
-
34
- ---
35
-
36
- ## Tab 1 — Overview (`ui/tab_overview.py`)
37
-
38
- Layout:
39
- 1. Hero text + tagline.
40
- 2. Two-column KPI cards: Criteria Extracted, Bidders Evaluated, Mandatory Criteria Checked, Audit Entries Logged.
41
- 3. Architecture summary (text description since no image file yet).
42
- 4. "Use Pre-loaded Demo Data" CTA that sets `st.session_state["use_demo"] = True` and shows the criteria count from the fallback file.
43
-
44
- KPI values: count from `st.session_state` data and `audit.query()`.
45
-
46
- ---
47
-
48
- ## Tab 2 — Tender Analysis (`ui/tab_tender.py`)
49
-
50
- Already implemented in Step 6. No changes needed beyond what's there.
51
-
52
- ---
53
-
54
- ## Tab 3 — Bidder Evaluation (`ui/tab_bidders.py`)
55
-
56
- Layout:
57
- 1. `st.header("Bidder Evaluation")`
58
- 2. Multi-select for bidders: `["bidder_a", "bidder_b", "bidder_c"]`, default all.
59
- 3. Button **"Run Evaluation"** (type=primary).
60
- 4. On click:
61
- a. Ensure criteria are loaded (from session_state or fallback).
62
- b. For each selected bidder: `process_bidder(...)`, then `evaluate_bidder(...)`.
63
- c. Store verdicts in `st.session_state["verdicts"]` as `{bidder_id: [Verdict.model_dump(), ...]}`.
64
- 5. If verdicts in session:
65
- - For each bidder: show per-bidder summary header.
66
- - Show a table of criteria rows using `st.columns`.
67
- - Each row: criterion title, verdict pill, extracted value, source chip (doc + page), OCR-tier badge, confidence bar.
68
- - Expandable "Reason" and "Source Snippet" per row.
69
-
70
- Per-bidder summary: count eligible/not_eligible/needs_review among mandatory criteria. Overall: Eligible only if all mandatory are eligible; Not Eligible if any are not_eligible; Needs Review otherwise.
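The overall rule can be expressed as a small function over the per-criterion verdicts. The name `overall_status` and the dict shape are assumptions; treating a bidder with no mandatory verdicts as `needs_review` is also an assumption, since the spec does not cover that case.

```python
def overall_status(verdicts: list, mandatory_ids: set) -> str:
    """Combine per-criterion verdicts into one status, using mandatory criteria only."""
    mandatory = [v["verdict"] for v in verdicts if v["criterion_id"] in mandatory_ids]
    if any(v == "not_eligible" for v in mandatory):
        return "not_eligible"
    if mandatory and all(v == "eligible" for v in mandatory):
        return "eligible"
    # Mixed eligible / needs_review, or no mandatory verdicts at all.
    return "needs_review"
```

Optional criteria such as C5 are ignored here, so a failed desirable criterion never changes the overall result.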
71
-
72
- ---
73
-
74
- ## Tab 4 — Human Review Queue (`ui/tab_review.py`)
75
-
76
- Layout:
77
- 1. `st.header("Human Review Queue")`
78
- 2. Shows all verdicts where `review_status == "pending"` AND `verdict == "needs_review"`.
79
- 3. For each such verdict:
80
- - Show: bidder_id, criterion title, extracted value, confidence, reason, source snippet.
81
- - Three buttons: **Approve**, **Edit & Approve**, **Reject**.
82
- - **Approve**: set `review_status = "approved"`, log `human_review_action` to audit.
83
- - **Edit & Approve**: show `st.text_input` for edited value, set `review_status = "edited"`, log audit.
84
- - **Reject**: set `review_status = "rejected"`, log audit.
85
- 4. If no pending items: `st.success("No items pending review.")`.
86
-
87
- State: verdicts stored in `st.session_state["verdicts"]` as nested dicts. Updates write back to the same structure.
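The queue filter and the three review actions can be modelled on the nested-dict state without any UI code. This is a sketch under assumptions: `pending_reviews`, `apply_review`, and the action keys are hypothetical names, and the audit logging that accompanies each action is omitted.

```python
from typing import Optional

REVIEW_STATUS = {"approve": "approved", "edit": "edited", "reject": "rejected"}


def pending_reviews(verdicts: dict) -> list:
    """Collect verdicts that still await a human decision."""
    return [
        v
        for bidder_verdicts in verdicts.values()
        for v in bidder_verdicts
        if v["review_status"] == "pending" and v["verdict"] == "needs_review"
    ]


def apply_review(verdict: dict, action: str, edited_value: Optional[str] = None) -> dict:
    """Update a verdict dict in place for approve / edit / reject."""
    if action not in REVIEW_STATUS:
        raise ValueError(f"unknown review action: {action}")
    if action == "edit":
        verdict["extracted_value"] = edited_value
    verdict["review_status"] = REVIEW_STATUS[action]
    return verdict


# Same shape as st.session_state["verdicts"]: {bidder_id: [verdict dict, ...]}.
state = {
    "bidder_c": [
        {"criterion_id": "C1", "verdict": "needs_review", "review_status": "pending",
         "extracted_value": "INR 5,40,00,000"},
        {"criterion_id": "C3", "verdict": "eligible", "review_status": "pending",
         "extracted_value": "24AABCC9012H1Z1"},
    ]
}
queue = pending_reviews(state)
```

Because the queue holds references to the same dicts, calling `apply_review(queue[0], "approve")` writes back into `state` directly, which mirrors the "updates write back to the same structure" note.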
88
-
89
- ---
90
-
91
- ## Tab 5 — Audit Log (`ui/tab_audit.py`)
92
-
93
- Layout:
94
- 1. `st.header("Audit Log")`
95
- 2. Filter row: bidder dropdown, action dropdown, date range.
96
- 3. Table: `st.dataframe` with columns: ts, action, actor, bidder_id, criterion_id, payload_json.
97
- 4. **"Export CSV"** button: `st.download_button` with CSV data from filtered rows.
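The CSV payload for the export button can be produced with the standard library. The helper name `rows_to_csv` is hypothetical; it assumes the filtered rows are dicts like those returned by `audit.query()`.

```python
import csv
import io

COLUMNS = ["ts", "action", "actor", "bidder_id", "criterion_id", "payload_json"]


def rows_to_csv(rows: list) -> str:
    """Serialize audit rows to CSV text, keeping only the displayed columns."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys such as id and model_version.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned string can be handed to `st.download_button("Export CSV", data=rows_to_csv(rows), file_name="audit_log.csv", mime="text/csv")`; the file name is an arbitrary choice.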
98
-
99
- ---
100
-
101
- ## Sidebar update (`app.py`)
102
-
103
- Replace the hardcoded "🔴 **DeepSeek:** not connected" with a live probe:
104
- - Try `LLM().chat_json("ping", '{"ping": true}')` at startup (cached with session_state).
105
- - Green: live and no fallback fired.
106
- - Amber: fallback has fired this session.
107
- - Red: probe failed.
108
-
109
- If `st.session_state.get("fallback_active")`: show `st.sidebar.warning("⚠ Pre-computed mode active.")`.
110
-
111
- ---
112
-
113
- ## Acceptance Criteria
114
-
115
- 1. Tab 1 renders without error and shows KPI cards.
116
- 2. Tab 3 "Run Evaluation" populates the verdict table for all 3 bidders.
117
- 3. Bidder A shows all mandatory criteria eligible. Bidder B shows C1 not_eligible.
118
- 4. Tab 4 shows at least one pending review item for Bidder C.
119
- 5. Tab 4 Approve button updates `review_status` and adds an audit entry.
120
- 6. Tab 5 shows audit entries and CSV download works.
121
- 7. Sidebar connection dot is green/amber/red based on API availability.
submission_requirements.md DELETED
@@ -1,29 +0,0 @@
1
- # Prototype Phase — Submission Requirements
2
-
3
- > This is the submission form for Round 2 (Prototype Phase). The idea was already shortlisted in Round 1.
4
-
5
- ---
6
-
7
- ## Required Fields
8
-
9
- | Field | Notes |
10
- |---|---|
11
- | **Title** | Clear, descriptive title |
12
- | **Description** | Project description with formatting and links allowed |
13
- | **Parent Submission** | Link to the shortlisted Round 1 idea submission |
14
- | **Theme** | Theme 3: AI-Based Tender Evaluation and Eligibility Analysis |
15
- | **Snapshots** | Images of the project (JPG/JPEG/PNG, up to 3MB each) |
16
- | **Video URL** | Demo or pitch video link |
17
- | **Presentation** | Pitch deck or slides (.key, .odp, .odt, .pdf, .pps, .ppt, .pptx — max 50MB) |
18
- | **Demo Link** | Link to working demo or prototype |
19
- | **Repository URL** | GitHub, Bitbucket, or similar code repository |
20
- | **Source Code** | Zip or APK upload (max 50MB) |
21
- | **Instructions to Run** | Step-by-step setup and run instructions for reviewers |
22
- | **Custom Attachment** | Any additional file — PDF, images, spreadsheets (max 50MB) |
23
-
24
- ---
25
-
26
- ## Notes
27
-
28
- - The "Parent Submission" field links this prototype to the previously shortlisted idea.
29
- - "Which shortlisted idea are you submitting this prototype for?" — confirms the link to the Round 1 submission.
theme.md DELETED
@@ -1,89 +0,0 @@
1
- # Theme 3: AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement by CRPF
2
-
3
- ## Context
4
-
5
- Government organisations such as the Central Reserve Police Force (CRPF) issue tenders to procure goods and services. Each tender specifies detailed requirements: technical specifications, financial thresholds, compliance rules, eligibility conditions, document checklists and mandatory certifications. These requirements are typically written in formal, legally careful language and are spread across many pages of the tender document.
6
-
7
- Private companies respond with bids, each submitting their own set of supporting documents — company profiles, financial statements, experience letters, tax registrations, certifications and more. The documents arrive in many formats: structured text PDFs, scanned copies, Word files, tables and even photographs of physical certificates. The same kind of information is presented in many different ways across bidders.
8
-
9
- Evaluating whether each bidder meets the stated eligibility criteria is currently a manual process. It is slow, inconsistent across evaluators, prone to oversight, and hard to audit. For a single tender, a committee may spend days cross-checking hundreds of pages against a list of criteria, and two evaluators may reach different conclusions from the same set of documents. There is a clear opportunity to bring modern AI techniques to this problem — to extract structured information from unstructured tender and bid documents, apply the eligibility rules consistently, and produce explainable evaluation reports that a human officer can trust and sign off on.
10
-
11
- ---
12
-
13
- ## The Problem
14
-
15
- Design a technical platform that, given a tender document and a set of bidder submissions, can do the following:
16
-
17
- ### Understand the Tender
18
- - Extract the eligibility criteria from the tender document — technical specifications, financial thresholds, compliance conditions, and document and certification requirements.
19
- - Distinguish between mandatory and optional criteria.
20
- - Capture each criterion in a form that can be matched against a bidder's submission.
21
-
22
- ### Understand Each Bidder
23
- - Parse every bidder submission, regardless of whether the documents are typed PDFs, scanned copies, Word files or photographs.
24
- - Extract the values and evidence relevant to each criterion from those documents.
25
- - Handle variation in how bidders present the same information.
26
-
27
- ### Evaluate and Explain
28
- - For each bidder, decide whether they are **Eligible**, **Not Eligible**, or **Need Manual Review** against each criterion and overall.
29
- - Produce an explanation for every verdict that references the specific criterion, the specific document and the specific value that drove the decision.
30
- - Surface ambiguous or uncertain cases for human review rather than silently disqualifying them.
31
- - Produce a consolidated evaluation report that a procurement officer can use as the basis for a decision.
32
-
33
- ---
34
-
35
- ## Non-Negotiables
36
-
37
- - Every verdict must be explainable at the criterion level — which criterion was being checked, which document was used, what value was found, and why the bidder passed, failed or needs review.
38
- - The system must **never silently disqualify** a bidder. Ambiguous or uncertain cases must be surfaced for human review with the reason.
39
- - The system must handle scanned documents and photographs, not only digital text.
40
- - The system must be auditable end-to-end and suitable for use in a formal government procurement decision.
41
- - Real tender and bid data will not be released for Round 1. Any Round 2 implementation will run on representative mock or redacted documents inside a sandbox.
42
-
43
- ---
44
-
45
- ## What Success Looks Like
46
-
47
- A working solution should eventually make the following behaviours possible:
48
-
49
- 1. A procurement officer uploads a tender document and a set of bidder submissions. The system extracts the eligibility criteria automatically and lists them for review.
50
- 2. For each bidder, the system produces a criterion-by-criterion evaluation with references back to the source documents.
51
- 3. Clearly eligible and clearly ineligible bidders are marked as such; genuinely ambiguous cases are flagged for manual review with the reason for the ambiguity.
52
- 4. A consolidated report can be exported and signed off, with a complete audit trail of every automated decision.
53
-
54
- ---
55
-
56
- ## Sample Scenario
57
-
58
- A government department issues a tender for construction services with the following eligibility criteria: a minimum annual turnover of ₹5 crore, at least 3 similar projects completed in the last 5 years, a valid GST registration, and an ISO 9001 certification. Ten bidders submit responses, each with their own combination of typed and scanned supporting documents.
59
-
60
- A good solution would extract these four criteria from the tender, parse each bidder's submission, and produce a report:
61
- - 6 bidders clearly eligible with evidence for each criterion
62
- - 3 clearly ineligible with the specific criterion they failed and the document that showed it
63
- - 1 flagged for manual review because the turnover document is a scanned certificate with figures that could not be read with confidence
64
-
65
- ---
66
-
67
- ## What Your Solution Should Cover
68
-
69
- Round 1 of this hackathon is a **written solution submission**. Your solution document should make clear how you would build this platform. At minimum, it should cover:
70
-
71
- 1. Your understanding of the problem and the realities of government procurement, in your own words.
72
- 2. Your approach to extracting eligibility criteria from a tender document, including how you separate technical, financial and compliance conditions, and how you distinguish mandatory from optional criteria.
73
- 3. Your approach to parsing bidder submissions with heterogeneous document types — typed PDFs, scanned documents, tables, photographs — and extracting the values that map to each criterion.
74
- 4. How you match extracted bidder information against the criteria, and how you handle ambiguity, partial information and variation in legal and technical language.
75
- 5. How the system produces explainable, criterion-level verdicts, and how ambiguous cases are surfaced for human review instead of being silently rejected.
76
- 6. How you would guarantee the auditability of every decision, suitable for a formal government procurement context.
77
- 7. A clear architecture overview, the key technology and model choices you would make, and the reasons behind them.
78
- 8. The main risks and trade-offs you see, and how you would handle them.
79
- 9. A rough implementation plan for Round 2, assuming a sandbox with sample tender and bidder documents is provided.
80
-
81
- ---
82
-
83
- ## How We Will Evaluate Proposals
84
-
85
- - Clarity of problem understanding — does the team show they have grasped the realities of government procurement, not just the surface problem?
86
- - Technical soundness of the proposed approach, including document understanding, criterion matching and explainability.
87
- - Depth of thinking on edge cases: scanned documents, photographs, ambiguous language, partial information and format inconsistency.
88
- - Design of the human-in-the-loop path for ambiguous cases, and of the audit trail.
89
- - Quality of the architecture, the justification of technology and model choices, and the identified risks and trade-offs.
understanding.md DELETED
@@ -1,154 +0,0 @@
1
- # TenderIQ — Project Understanding
2
-
3
- ---
4
-
5
- ## Where We Are
6
-
7
- The idea phase (Round 1) is **done and shortlisted**. The `idea.md` was the written submission. We are now in the **Prototype Phase (Round 2)**, which requires a working prototype, demo, code repository, pitch deck, and video.
8
-
9
- ---
10
-
11
- ## The Problem (from CRPF's perspective)
12
-
13
- CRPF issues tenders. Companies bid. Someone has to manually read:
14
- - The tender document (criteria, thresholds, compliance rules)
15
- - Every bidder's stack of supporting documents (PDFs, scans, photos, Word files)
16
-
17
- ...and verify that each bidder meets each criterion. For one tender, this takes a committee days. Two evaluators may reach different conclusions from the same documents. There's no consistent audit trail.
18
-
19
- **The core pain points:**
20
- 1. Manual, slow, expensive
21
- 2. Inconsistent across evaluators
22
- 3. Not auditable / not transparent
23
- 4. Documents arrive in messy formats (scanned, photographed, mixed)
24
-
25
- ---
26
-
27
- ## What TenderIQ Does
28
-
29
- A four-stage AI pipeline:
30
-
31
- ```
32
- Tender Document ──► [Stage 1] Criteria Extraction
33
-
34
-
35
- Bidder Documents ──► [Stage 2] Document Processing (OCR + entity extraction)
36
-
37
-
38
- [Stage 3] Evaluation Engine (rule-based + confidence)
39
-
40
-
41
- [Stage 4] Explainability + Audit Layer
42
-
43
- ┌─────────┴──────────┐
44
- ▼ ▼
45
- Auto-decision Human Review Queue
46
- (Eligible / Not Eligible) (Needs Manual Review)
47
- ```
48
-
49
- ### Stage 1 — Tender Understanding
50
- - LLM + rule-based hybrid extracts criteria from tender doc
51
- - Classifies each as mandatory or optional
52
- - Outputs structured, machine-readable criteria list
53
-
54
- ### Stage 2 — Bidder Document Processing
55
- - Handles: typed PDFs, scanned docs, images, Word files
56
- - OCR for non-digital content
57
- - Layout-aware parsing (tables, forms, certificates)
58
- - Entity extraction: turnover figures, cert names, project counts
59
- - Every extracted value tagged with: source doc, page number, confidence score
-
- ### Stage 3 — Evaluation Engine
- - Criterion-by-criterion comparison per bidder
- - Rule-based validation (threshold checks)
- - Confidence-aware: low confidence → "Needs Manual Review", not auto-reject
- - Three outcomes: Eligible / Not Eligible / Needs Manual Review
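
The Stage 3 rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Criterion`, `ExtractedValue`, `Verdict`, `evaluate` and the `REVIEW_CUTOFF` value of 0.80 are hypothetical and not taken from the project, and only a simple "minimum value" criterion is modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    ELIGIBLE = "Eligible"
    NOT_ELIGIBLE = "Not Eligible"
    NEEDS_REVIEW = "Needs Manual Review"


@dataclass
class ExtractedValue:
    """A value pulled from a bidder document, tagged with its provenance."""
    value: float
    source_doc: str
    page: int
    confidence: float  # 0.0 to 1.0


@dataclass
class Criterion:
    """Hypothetical minimum-threshold criterion (e.g. annual turnover)."""
    name: str
    min_value: float


# Assumed cutoff; the real value would need calibration on sample data.
REVIEW_CUTOFF = 0.80


def evaluate(criterion: Criterion,
             found: Optional[ExtractedValue]) -> Tuple[Verdict, str]:
    """Return a verdict plus a human-readable reason for one criterion."""
    if found is None:
        # Missing evidence is never an automatic rejection.
        return Verdict.NEEDS_REVIEW, f"{criterion.name}: no value found"
    where = f"{found.source_doc} p.{found.page}"
    if found.confidence < REVIEW_CUTOFF:
        # Low confidence routes to a human instead of auto-rejecting.
        return (Verdict.NEEDS_REVIEW,
                f"{criterion.name}: confidence {found.confidence:.2f} "
                f"below cutoff ({where})")
    if found.value >= criterion.min_value:
        return (Verdict.ELIGIBLE,
                f"{criterion.name}: {found.value} >= {criterion.min_value} ({where})")
    return (Verdict.NOT_ELIGIBLE,
            f"{criterion.name}: {found.value} < {criterion.min_value} ({where})")
```

Checking confidence before the threshold means a poorly read scan can never produce an automatic "Not Eligible", which is the behaviour the "never silently disqualify" constraint asks for.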
-
- ### Stage 4 — Explainability + Audit
- - Every decision has: criterion checked, value found, source doc, confidence, reason
- - Full audit log: model version, timestamp, reviewer actions
- - Human reviewers can approve / edit / reject flagged cases
- - Reviewer decisions feed back into system improvement
-
- ---
-
- ## Non-Negotiables (from theme)
-
- These are hard constraints, not nice-to-haves:
-
- | Constraint | Implication for build |
- |---|---|
- | Every verdict must be explainable at criterion level | No black-box scoring; each criterion decision must be traceable |
- | Never silently disqualify | Low confidence = human review queue, not auto-reject |
- | Must handle scanned docs and photographs | OCR is not optional |
- | End-to-end auditable | Every system action must be logged with immutable records |
-
- ---
-
- ## What We Need to Deliver (Prototype Phase)
-
- | Deliverable | What it means |
- |---|---|
- | Working demo | The pipeline must actually run on mock/sample data |
- | Demo link | Hosted or accessible prototype |
- | Repo URL | Clean, documented code |
- | Source code zip | Packaged for reviewers to run |
- | Run instructions | Step-by-step so reviewers can test it |
- | Presentation | Pitch deck covering the full solution |
- | Video | Demo + pitch walkthrough |
- | Snapshots | Screenshots of the UI/output |
- | Description | Written summary of the project |
-
- ---
-
- ## Proposed Tech Stack (from idea)
-
- | Component | Technology | Why |
- |---|---|---|
- | LLM for criteria extraction | LLM (e.g., Claude, GPT-4, or open-source) | Handles legal language, ambiguity |
- | OCR | Tesseract or PaddleOCR | Open-source, handles scanned docs and images |
- | Document layout understanding | LayoutLM | Understands tables, forms, structured layouts |
- | Backend | Python + FastAPI | Fast to build, good ML ecosystem |
- | Database | PostgreSQL + vector DB | Structured storage + semantic search |
- | Frontend | React | Dashboard for review, reporting |
-
- ---
-
- ## Key Design Decisions to Think About
-
- ### 1. Hybrid extraction (LLM + rules)
- - Pure LLM: flexible but unpredictable on numeric thresholds
- - Pure rules: precise but brittle on varied language
- - Hybrid: LLM for interpretation, rules for validation — best of both
-
- ### 2. Confidence threshold design
- - What confidence score triggers "Needs Manual Review"?
- - This is a calibration problem: set the cutoff too high and most cases flood the review queue; set it too low and unreliable extractions slip through as auto-decisions
-
- ### 3. Vector DB role
- - Enables semantic search over extracted bidder data
- - Useful when a criterion mentions "similar projects" and you need to match against descriptions
-
- ### 4. Audit log immutability
- - Government procurement context requires tamper-evident logs
- - Must capture: what AI decided, why, when, which model version, and what the human reviewer did
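
One common way to make a log tamper-evident is a hash chain, where each entry stores the hash of the previous one. The sketch below is a minimal illustration of that idea, not the project's design: the function names, the entry fields, and the sample events (including the `v0-demo` model version) are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]],
                 event: Dict[str, Any],
                 timestamp: str) -> Dict[str, Any]:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"timestamp": timestamp, "event": event, "prev_hash": prev_hash}
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: entry[k] for k in ("timestamp", "event", "prev_hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical events: one AI decision and one reviewer action.
log: List[Dict[str, Any]] = []
append_entry(log, {"actor": "model", "model_version": "v0-demo",
                   "criterion": "Annual turnover",
                   "verdict": "Needs Manual Review",
                   "reason": "low OCR confidence"},
             "2025-01-01T10:00:00Z")
append_entry(log, {"actor": "reviewer", "action": "approve"},
             "2025-01-01T10:05:00Z")
```

A chain like this detects edits to past entries but does not prevent them; anyone able to rewrite the whole log could recompute every hash, so a production system would also need to anchor or sign the latest hash somewhere outside the log.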
-
- ---
-
- ## Gaps / Things Not Yet Defined
-
- - **Which LLM?** The idea says "LLMs" but doesn't specify. For a prototype, this matters.
- - **Which vector DB?** Pinecone, Weaviate, ChromaDB, pgvector — not chosen yet.
- - **Criteria schema** — what does the structured criterion object look like exactly?
- - **Confidence score methodology** — how is it calculated and what thresholds are used?
- - **UI scope** — how much of the review interface needs to be built for the prototype?
- - **Mock data** — we need sample tender docs and bidder submissions to demo against.
- - **Evaluation report format** — what does the exported report look like?
-
- ---
-
- ## Summary
-
- The idea is solid and already shortlisted. The core insight is: **don't try to fully automate procurement decisions; build a system that makes human reviewers dramatically faster and more consistent, with a complete audit trail.** The prototype needs to demonstrate this pipeline end-to-end on mock data, with a UI that shows criterion-level explanations.
-
- Next step: define the implementation plan — what to build, in what order, and what scope is realistic for the prototype.