JaydeepR commited on
Commit
c589fa3
·
0 Parent(s):

Initial commit: source docs, implementation plan, and skeleton spec

Browse files
Files changed (8) hide show
  1. .gitattributes +2 -0
  2. .gitignore +37 -0
  3. IMPLEMENTATION_PLAN.md +700 -0
  4. idea.md +157 -0
  5. specs/00_skeleton.md +594 -0
  6. submission_requirements.md +29 -0
  7. theme.md +89 -0
  8. understanding.md +154 -0
.gitattributes ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ *.pdf filter=lfs diff=lfs merge=lfs -text
2
+ *.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Environment
2
+ .env
3
+
4
+ # Runtime artifacts
5
+ .chroma/
6
+ audit.db
7
+ .ocr_cache/
8
+
9
+ # Python
10
+ __pycache__/
11
+ *.pyc
12
+ *.pyo
13
+ *.pyd
14
+ .Python
15
+ *.egg-info/
16
+ dist/
17
+ build/
18
+ .eggs/
19
+ *.egg
20
+
21
+ # Virtual environments
22
+ venv/
23
+ .venv/
24
+ env/
25
+
26
+ # OS
27
+ .DS_Store
28
+ Thumbs.db
29
+
30
+ # IDE
31
+ .vscode/
32
+ .idea/
33
+ *.swp
34
+ *.swo
35
+
36
+ # Streamlit
37
+ .streamlit/secrets.toml
IMPLEMENTATION_PLAN.md ADDED
@@ -0,0 +1,700 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # TenderIQ — Implementation Plan
2
+
3
+ > **For:** any contributor or fresh AI context picking up this project.
4
+ > **You do not need any prior conversation context to use this document.**
5
+
6
+ ---
7
+
8
+ ## 0. How To Use This Plan
9
+
10
+ This project follows **spec-driven development**:
11
+
12
+ 1. **This document** is the master implementation plan. It defines architecture, modules, schemas, and the build order. It does **not** contain final source code.
13
+ 2. For **each module or coherent unit of work** listed in this plan, the team will produce a **spec document** (a short markdown file) before writing code. Each spec covers: inputs, outputs, function signatures, error cases, dependencies, and acceptance criteria.
14
+ 3. Code is written **only against an approved spec**, not directly from this plan.
15
+ 4. Specs live in `specs/` (e.g. `specs/01_llm_client.md`, `specs/02_ocr_pipeline.md`). One spec per module. Number prefixes follow the build order in section 9.
16
+ 5. Once a spec is implemented, the spec file is preserved alongside the code as documentation.
17
+
18
+ **Sequencing rule:** never skip the spec step. If you find yourself wanting to "just code it," stop and write the spec first — it forces precision and exposes hidden assumptions.
19
+
20
+ ---
21
+
22
+ ## 1. Background
23
+
24
+ ### What TenderIQ is
25
+ TenderIQ is an AI-powered platform that automates eligibility evaluation of bidders against government tender criteria. It is being built for the **Central Reserve Police Force (CRPF) hackathon, Theme 3 — AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement**.
26
+
27
+ ### Why it exists
28
+ Government procurement officers today manually read tender documents (criteria, thresholds, compliance requirements) and bidder submissions (financial statements, certifications, project records — often in mixed formats including scans and photos), and decide whether each bidder meets each criterion. For one tender, a committee may spend days; two evaluators routinely reach different conclusions on the same documents; there is no consistent audit trail.
29
+
30
+ TenderIQ does this evaluation automatically while preserving human oversight: extract criteria from the tender, parse bidder documents, evaluate criterion-by-criterion with confidence scoring, surface ambiguous cases for human review, and emit a complete audit log.
31
+
32
+ ### Where this project sits in the hackathon
33
+ - **Round 1 (Idea Phase)**: written submission — already shortlisted. See `idea.md`.
34
+ - **Round 2 (Prototype Phase)**: working prototype — this is what we are building. Submission requirements are in `submission_requirements.md`.
35
+
36
+ ### Source documents in this repository
37
+ | File | Purpose |
38
+ |---|---|
39
+ | `theme.md` | Original problem statement from CRPF (the "why" and the hard constraints) |
40
+ | `idea.md` | The shortlisted Round 1 written submission (the "what") |
41
+ | `understanding.md` | Synthesized understanding of the problem space |
42
+ | `submission_requirements.md` | Form fields required for the Round 2 submission |
43
+ | `IMPLEMENTATION_PLAN.md` | **This file** — the build plan |
44
+ | `specs/` | Per-module spec documents (created during build, one per module) |
45
+
46
+ Read those four documents (theme, idea, understanding, submission requirements) before drafting the first spec.
47
+
48
+ ---
49
+
50
+ ## 2. Hard Constraints (from the theme — non-negotiable)
51
+
52
+ These are evaluator-facing requirements. Every architectural decision must respect them.
53
+
54
+ 1. **Every verdict must be explainable at criterion level** — for each (bidder, criterion) pair the system must show: which criterion was checked, which document and page provided the evidence, what value was extracted, what confidence the system had, and why the verdict was assigned.
55
+ 2. **Never silently disqualify** — low-confidence or ambiguous cases must be routed to a human review queue with a stated reason, never auto-rejected.
56
+ 3. **Must handle scanned documents and photographs** — OCR is mandatory. The system cannot assume digital text.
57
+ 4. **End-to-end auditable** — every action (criterion extraction, evaluation, OCR fallback invocation, human review action) must be logged with timestamp, model version, actor, and payload.
58
+
59
+ A submission that fails any of these is unlikely to score well. Treat them as acceptance criteria for the system as a whole.
60
+
61
+ ---
62
+
63
+ ## 3. Operating Constraints (this build)
64
+
65
+ - **Time budget:** ~6 hours total — ~5h build + ~1.5h deck/video/screenshots/submission. Do not exceed scope. Compression strategy is documented in section 11.
66
+ - **Platform:** Windows 11 development machine. Streamlit Cloud for hosted demo.
67
+ - **Language:** Python 3.10+.
68
+ - **Starting point:** the project is empty except for the source documents listed in section 1. Everything below is to be created.
69
+ - **API access:** the developer has a **DeepSeek API key**. No other LLM/vision API keys are assumed available.
70
+ - **Storage:** file-based only. SQLite for the audit log; ChromaDB persistent client for vectors. No external services beyond the DeepSeek API and Streamlit Cloud.
71
+ - **Auth/multi-user:** out of scope. A single hardcoded "officer" identity is used in audit entries.
72
+
73
+ ---
74
+
75
+ ## 4. Confirmed Architectural Decisions
76
+
77
+ These were the result of explicit trade-off discussions before the plan was written. Do not relitigate without strong reason.
78
+
79
+ ### 4.1 UI / Backend
80
+ **Single Streamlit app** (`streamlit==1.39.0`). No separate frontend, no FastAPI service. Streamlit handles UI and orchestration. Deployable free to Streamlit Community Cloud, which satisfies the "Demo Link" submission requirement.
81
+
82
+ ### 4.2 LLM
83
+ **DeepSeek API**, model `deepseek-v4-pro`, called via the **OpenAI Python SDK** with `base_url="https://api.deepseek.com/v1"` (DeepSeek is OpenAI-compatible). DeepSeek V4-Pro is multimodal — it accepts image inputs, which we exploit for vision-OCR (section 4.4).
84
+
85
+ ### 4.3 Live-first LLM with cached fallback
86
+ The app **always attempts a live LLM call first**. On any `LLMUnavailable` exception (rate limit, network error, malformed JSON after retries, missing key), it **silently falls back** to pre-computed JSON shipped with the repo (`data/precomputed/*.json`). When fallback fires, a banner is shown and an audit entry is written. This means: judges see real AI executing during their evaluation; the demo still works if the API is down or the key is missing.
87
+
88
+ ### 4.4 OCR — three-tier pipeline (the robustness centerpiece)
89
+ Bidder documents arrive in mixed formats (typed PDFs, scanned PDFs, photographs of certificates). The OCR pipeline handles each in increasing order of cost:
90
+
91
+ | Tier | Engine | When it runs | Cost |
92
+ |---|---|---|---|
93
+ | 1 | PyMuPDF text extraction | Document is a typed PDF (detected via `is_text_pdf` heuristic) | Free, instant |
94
+ | 2 | Tesseract (`pytesseract` + system binary) | Document is a scanned PDF or image | Free, fast, accuracy varies |
95
+ | 3 | DeepSeek Vision LLM | Tesseract `mean_conf < 0.65` or extracted text suspiciously short | API call, slow, very accurate |
96
+
97
+ Each extracted page records which tier produced it, and that provenance is shown in the UI ("Read by Tesseract @ 58% → re-read by Vision-LLM @ 95%"). This is more robust than single-engine OCR and is a real production pattern.
98
+
99
+ ### 4.5 Vector store
100
+ **ChromaDB** persistent client, embedded in-process, file-backed under `.chroma/`. Default embedding model is `all-MiniLM-L6-v2` from `sentence-transformers` (~80MB, downloaded on first run). Two collections: `tender_chunks`, `bidder_chunks` (filterable by `bidder_id`).
101
+
102
+ ### 4.6 Audit log
103
+ **SQLite** single-file DB (`audit.db`) with one append-only table `audit_log`.
104
+
105
+ ### 4.7 Things explicitly cut
106
+ - **LayoutLM** — too heavy for the build window. Robustness comes from the 3-tier OCR (vision LLM tier handles documents LayoutLM would otherwise cover).
107
+ - **easyocr** — would add ~1GB (PyTorch). Vision-LLM tier replaces it.
108
+ - **PostgreSQL** — SQLite is sufficient.
109
+ - **React / Next.js / FastAPI split** — Streamlit alone meets all UI needs.
110
+ - **Authentication / multi-user** — single hardcoded officer identity.
111
+ - **Test infrastructure beyond a smoke test** — explicit time-budget decision.
112
+ - **Map-reduce LLM extraction** — mock tender is ~5 pages, fits comfortably in V4's 1M context window in a single call.
113
+
114
+ ---
115
+
116
+ ## 5. Project Structure
117
+
118
+ ```
119
+ TenderIQ/
120
+ ├── app.py # Streamlit entry point, tabs router
121
+ ├── requirements.txt # pinned pip deps (section 12)
122
+ ├── packages.txt # apt packages for Streamlit Cloud
123
+ ├── .env.example # DEEPSEEK_API_KEY=
124
+ ├── .gitignore # .env, .chroma/, audit.db, __pycache__, .ocr_cache/
125
+ ├── README.md # run instructions (local + cloud)
126
+ ├── ARCHITECTURE.md # diagram + flow (used as Custom Attachment)
127
+ ├── IMPLEMENTATION_PLAN.md # this file
128
+
129
+ ├── specs/ # per-module specs (created during build)
130
+ │ ├── 01_config_and_schemas.md
131
+ │ ├── 02_llm_client.md
132
+ │ ├── 03_pdf_utils.md
133
+ │ ├── 04_ocr_pipeline.md
134
+ │ ├── 05_chunker.md
135
+ │ ├── 06_vectorstore.md
136
+ │ ├── 07_criteria_extractor.md
137
+ │ ├── 08_bidder_processor.md
138
+ │ ├── 09_evaluator.md
139
+ │ ├── 10_audit_and_fallback.md
140
+ │ ├── 11_mock_data.md
141
+ │ ├── 12_precompute.md
142
+ │ └── 13_ui_tabs.md
143
+
144
+ ├── core/
145
+ │ ├── __init__.py
146
+ │ ├── config.py # env loading, model name, thresholds, paths
147
+ │ ├── schemas.py # pydantic: Criterion, Evidence, Verdict, AuditEntry
148
+ │ ├── prompts.py # EXTRACT_CRITERIA_PROMPT, EVALUATE_CRITERION_PROMPT, VISION_OCR_PROMPT
149
+ │ ├── llm_client.py # DeepSeek wrapper: chat_json, chat_vision, LLMUnavailable
150
+ │ ├── pdf_utils.py # PyMuPDF: extract_pages, is_text_pdf, render_page_to_image
151
+ │ ├── ocr_pipeline.py # 3-tier OCR orchestrator
152
+ │ ├── chunker.py # tender + bidder docs → chunks with metadata
153
+ │ ├── vectorstore.py # ChromaDB persistent client + helpers
154
+ │ ├── criteria_extractor.py # Stage 1: tender PDF → List[Criterion]
155
+ │ ├── bidder_processor.py # Stage 2: bidder docs → indexed chunks + evidence retrieval
156
+ │ ├── evaluator.py # Stage 3: per-criterion verdict with combined confidence
157
+ │ ├── audit.py # SQLite audit log writer/reader
158
+ │ └── fallback.py # load pre-computed JSON when live LLM fails
159
+
160
+ ├── ui/
161
+ │ ├── __init__.py
162
+ │ ├── tab_overview.py # hero, architecture image, KPIs
163
+ │ ├── tab_tender.py # upload tender → show criteria
164
+ │ ├── tab_bidders.py # bidder evaluation table with verdicts + sources
165
+ │ ├── tab_review.py # human review queue (Approve / Edit / Reject)
166
+ │ ├── tab_audit.py # audit log table + CSV export
167
+ │ └── components.py # verdict pill, confidence bar, citation chip, OCR-tier badge
168
+
169
+ ├── data/
170
+ │ ├── tender/
171
+ │ │ └── crpf_construction_tender.pdf
172
+ │ ├── bidders/
173
+ │ │ ├── bidder_a/ # all eligible — typed PDFs
174
+ │ │ ├── bidder_b/ # ineligible — turnover too low
175
+ │ │ └── bidder_c/ # needs review — scanned turnover cert
176
+ │ │ └── turnover_certificate_scan.png
177
+ │ └── precomputed/ # fallback if live API fails
178
+ │ ├── criteria.json
179
+ │ ├── eval_bidder_a.json
180
+ │ ├── eval_bidder_b.json
181
+ │ └── eval_bidder_c.json
182
+
183
+ ├── scripts/
184
+ │ ├── generate_mock_data.py # reportlab → PDFs + PIL/numpy → noisy scan
185
+ │ ├── precompute_results.py # run pipeline once, save fallback JSON
186
+ │ └── smoke_test.py # programmatic end-to-end check
187
+
188
+ ├── assets/
189
+ │ ├── logo.png
190
+ │ ├── architecture.png # for deck + Custom Attachment
191
+ │ └── screenshots/ # 3-5 PNGs for submission
192
+
193
+ └── deck/
194
+ └── TenderIQ_Pitch.pdf # 8-slide pitch deck
195
+ ```
196
+
197
+ Runtime artifacts (gitignored): `.env`, `.chroma/`, `audit.db`, `.ocr_cache/`, `__pycache__/`.
198
+
199
+ ---
200
+
201
+ ## 6. Module Responsibilities
202
+
203
+ This is the contract surface for each module. Each one will get its own spec document; the descriptions here are the seed material for those specs.
204
+
205
+ ### `core/config.py`
206
+ - Load `DEEPSEEK_API_KEY` from `st.secrets` first, then `.env` via `python-dotenv`.
207
+ - Constants:
208
+ - `DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"`
209
+ - `MODEL_NAME = "deepseek-v4-pro"`
210
+ - `MODEL_VERSION = "deepseek-v4-pro@<build-date>"` — used for audit stamping
211
+ - `CONFIDENCE_HIGH = 0.80`
212
+ - `CONFIDENCE_REVIEW = 0.55`
213
+ - `OCR_TESSERACT_MIN_CONF = 0.65`
214
+ - Paths: `DATA_DIR`, `CHROMA_DIR = ".chroma"`, `AUDIT_DB = "audit.db"`, `PRECOMPUTED_DIR`, `OCR_CACHE_DIR = ".ocr_cache"`.
215
+
216
+ ### `core/schemas.py`
217
+ Pydantic models matching the JSON shapes in section 7. At minimum: `Criterion`, `Rule`, `Evidence`, `Source`, `Verdict`, `AuditEntry`.
218
+
219
+ ### `core/prompts.py`
220
+ Three string constants — see section 8.
221
+
222
+ ### `core/llm_client.py`
223
+ ```
224
+ class LLMUnavailable(Exception): ...
225
+
226
+ class LLM:
227
+ def __init__(self, api_key: str | None = None): ...
228
+ def chat_json(self, system: str, user: str, max_retries: int = 2) -> dict: ...
229
+ def chat_vision(self, system: str, user_text: str, image: bytes | str | Path,
230
+ max_retries: int = 2) -> str: ...
231
+ ```
232
+ - `chat_json` uses `response_format={"type": "json_object"}`, `temperature=0`, retries on JSON parse errors and 5xx with exponential backoff. Raises `LLMUnavailable` after `max_retries`.
233
+ - `chat_vision` encodes the image as `data:image/png;base64,...` and sends a multimodal message in OpenAI-compatible format (`{"type": "image_url", "image_url": {"url": "..."}}`). Returns transcribed text. Raises `LLMUnavailable` on failure.
234
+ - Every caller in `core/criteria_extractor.py`, `core/evaluator.py`, `core/ocr_pipeline.py` wraps calls in `try/except LLMUnavailable` and routes to `core/fallback.py` (or to a graceful low-confidence result for the OCR case).
235
+
236
+ ### `core/pdf_utils.py`
237
+ - `extract_pages(path: Path) -> list[dict]` — returns `[{"page": int, "text": str}]` via `fitz.open`.
238
+ - `is_text_pdf(path: Path) -> bool` — heuristic on average chars per page.
239
+ - `render_page_to_image(path: Path, page_no: int, dpi: int = 200) -> PIL.Image` — for OCR.
240
+
241
+ ### `core/ocr_pipeline.py`
242
+ The robustness centerpiece. Orchestrates the three tiers described in section 4.4.
243
+
244
+ ```
245
+ def extract_document(file_path: Path) -> list[ExtractedPage]: ...
246
+ ```
247
+
248
+ `ExtractedPage` shape: `{"page": int, "text": str, "source_type": "text_pdf" | "tesseract" | "vision_llm", "confidence": float, "raw_tier_results": {"tesseract_conf": float | None, "vision_used": bool}}`.
249
+
250
+ Logic:
251
+ 1. If file is image (PNG/JPG): treat as 1-page; go straight to tier 2.
252
+ 2. If file is PDF and `is_text_pdf == True`: tier 1 (text_pdf, conf=1.0).
253
+ 3. Else: for each page render to image, run tier 2 (Tesseract via `pytesseract.image_to_data`), compute mean confidence excluding `-1`s, divided by 100.
254
+ 4. If `mean_conf < OCR_TESSERACT_MIN_CONF` or text length absurdly short relative to image size: invoke tier 3 (`llm_client.chat_vision(VISION_OCR_PROMPT, image)`), set `source_type="vision_llm"`, `confidence=0.95`. Log `vision_ocr_invoked` audit entry.
255
+ 5. If tier 3 raises `LLMUnavailable`: keep tier-2 result with `confidence < 0.65` (will trigger `needs_review` downstream).
256
+ 6. Cache per-file results in `.ocr_cache/<file_hash>.json` so reruns don't re-OCR.
257
+
258
+ ### `core/chunker.py`
259
+ - `chunk_tender(pages: list[dict], tender_id: str) -> list[dict]` — ~500-token chunks per page, regex-detect clause headings (`^\d+(\.\d+)*\s+`).
260
+ - `chunk_bidder(pages: list[ExtractedPage], bidder_id: str, doc_name: str) -> list[dict]` — page-level chunks (one per page; or per-doc if very short). Each chunk's metadata includes `bidder_id`, `doc_name`, `page`, `source_type`, `ocr_confidence`.
261
+
262
+ ### `core/vectorstore.py`
263
+ - `get_client()` cached with `@st.cache_resource`, returns `chromadb.PersistentClient(path=CHROMA_DIR)`.
264
+ - `get_collection(name: str)` — creates if missing.
265
+ - `add_chunks(collection, chunks: list[dict], metadatas: list[dict])` — ID = `hash(text)[:16]` to dedupe across reruns.
266
+ - `query(collection, text: str, k: int = 4, where: dict | None = None) -> list[dict]` — returns `[{text, metadata, distance}, ...]`.
267
+
268
+ ### `core/criteria_extractor.py`
269
+ ```
270
+ def extract_criteria(tender_pdf_path: Path) -> list[Criterion]: ...
271
+ ```
272
+ 1. `pdf_utils.extract_pages(tender_pdf_path)` → join all page text with `\n--- PAGE N ---\n` markers.
273
+ 2. `llm.chat_json(EXTRACT_CRITERIA_PROMPT_SYSTEM, prompt + tender_text)`.
274
+ 3. Parse JSON `{"criteria": [...]}`, validate via Pydantic, attach UUIDs if absent.
275
+ 4. Index criteria text into the `tender_chunks` collection (for future retrieval / explainability features).
276
+ 5. Return list. On `LLMUnavailable` → `fallback.load_criteria()` + audit `precomputed_fallback_used`.
277
+
278
+ ### `core/bidder_processor.py`
279
+ ```
280
+ def process_bidder(bidder_id: str, files: list[Path]) -> None:
281
+ """Extract, chunk, and index every file for this bidder."""
282
+
283
+ def gather_evidence(bidder_id: str, criterion: Criterion, k: int = 4) -> list[Evidence]:
284
+ """Retrieve top-k bidder chunks relevant to this criterion."""
285
+ ```
286
+ - Process step: each file → `ocr_pipeline.extract_document` → `chunker.chunk_bidder` → `vectorstore.add_chunks(bidder_chunks, ..., where={"bidder_id": bidder_id})`. Audit: `bidder_processed`.
287
+ - Gather step: query string = `criterion.title + " " + " ".join(criterion.query_hints)`; `vectorstore.query(bidder_chunks, q, k=4, where={"bidder_id": bidder_id})`. Map results to `Evidence` objects.
288
+
289
+ ### `core/evaluator.py`
290
+ ```
291
+ def evaluate(bidder_id: str, criterion: Criterion) -> Verdict: ...
292
+ def evaluate_bidder(bidder_id: str, criteria: list[Criterion]) -> list[Verdict]: ...
293
+ ```
294
+
295
+ Algorithm for `evaluate`:
296
+ 1. `evidence = bidder_processor.gather_evidence(bidder_id, criterion)`.
297
+ 2. If `evidence` empty: return `Verdict(verdict="needs_review", reason="No matching evidence found in submitted documents.", llm_confidence=0, combined_confidence=0)` and audit. Done.
298
+ 3. Call `llm.chat_json(EVALUATE_CRITERION_PROMPT_SYSTEM, render_user(criterion, evidence))`.
299
+ 4. Parse: `{verdict, extracted_value, normalized_value, chosen_source, llm_confidence, reason}`.
300
+ 5. Compute `combined_confidence` based on `chosen_source.source_type`:
301
+ - `"text_pdf"`: `combined = llm_confidence`
302
+ - `"vision_llm"`: `combined = 0.7 * llm_confidence + 0.3 * 0.95`
303
+ - `"tesseract"`: `combined = 0.6 * llm_confidence + 0.4 * tesseract_conf`
304
+ 6. Apply threshold rules (in order):
305
+ - LLM verdict is `needs_review` → keep.
306
+ - `combined >= 0.80` → keep LLM verdict.
307
+ - `0.55 <= combined < 0.80` AND verdict is `not_eligible` → **downgrade to `needs_review`** (never silently disqualify).
308
+ - `combined < 0.55` → force `needs_review`.
309
+ 7. Build `Verdict` object, audit `criterion_evaluated`, return.
310
+ 8. On `LLMUnavailable` → `fallback.load_evaluation(bidder_id, criterion.id)` + audit fallback.
311
+
312
+ ### `core/audit.py`
313
+ - SQLite single table:
314
+ ```sql
315
+ CREATE TABLE audit_log (
316
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
317
+ ts TEXT NOT NULL,
318
+ action TEXT NOT NULL,
319
+ actor TEXT NOT NULL,
320
+ model_version TEXT,
321
+ bidder_id TEXT,
322
+ criterion_id TEXT,
323
+ payload_json TEXT
324
+ );
325
+ ```
326
+ - `log(action: str, actor: str = "system", **fields) -> int` — inserts.
327
+ - `query(filters: dict | None = None) -> list[dict]` — filterable by `bidder_id`, `action`, date range.
328
+ - Action vocabulary: `criteria_extracted`, `bidder_processed`, `criterion_evaluated`, `human_review_action`, `precomputed_fallback_used`, `vision_ocr_invoked`.
329
+ - Connection cached with `@st.cache_resource`.
330
+
331
+ ### `core/fallback.py`
332
+ - `load_criteria() -> list[Criterion]` — reads `data/precomputed/criteria.json`.
333
+ - `load_evaluation(bidder_id: str, criterion_id: str) -> Verdict` — reads `data/precomputed/eval_bidder_<id>.json` and indexes into the `criterion_id` block.
334
+ - Each fallback hit logs `precomputed_fallback_used` and sets `st.session_state["fallback_active"] = True` so the UI can render the banner.
335
+
336
+ ---
337
+
338
+ ## 7. Data Schemas
339
+
340
+ All canonical, all serialized as JSON for storage and inter-module communication.
341
+
342
+ ### `Criterion`
343
+ ```json
344
+ {
345
+ "id": "C1",
346
+ "title": "Minimum Annual Turnover",
347
+ "category": "financial",
348
+ "mandatory": true,
349
+ "description": "Average annual turnover during the last three financial years shall not be less than INR 5 Crore.",
350
+ "rule": {
351
+ "type": "numeric_threshold",
352
+ "field": "annual_turnover_inr",
353
+ "operator": ">=",
354
+ "value": 50000000,
355
+ "unit": "INR"
356
+ },
357
+ "query_hints": ["annual turnover", "total revenue", "ITR", "audited financials"],
358
+ "source_page": 3,
359
+ "source_clause": "3.2(a)"
360
+ }
361
+ ```
362
+ Fields:
363
+ - `category`: `"financial" | "technical" | "compliance"`.
364
+ - `rule.type`: `"numeric_threshold" | "count_threshold" | "certification_present" | "document_present"`.
365
+ - `rule.operator`: `">=" | "<=" | "==" | "exists"`.
366
+ - `query_hints`: 3–5 short noun phrases used to build retrieval queries.
367
+
368
+ ### `Evidence` (one retrieved chunk during evaluation)
369
+ ```json
370
+ {
371
+ "bidder_id": "bidder_a",
372
+ "doc_name": "audited_financials.pdf",
373
+ "page": 4,
374
+ "text": "...annual turnover for FY 2024-25 was INR 6,20,00,000...",
375
+ "source_type": "text_pdf",
376
+ "ocr_confidence": null
377
+ }
378
+ ```
379
+ - `source_type`: `"text_pdf" | "tesseract" | "vision_llm"`.
380
+ - `ocr_confidence`: 0.0–1.0 if OCR was used; `null` for `text_pdf`.
381
+
382
+ ### `Verdict`
383
+ ```json
384
+ {
385
+ "verdict_id": "V-uuid",
386
+ "bidder_id": "bidder_a",
387
+ "criterion_id": "C1",
388
+ "verdict": "eligible",
389
+ "extracted_value": "INR 6.2 Cr",
390
+ "normalized_value": 62000000,
391
+ "source": {
392
+ "doc_name": "audited_financials.pdf",
393
+ "page": 4,
394
+ "snippet": "...annual turnover... INR 6,20,00,000...",
395
+ "source_type": "text_pdf"
396
+ },
397
+ "llm_confidence": 0.93,
398
+ "ocr_confidence": null,
399
+ "combined_confidence": 0.93,
400
+ "reason": "Extracted turnover of INR 6.2 Cr exceeds the required threshold of INR 5 Cr.",
401
+ "model_version": "deepseek-v4-pro@2026-05-07",
402
+ "timestamp": "2026-05-07T12:34:56Z",
403
+ "review_status": "pending"
404
+ }
405
+ ```
406
+ - `verdict`: `"eligible" | "not_eligible" | "needs_review"`.
407
+ - `review_status`: `"pending" | "approved" | "edited" | "rejected"`.
408
+
409
+ ### `AuditEntry`
410
+ Maps directly to the SQLite row (see `core/audit.py` description). The `payload_json` field carries the action-specific details (e.g., for `criterion_evaluated`: `{"verdict": "eligible", "combined_confidence": 0.93}`).
411
+
412
+ ---
413
+
414
+ ## 8. LLM Prompts
415
+
416
+ All three prompts must demand strict JSON output where applicable, run at `temperature=0`, and rely on `response_format={"type": "json_object"}` for the JSON ones.
417
+
418
+ ### `EXTRACT_CRITERIA_PROMPT`
419
+ **System:**
420
+ > You are an expert in Indian government tender analysis (CRPF context). Your job is to extract eligibility criteria from a tender document and return them as STRICT JSON. Never invent criteria not present in the text. Classify each criterion as mandatory or optional based on cue words: "shall", "must", "mandatory", "required", "minimum" → mandatory; "preferred", "desirable", "may", "optionally" → optional. For each criterion, generate 3–5 short noun-phrase query_hints that an evaluator would search for in bidder documents.
421
+
422
+ **User template:** the full tender text + a JSON schema example + the instruction:
423
+ > Return `{"criteria": [Criterion, ...]}`. Each Criterion must include id (C1, C2, ...), title, category (financial / technical / compliance), mandatory (bool), description (verbatim or close paraphrase), rule (typed per the schema), query_hints, source_page (int), source_clause (string).
424
+
425
+ ### `EVALUATE_CRITERION_PROMPT`
426
+ **System:**
427
+ > You are a procurement evaluator. Given ONE criterion and a list of retrieved evidence chunks from a bidder's documents, decide eligible / not_eligible / needs_review. Always cite the strongest single source. NEVER guess values not present in the evidence. If evidence is missing or ambiguous, return needs_review with reason. Output STRICT JSON.
428
+
429
+ **User template** (variables substituted):
430
+ ```
431
+ CRITERION:
432
+ { ...criterion JSON... }
433
+
434
+ RETRIEVED EVIDENCE (top-k chunks from this bidder, with source + OCR confidence):
435
+ [
436
+ { "doc_name": "...", "page": 4, "ocr_confidence": null, "source_type": "text_pdf",
437
+ "text": "..." },
438
+ ...
439
+ ]
440
+
441
+ Return JSON:
442
+ {
443
+ "verdict": "eligible" | "not_eligible" | "needs_review",
444
+ "extracted_value": "<short string as found>",
445
+ "normalized_value": <number or null>,
446
+ "chosen_source": {"doc_name": "...", "page": <int>, "snippet": "<<= 200 chars>", "source_type": "..."},
447
+ "llm_confidence": <0..1>,
448
+ "reason": "<one or two sentences>"
449
+ }
450
+
451
+ Rules:
452
+ - If evidence directly contains a value satisfying the rule, verdict=eligible with high llm_confidence.
453
+ - If evidence directly contradicts the rule, verdict=not_eligible.
454
+ - If no relevant evidence retrieved, verdict=needs_review, llm_confidence<=0.4.
455
+ - If the source is OCR with low confidence and the value is borderline, lean to needs_review.
456
+ ```
457
+
458
+ ### `VISION_OCR_PROMPT`
459
+ **System:**
460
+ > You are an OCR engine for Indian government procurement documents. Transcribe the image text faithfully, preserving numeric values, dates, certificate IDs, and tabular structure (use markdown tables). Do NOT summarize, interpret, or omit anything. Output transcribed text only — no commentary.
461
+
462
+ **User text:** "Transcribe this document page completely. Pay special attention to numeric values like turnover figures (INR / Crore / Lakh), dates, and registration numbers." (Image attached.)
463
+
464
+ ---
465
+
466
+ ## 9. Build Order
467
+
468
+ The order is chosen so that the system is **demoable after every major step**. Each numbered item is also the spec sequence — write the spec, get it reviewed, then implement.
469
+
470
+ ### Step 1 — Skeleton (≈ 15 min)
471
+ Folder structure, `requirements.txt`, `packages.txt`, `.env.example`, `.gitignore`, stub `app.py` with 5 empty Streamlit tabs and sidebar.
472
+ **Spec:** `specs/00_skeleton.md` (light — mostly file list and stub contents).
473
+ **Checkpoint:** `streamlit run app.py` shows the empty shell.
474
+
475
+ ### Step 2 — Mock data generation (≈ 25 min)
476
+ `scripts/generate_mock_data.py` produces tender PDF, three bidders' PDFs, and the noisy scan PNG (per section 10).
477
+ **Spec:** `specs/11_mock_data.md`.
478
+ **Checkpoint:** `data/` directory populated; `turnover_certificate_scan.png` is a visibly noisy scan that Tesseract reads with low confidence.
479
+
480
+ ### Step 3 — Config + schemas + prompts (≈ 25 min)
481
+ `core/config.py`, `core/schemas.py`, `core/prompts.py`.
482
+ **Spec:** `specs/01_config_and_schemas.md`.
483
+
484
+ ### Step 4 — LLM client (≈ 25 min)
485
+ `core/llm_client.py` with both `chat_json` and `chat_vision`. Smoke-test with a one-line script that calls each.
486
+ **Spec:** `specs/02_llm_client.md`.
487
+ **Checkpoint:** ad-hoc REPL call to `chat_json("hi", "respond with {\"ok\": true}")` returns `{"ok": True}`.
488
+
489
+ ### Step 5 — PDF utils + chunker (≈ 15 min)
490
+ `core/pdf_utils.py`, `core/chunker.py`.
491
+ **Spec:** `specs/03_pdf_utils.md`, `specs/05_chunker.md` (can be combined).
492
+
493
+ ### Step 6 — Criteria extractor + Tab 2 wiring (≈ 30 min)
494
+ `core/criteria_extractor.py` + minimal `ui/tab_tender.py`.
495
+ **Spec:** `specs/07_criteria_extractor.md`.
496
+ **Checkpoint:** Tab 2 in the running app shows 5 criteria extracted from the mock tender.
497
+
498
+ ### Step 7 — OCR pipeline (≈ 30 min)
499
+ `core/ocr_pipeline.py`. Verify on `turnover_certificate_scan.png`.
500
+ **Spec:** `specs/04_ocr_pipeline.md`.
501
+ **Checkpoint:** running `extract_document(turnover_certificate_scan.png)` first attempts Tesseract (low conf), then falls through to vision-LLM, returns `source_type="vision_llm"` with the correct turnover figure.
502
+
503
+ ### Step 8 — Vector store + bidder processor (≈ 25 min)
504
+ `core/vectorstore.py`, `core/bidder_processor.py`.
505
+ **Spec:** `specs/06_vectorstore.md`, `specs/08_bidder_processor.md`.
506
+ **Checkpoint:** `process_bidder("bidder_a", ...)` indexes all five docs; `gather_evidence("bidder_a", turnover_criterion)` returns top-4 chunks, the strongest mentioning "INR 6,20,00,000".
507
+
508
+ ### Step 9 — Evaluator + threshold logic (≈ 25 min)
509
+ `core/evaluator.py`.
510
+ **Spec:** `specs/09_evaluator.md`.
511
+ **Checkpoint:** `evaluate("bidder_a", turnover_criterion)` returns verdict=eligible, combined_confidence ≥ 0.8; `evaluate("bidder_b", turnover_criterion)` returns verdict=not_eligible.
512
+
513
+ ### Step 10 — Audit + fallback (≈ 20 min)
514
+ `core/audit.py`, `core/fallback.py`.
515
+ **Spec:** `specs/10_audit_and_fallback.md`.
516
+
517
+ ### Step 11 — Pre-compute results (≈ 15 min)
518
+ `scripts/precompute_results.py` runs the full pipeline, dumps `criteria.json` + `eval_bidder_*.json`. Commit results.
519
+ **Spec:** `specs/12_precompute.md`.
520
+ **Checkpoint:** four JSON files exist and validate against the schemas.
521
+
522
+ ### Step 12 — UI tabs (≈ 80 min total)
523
+ - Tab 3 — Bidder evaluation (35 min): rows with verdict pills, source chips, OCR-tier badges, confidence bars, expandable Reason and Source Snippet.
524
+ - Tab 4 — Review queue (15 min): filtered list of `needs_review` rows with Approve/Edit/Reject.
525
+ - Tab 5 — Audit log (15 min): sortable table + CSV export.
526
+ - Tab 1 — Overview (15 min): hero, architecture image, KPIs, "Use Pre-loaded Demo" CTA.
527
+
528
+ `ui/components.py` is built incrementally as Tabs 3 and 4 need it.
529
+ **Spec:** `specs/13_ui_tabs.md` (covers all five tabs and `components.py`).
530
+
531
+ ### Step 13 — Smoke test + README (≈ 15 min)
532
+ `scripts/smoke_test.py` (programmatic full flow), `README.md`.
533
+
534
+ ### Step 14 — Streamlit Cloud deploy (≈ 25 min)
535
+ Push to GitHub, connect Streamlit Cloud, set `DEEPSEEK_API_KEY` in app secrets, verify deployed URL works in incognito with API and again with the key removed (precomputed mode).
536
+
537
+ ### Step 15 — Submission package (≈ 90 min)
538
+ Architecture diagram, 8-slide deck, 4 screenshots, 2-min demo video (OBS / Win+G), zip source, fill submission form.
539
+
540
+ ---
541
+
542
+ ## 10. Mock Data Strategy
543
+
544
+ Single deterministic script `scripts/generate_mock_data.py`, runs in <30 seconds.
545
+
546
+ ### Tender PDF — `data/tender/crpf_construction_tender.pdf`
547
+ `reportlab` SimpleDocTemplate, 5–6 pages with these sections: (1) Introduction, (2) Scope of Work, (3) Eligibility Criteria, (4) Submission Procedure, (5) Evaluation Methodology, (6) Annexures. Section 3 contains five criteria phrased in formal tender language (this is the theme's sample scenario verbatim, so judges will recognize it):
548
+
549
+ | ID | Clause | Text | Mandatory? | Category |
550
+ |---|---|---|---|---|
551
+ | C1 | 3.2(a) | "...minimum average annual turnover of INR 5 Crore (Rupees Five Crore only) during the last three financial years..." | Yes | financial |
552
+ | C2 | 3.2(b) | "...successfully completed at least three (3) similar construction projects in the last five (5) financial years..." | Yes | technical |
553
+ | C3 | 3.2(c) | "...shall possess a valid Goods and Services Tax (GST) registration..." | Yes | compliance |
554
+ | C4 | 3.2(d) | "...shall hold a valid ISO 9001:2015 Quality Management System certification..." | Yes | compliance |
555
+ | C5 | 3.2(e) | "...preferably, the bidder may have prior experience with paramilitary infrastructure..." | **No** | technical |
556
+
557
+ C5 tests the mandatory-vs-optional classification.
558
+
559
+ ### Bidder A (clearly eligible) — typed PDFs only
560
+ `company_profile.pdf`, `audited_financials.pdf` (FY 22-23: ₹5.8 Cr, 23-24: ₹6.2 Cr, 24-25: ₹7.1 Cr), `project_experience.pdf` (5 projects in 5 years), `gst_certificate.pdf` (GSTIN, valid 2027), `iso_9001.pdf` (valid 2027).
561
+
562
+ ### Bidder B (clearly ineligible — turnover too low) — typed PDFs only
563
+ Same docs as A but `audited_financials.pdf` shows ₹1.2 / ₹1.5 / ₹1.8 Cr (all below threshold). Other criteria pass.
564
+
565
+ ### Bidder C (needs review — scanned turnover certificate) — typed + one scan
566
+ Typed `company_profile.pdf`, `project_experience.pdf` (3 projects — borderline meets count), `gst_certificate.pdf`, `iso_9001.pdf`.
567
+
568
+ **`turnover_certificate_scan.png`** generation:
569
+ 1. Render a `reportlab` page with the CA's turnover statement.
570
+ 2. Convert to `PIL.Image` via `pillow`.
571
+ 3. Apply: `ImageFilter.GaussianBlur(radius=1.5)`, salt-and-pepper noise via `numpy`, `image.rotate(-2, fillcolor="white")`, JPEG-compress at quality=40, save as PNG.
572
+ 4. Outcome: Tesseract reads it with mean confidence ~50–65% → triggers Tier-3 vision LLM. Vision LLM transcribes correctly; combined-confidence rule still routes Bidder C to `needs_review` (this is intended — it demonstrates the safety rule).
573
+
574
+ ### Pre-computed fallback files — `data/precomputed/`
575
+ After the pipeline modules are working, run `scripts/precompute_results.py` once to produce:
576
+ - `criteria.json` — output of `extract_criteria(tender_pdf)`.
577
+ - `eval_bidder_a.json`, `eval_bidder_b.json`, `eval_bidder_c.json` — per-bidder verdicts for all criteria.
578
+
579
+ Commit these four files to the repo. They are the safety net for live demos.
580
+
581
+ ---
582
+
583
+ ## 11. Streamlit UI
584
+
585
+ 5 tabs, left-to-right narrative order:
586
+
587
+ ### Tab 1 — Overview
588
+ Hero text ("TenderIQ — explainable AI for tender evaluation"), architecture image (`assets/architecture.png`), 4 KPI cards (criteria extracted, bidders evaluated, hours saved, audit entries). "Use Pre-loaded Demo Data" (default) and "Upload Your Own" CTA.
589
+
590
+ ### Tab 2 — Tender Analysis
591
+ File uploader (defaults to mock tender preview). Button **"Extract Criteria (Live LLM)"** runs `criteria_extractor`. Results render as cards with category badge (color-coded), mandatory pill, description, source-page chip. Cached to `st.session_state["criteria"]`.
592
+
593
+ ### Tab 3 — Bidder Evaluation
594
+ Bidder multi-select (defaults all 3). Button **"Run Evaluation"** processes each bidder × each criterion. Output: rows with verdict pill (green/red/amber), extracted value, source chip (doc + page + **OCR-tier badge** showing `text_pdf` / `tesseract` / `vision_llm`), confidence bar, expandable Reason and Source Snippet. Per-bidder summary header: "X / 4 mandatory criteria met — Overall: Eligible / Not Eligible / Needs Review".
595
+
596
+ ### Tab 4 — Human Review Queue
597
+ Filtered to verdicts where `review_status == "pending"` AND `verdict == "needs_review"`. Each row: criterion, bidder, extracted value (editable), confidence, reason, source snippet, image preview if OCR'd. Buttons: Approve / Edit & Approve / Reject — each writes audit entry and updates `review_status`.
598
+
599
+ ### Tab 5 — Audit Log
600
+ Sortable table from `audit.query()`. Filter by bidder, action type. CSV export.
601
+
602
+ ### Sidebar (always visible)
603
+ Logo, project name, **DeepSeek connection status dot**:
604
+ - Green: live connection, no fallback fired this session.
605
+ - Amber: fallback fired at least once this session.
606
+ - Red: probe at startup failed.
607
+ "Reset Session" button. If `st.session_state["fallback_active"]`, show banner: "⚠ Live API unavailable — showing pre-computed results."
608
+
609
+ ---
610
+
611
+ ## 12. requirements.txt and packages.txt
612
+
613
+ `requirements.txt` (pinned):
614
+ ```
615
+ streamlit==1.39.0
616
+ openai==1.51.0
617
+ pymupdf==1.24.10
618
+ pytesseract==0.3.13
619
+ Pillow==10.4.0
620
+ numpy==1.26.4
621
+ chromadb==0.5.5
622
+ sentence-transformers==3.1.1
623
+ pydantic==2.9.2
624
+ python-dotenv==1.0.1
625
+ reportlab==4.2.5
626
+ pandas==2.2.3
627
+ ```
628
+
629
+ `packages.txt` (apt packages for Streamlit Cloud):
630
+ ```
631
+ tesseract-ocr
632
+ poppler-utils
633
+ ```
634
+
635
+ ---
636
+
637
+ ## 13. Risks and Mitigations
638
+
639
+ | Risk | Mitigation |
640
+ |---|---|
641
+ | **DeepSeek API down or rate-limited mid-demo.** | Live-first with silent fallback to `data/precomputed/*.json`. Sidebar dot turns amber. App keeps working. |
642
+ | **Tesseract install on Streamlit Cloud.** | `packages.txt` with `tesseract-ocr`. If it still fails: Tier-3 vision LLM works on raw image input, and `data/precomputed/eval_bidder_c.json` is the final safety net. |
643
+ | **DeepSeek vision call (Tier 3) fails.** | Tesseract result accepted with `confidence < 0.65` → flows to `needs_review`. Demo still works. |
644
+ | **ChromaDB first-run sentence-transformers download (~80 MB).** | `@st.cache_resource` on the client. README warns "first cloud load may take ~30s". Pre-warm by visiting deployed URL once before submission. |
645
+ | **LLM returns malformed JSON.** | `response_format={"type":"json_object"}` + 2 retries with stricter system prompt → fall back to precomputed for that item. |
646
+ | **PyMuPDF licensing.** | AGPL but allowed for hackathon use; pin `pymupdf==1.24.10`; mention in README. |
647
+ | **API key leak in repo.** | `.env` gitignored; `.env.example` ships with placeholder; Streamlit Cloud secrets used in deploy; pre-commit visual diff check. |
648
+ | **Time overrun.** | Compression order: skip Tab 1 KPIs → skip optional 5th criterion → skip CSV export → keep core flow (Tabs 2–4) intact for the video. |
649
+
650
+ ---
651
+
652
+ ## 14. Verification (run before recording the demo video)
653
+
654
+ Treat this as the acceptance test. The demo video should walk through these steps in order.
655
+
656
+ 1. **Cold start.** Delete `.chroma/`, `audit.db`. Run `streamlit run app.py`. App opens in <10s; Tab 1 renders.
657
+ 2. **Live extraction.** Tab 2 → "Extract Criteria" → 5 criteria appear within 10–20s. Sidebar dot green.
658
+ 3. **Live evaluation, Bidder A.** Tab 3 → select Bidder A → "Run Evaluation". All 4 mandatory criteria → `eligible` with combined confidence ≥ 0.80.
659
+ 4. **Live evaluation, Bidder B.** Turnover criterion → `not_eligible` with reason citing low turnover figure and source page.
660
+ 5. **Live evaluation, Bidder C — the OCR demo path.** Turnover criterion → triggers Tier 2 (Tesseract low conf) → triggers Tier 3 (DeepSeek Vision). UI shows "Read by Tesseract @ ~58% → Vision-LLM @ 95%". Final verdict: `needs_review`. Audit log gains a `vision_ocr_invoked` entry.
661
+ 6. **Review action.** Tab 4 → click "Approve" on Bidder C's turnover row → audit log gains `human_review_action` entry within 1 second; `review_status` updates.
662
+ 7. **Audit export.** Tab 5 → "Export CSV" → CSV downloads with all entries.
663
+ 8. **No-API run.** Rename `.env` (or unset secret), restart app → all "Run Live" buttons silently fall back to precomputed, banner shown, sidebar dot amber, audit gets `precomputed_fallback_used` entries.
664
+ 9. **Smoke test.** `python scripts/smoke_test.py` exits 0.
665
+ 10. **Deployed URL.** Open Streamlit Cloud URL in incognito; repeat steps 1–6.
666
+
667
+ ---
668
+
669
+ ## 15. Submission Deliverables (Round 2 form fields)
670
+
671
+ Mapping of submission requirements to artifacts:
672
+
673
+ | Form field | Artifact |
674
+ |---|---|
675
+ | Title | "TenderIQ — Explainable AI for Tender Evaluation" |
676
+ | Description | Adapted from `idea.md` |
677
+ | Parent Submission | The shortlisted Round 1 idea |
678
+ | Theme | Theme 3 |
679
+ | Snapshots | `assets/screenshots/*.png` |
680
+ | Video URL | YouTube unlisted link to 2-min demo |
681
+ | Presentation | `deck/TenderIQ_Pitch.pdf` |
682
+ | Demo Link | Streamlit Cloud URL |
683
+ | Repository URL | GitHub URL |
684
+ | Source Code | Zip of repo (excluding `.env`, `.chroma/`, `audit.db`) |
685
+ | Instructions to Run | `README.md` quickstart |
686
+ | Custom Attachment | `ARCHITECTURE.md` exported as PDF (with the architecture diagram embedded) |
687
+
688
+ ---
689
+
690
+ ## 16. Definition of Done
691
+
692
+ The build is done when **all** of the following are true:
693
+
694
+ - [ ] All 10 verification steps in section 14 pass.
695
+ - [ ] Streamlit Cloud URL is live and reachable.
696
+ - [ ] GitHub repo is public, with `.env` not committed.
697
+ - [ ] `README.md` quickstart works on a fresh clone with no API key (precomputed mode).
698
+ - [ ] Pitch deck, demo video, screenshots, and architecture PDF are produced.
699
+ - [ ] Submission form is filled and submitted.
700
+ - [ ] Memory note saved with deployment URL and submission timestamp.
idea.md ADDED
@@ -0,0 +1,157 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # TenderIQ: Explainable AI Platform for Automated Tender Evaluation & Eligibility Analysis
2
+
3
+ **Phase:** Idea Phase (Shortlisted)
4
+ **Last updated:** Apr 30, 2026
5
+ **Theme:** Theme 3 — AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement by CRPF
6
+
7
+ ---
8
+
9
+ ## Problem Understanding
10
+
11
+ Government tender evaluation today is a manual, time-consuming, and error-prone process. Procurement officers must review large volumes of unstructured documents — including PDFs, scanned files, and images — to verify whether bidders meet eligibility criteria such as financial thresholds, technical experience, and compliance certifications.
12
+
13
+ This results in:
14
+ - Inconsistent evaluations across reviewers
15
+ - High turnaround time (often days per tender)
16
+ - Lack of transparency and auditability
17
+ - Risk of oversight in critical compliance checks
18
+
19
+ Our solution addresses these challenges by transforming unstructured tender and bidder data into structured, explainable, and auditable decisions.
20
+
21
+ ---
22
+
23
+ ## Proposed Solution: TenderIQ
24
+
25
+ TenderIQ is an AI-powered platform designed to automate tender evaluation while ensuring human trust, explainability, and audit readiness. The system follows a four-stage pipeline:
26
+
27
+ ### Stage 1: Tender Understanding (Criteria Extraction)
28
+
29
+ The platform extracts eligibility criteria from tender documents using a hybrid approach combining LLMs and rule-based parsing. It identifies:
30
+ - Financial conditions (e.g., turnover ≥ ₹5 Cr)
31
+ - Technical requirements (e.g., project experience)
32
+ - Compliance rules (e.g., GST registration, ISO certifications)
33
+
34
+ Each criterion is:
35
+ - Classified as mandatory or optional
36
+ - Converted into a structured, machine-readable format
37
+
38
+ ### Stage 2: Bidder Document Processing
39
+
40
+ The system processes heterogeneous bidder submissions, including:
41
+ - Typed PDFs
42
+ - Scanned documents
43
+ - Images
44
+ - Word files
45
+
46
+ The processing pipeline includes:
47
+ - OCR for scanned documents and images
48
+ - Layout-aware parsing for tables, forms, and certificates
49
+ - Entity extraction for key values such as turnover, certifications, and project count
50
+
51
+ All extracted information is stored along with:
52
+ - Source reference (document and page number)
53
+ - Confidence score
54
+
55
+ ### Stage 3: Evaluation and Decision Engine
56
+
57
+ Each bidder is evaluated on a criterion-by-criterion basis using:
58
+ - Rule-based validation (e.g., threshold checks)
59
+ - Confidence-aware scoring
60
+
61
+ The system produces three possible outcomes:
62
+ - **Eligible**
63
+ - **Not Eligible**
64
+ - **Needs Manual Review**
65
+
66
+ Ambiguous or low-confidence cases are never automatically rejected. Instead, they are flagged for human review to ensure fairness and compliance.
67
+
68
+ ### Stage 4: Explainability and Audit Layer (Key Differentiator)
69
+
70
+ Every decision is fully explainable and traceable. Each evaluation includes:
71
+ - The criterion being checked
72
+ - The extracted value
73
+ - Source document reference
74
+ - Confidence score
75
+ - Reason for the decision
76
+
77
+ **Example:**
78
+ ```
79
+ Criterion: Minimum Turnover ≥ ₹5 Cr
80
+ Extracted Value: ₹6.2 Cr
81
+ Source: Financial Statement (Page 4)
82
+ Confidence: 92%
83
+ Verdict: Eligible
84
+ ```
85
+
86
+ All system actions are logged with:
87
+ - Model version
88
+ - Timestamp
89
+ - Reviewer actions
90
+
91
+ This ensures complete end-to-end auditability suitable for government procurement processes.
92
+
93
+ ---
94
+
95
+ ## Human-in-the-Loop Workflow
96
+
97
+ The system incorporates a mandatory human review layer:
98
+ - Low-confidence or conflicting cases are routed to reviewers
99
+ - The interface highlights extracted data directly within documents
100
+ - Reviewers can: Approve, Edit, or Reject decisions
101
+ - All reviewer decisions are captured and used to improve system performance over time
102
+
103
+ ---
104
+
105
+ ## Key Features
106
+
107
+ - Handles scanned and unstructured documents effectively
108
+ - Provides criterion-level explainability for every decision
109
+ - Ensures no silent disqualification of bidders
110
+ - Maintains a fully auditable decision pipeline
111
+ - Scales across departments and tender types
112
+
113
+ ---
114
+
115
+ ## Technology Stack
116
+
117
+ | Layer | Technology |
118
+ |---|---|
119
+ | AI/ML | LLMs for extraction, OCR (Tesseract or PaddleOCR), LayoutLM for document understanding |
120
+ | Backend | Python (FastAPI) with rule-based evaluation engine |
121
+ | Storage | PostgreSQL and vector database for document retrieval |
122
+ | Frontend | React-based dashboard |
123
+
124
+ ---
125
+
126
+ ## Risks and Mitigation
127
+
128
+ | Risk | Mitigation |
129
+ |---|---|
130
+ | OCR inaccuracies | Confidence scoring and human review |
131
+ | Legal language ambiguity | Hybrid LLM and rule-based parsing |
132
+ | Data inconsistency across documents | Conflict detection and validation logic |
133
+ | Over-automation risk | Human-in-the-loop validation |
134
+
135
+ ---
136
+
137
+ ## Why This Solution Stands Out
138
+
139
+ - Balances automation with accountability
140
+ - Designed specifically for government procurement constraints
141
+ - Focuses on trust, explainability, and auditability
142
+ - Works effectively with real-world, messy data formats
143
+
144
+ ---
145
+
146
+ ## Future Scope (Round 2)
147
+
148
+ - Integration with existing procurement systems
149
+ - Model improvement through feedback loops
150
+ - Multi-language document support
151
+ - Advanced fraud detection in bidder submissions
152
+
153
+ ---
154
+
155
+ ## Core Philosophy
156
+
157
+ The system prioritizes **assistive intelligence over full automation**, ensuring that every decision is explainable, reviewable, and compliant with government procurement standards.
specs/00_skeleton.md ADDED
@@ -0,0 +1,594 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Spec 00 — Project Skeleton
2
+
3
+ **Step:** 1 of 15
4
+ **Time budget:** ~15 min
5
+ **Checkpoint:** `streamlit run app.py` opens in the browser showing 5 named tabs and a sidebar with logo placeholder, project name, and connection status dot. No errors in the terminal.
6
+
7
+ ---
8
+
9
+ ## Goal
10
+
11
+ Create every file and directory that Step 2 onward will write into. All Python modules are stubs (importable but empty of logic). The running app must render without crashing.
12
+
13
+ ---
14
+
15
+ ## Files to Create
16
+
17
+ ### Root-level files
18
+
19
+ #### `requirements.txt`
20
+ ```
21
+ streamlit==1.39.0
22
+ openai==1.51.0
23
+ pymupdf==1.24.10
24
+ pytesseract==0.3.13
25
+ Pillow==10.4.0
26
+ numpy==1.26.4
27
+ chromadb==0.5.5
28
+ sentence-transformers==3.1.1
29
+ pydantic==2.9.2
30
+ python-dotenv==1.0.1
31
+ reportlab==4.2.5
32
+ pandas==2.2.3
33
+ ```
34
+
35
+ #### `packages.txt`
36
+ ```
37
+ tesseract-ocr
38
+ poppler-utils
39
+ ```
40
+
41
+ #### `.env.example`
42
+ ```
43
+ DEEPSEEK_API_KEY=your_key_here
44
+ ```
45
+
46
+ #### `.gitignore`
47
+ ```
48
+ .env
49
+ .chroma/
50
+ audit.db
51
+ __pycache__/
52
+ *.pyc
53
+ .ocr_cache/
54
+ *.egg-info/
55
+ dist/
56
+ build/
57
+ .DS_Store
58
+ Thumbs.db
59
+ ```
60
+
61
+ #### `app.py` — Streamlit entry point (stub)
62
+
63
+ Exact stub content:
64
+
65
+ ```python
66
+ import streamlit as st
67
+
68
+ from ui.tab_overview import render as render_overview
69
+ from ui.tab_tender import render as render_tender
70
+ from ui.tab_bidders import render as render_bidders
71
+ from ui.tab_review import render as render_review
72
+ from ui.tab_audit import render as render_audit
73
+
74
+ st.set_page_config(
75
+ page_title="TenderIQ",
76
+ page_icon="⚖️",
77
+ layout="wide",
78
+ )
79
+
80
+ # ── Sidebar ──────────────────────────────────────────────────────────────────
81
+ with st.sidebar:
82
+ st.markdown("## ⚖️ TenderIQ")
83
+ st.caption("Explainable AI for Tender Evaluation")
84
+ st.divider()
85
+ # Connection status — placeholder until core/llm_client.py is wired
86
+ st.markdown("🔴 **DeepSeek:** not connected")
87
+ st.divider()
88
+ if st.button("Reset Session", use_container_width=True):
89
+ for key in list(st.session_state.keys()):
90
+ del st.session_state[key]
91
+ st.rerun()
92
+
93
+ # ── Tabs ─────────────────────────────────────────────────────────────────────
94
+ tab1, tab2, tab3, tab4, tab5 = st.tabs([
95
+ "Overview",
96
+ "Tender Analysis",
97
+ "Bidder Evaluation",
98
+ "Human Review",
99
+ "Audit Log",
100
+ ])
101
+
102
+ with tab1:
103
+ render_overview()
104
+
105
+ with tab2:
106
+ render_tender()
107
+
108
+ with tab3:
109
+ render_bidders()
110
+
111
+ with tab4:
112
+ render_review()
113
+
114
+ with tab5:
115
+ render_audit()
116
+ ```
117
+
118
+ ---
119
+
120
+ ### `core/` package — all stubs
121
+
122
+ Every file in `core/` must be importable and expose the names that `app.py` or other modules reference at import time. No logic yet — just `pass` stubs and placeholder class/function signatures.
123
+
124
+ #### `core/__init__.py`
125
+ Empty.
126
+
127
+ #### `core/config.py`
128
+ ```python
129
+ import os
130
+ from pathlib import Path
131
+ from dotenv import load_dotenv
132
+
133
+ load_dotenv()
134
+
135
+ DEEPSEEK_API_KEY: str | None = os.getenv("DEEPSEEK_API_KEY")
136
+ DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"
137
+ MODEL_NAME = "deepseek-chat"
138
+ MODEL_VERSION = f"{MODEL_NAME}@2026-05-07"
139
+
140
+ CONFIDENCE_HIGH = 0.80
141
+ CONFIDENCE_REVIEW = 0.55
142
+ OCR_TESSERACT_MIN_CONF = 0.65
143
+
144
+ BASE_DIR = Path(__file__).resolve().parent.parent
145
+ DATA_DIR = BASE_DIR / "data"
146
+ CHROMA_DIR = str(BASE_DIR / ".chroma")
147
+ AUDIT_DB = str(BASE_DIR / "audit.db")
148
+ PRECOMPUTED_DIR = DATA_DIR / "precomputed"
149
+ OCR_CACHE_DIR = BASE_DIR / ".ocr_cache"
150
+ ```
151
+
152
+ #### `core/schemas.py`
153
+ ```python
154
+ from __future__ import annotations
155
+ from typing import Literal, Optional
156
+ from pydantic import BaseModel, Field
157
+ import uuid
158
+
159
+
160
+ class Rule(BaseModel):
161
+ type: Literal["numeric_threshold", "count_threshold", "certification_present", "document_present"]
162
+ field: str
163
+ operator: Literal[">=", "<=", "==", "exists"]
164
+ value: float | int | None = None
165
+ unit: str | None = None
166
+
167
+
168
+ class Criterion(BaseModel):
169
+ id: str
170
+ title: str
171
+ category: Literal["financial", "technical", "compliance"]
172
+ mandatory: bool
173
+ description: str
174
+ rule: Rule
175
+ query_hints: list[str]
176
+ source_page: int
177
+ source_clause: str
178
+
179
+
180
+ class Evidence(BaseModel):
181
+ bidder_id: str
182
+ doc_name: str
183
+ page: int
184
+ text: str
185
+ source_type: Literal["text_pdf", "tesseract", "vision_llm"]
186
+ ocr_confidence: float | None = None
187
+
188
+
189
+ class Source(BaseModel):
190
+ doc_name: str
191
+ page: int
192
+ snippet: str
193
+ source_type: Literal["text_pdf", "tesseract", "vision_llm"]
194
+
195
+
196
+ class Verdict(BaseModel):
197
+ verdict_id: str = Field(default_factory=lambda: f"V-{uuid.uuid4().hex[:8]}")
198
+ bidder_id: str
199
+ criterion_id: str
200
+ verdict: Literal["eligible", "not_eligible", "needs_review"]
201
+ extracted_value: str | None = None
202
+ normalized_value: float | int | None = None
203
+ source: Source | None = None
204
+ llm_confidence: float = 0.0
205
+ ocr_confidence: float | None = None
206
+ combined_confidence: float = 0.0
207
+ reason: str = ""
208
+ model_version: str = ""
209
+ timestamp: str = ""
210
+ review_status: Literal["pending", "approved", "edited", "rejected"] = "pending"
211
+
212
+
213
+ class AuditEntry(BaseModel):
214
+ id: int | None = None
215
+ ts: str
216
+ action: str
217
+ actor: str
218
+ model_version: str | None = None
219
+ bidder_id: str | None = None
220
+ criterion_id: str | None = None
221
+ payload_json: str | None = None
222
+ ```
223
+
224
+ #### `core/prompts.py`
225
+ ```python
226
+ EXTRACT_CRITERIA_PROMPT_SYSTEM = """\
227
+ You are an expert in Indian government tender analysis (CRPF context). Your job is to extract \
228
+ eligibility criteria from a tender document and return them as STRICT JSON. Never invent criteria \
229
+ not present in the text. Classify each criterion as mandatory or optional based on cue words: \
230
+ "shall", "must", "mandatory", "required", "minimum" → mandatory; "preferred", "desirable", \
231
+ "may", "optionally" → optional. For each criterion, generate 3–5 short noun-phrase query_hints \
232
+ that an evaluator would search for in bidder documents.\
233
+ """
234
+
235
+ EVALUATE_CRITERION_PROMPT_SYSTEM = """\
236
+ You are a procurement evaluator. Given ONE criterion and a list of retrieved evidence chunks from \
237
+ a bidder's documents, decide eligible / not_eligible / needs_review. Always cite the strongest \
238
+ single source. NEVER guess values not present in the evidence. If evidence is missing or \
239
+ ambiguous, return needs_review with reason. Output STRICT JSON.\
240
+ """
241
+
242
+ VISION_OCR_PROMPT_SYSTEM = """\
243
+ You are an OCR engine for Indian government procurement documents. Transcribe the image text \
244
+ faithfully, preserving numeric values, dates, certificate IDs, and tabular structure (use \
245
+ markdown tables). Do NOT summarize, interpret, or omit anything. Output transcribed text only — \
246
+ no commentary.\
247
+ """
248
+
249
+ VISION_OCR_USER = (
250
+ "Transcribe this document page completely. Pay special attention to numeric values like "
251
+ "turnover figures (INR / Crore / Lakh), dates, and registration numbers."
252
+ )
253
+ ```
254
+
255
+ #### `core/llm_client.py`
256
+ ```python
257
+ from pathlib import Path
258
+
259
+
260
+ class LLMUnavailable(Exception):
261
+ pass
262
+
263
+
264
+ class LLM:
265
+ def __init__(self, api_key: str | None = None):
266
+ pass
267
+
268
+ def chat_json(self, system: str, user: str, max_retries: int = 2) -> dict:
269
+ raise NotImplementedError
270
+
271
+ def chat_vision(
272
+ self,
273
+ system: str,
274
+ user_text: str,
275
+ image: bytes | str | Path,
276
+ max_retries: int = 2,
277
+ ) -> str:
278
+ raise NotImplementedError
279
+ ```
280
+
281
+ #### `core/pdf_utils.py`
282
+ ```python
283
+ from pathlib import Path
284
+ import PIL.Image
285
+
286
+
287
+ def extract_pages(path: Path) -> list[dict]:
288
+ raise NotImplementedError
289
+
290
+
291
+ def is_text_pdf(path: Path) -> bool:
292
+ raise NotImplementedError
293
+
294
+
295
+ def render_page_to_image(path: Path, page_no: int, dpi: int = 200) -> PIL.Image.Image:
296
+ raise NotImplementedError
297
+ ```
298
+
299
+ #### `core/ocr_pipeline.py`
300
+ ```python
301
+ from pathlib import Path
302
+
303
+
304
+ class ExtractedPage:
305
+ page: int
306
+ text: str
307
+ source_type: str # "text_pdf" | "tesseract" | "vision_llm"
308
+ confidence: float
309
+ raw_tier_results: dict
310
+
311
+
312
+ def extract_document(file_path: Path) -> list[ExtractedPage]:
313
+ raise NotImplementedError
314
+ ```
315
+
316
+ #### `core/chunker.py`
317
+ ```python
318
+ from core.ocr_pipeline import ExtractedPage
319
+
320
+
321
+ def chunk_tender(pages: list[dict], tender_id: str) -> list[dict]:
322
+ raise NotImplementedError
323
+
324
+
325
+ def chunk_bidder(
326
+ pages: list[ExtractedPage], bidder_id: str, doc_name: str
327
+ ) -> list[dict]:
328
+ raise NotImplementedError
329
+ ```
330
+
331
+ #### `core/vectorstore.py`
332
+ ```python
333
+ def get_client():
334
+ raise NotImplementedError
335
+
336
+
337
+ def get_collection(name: str):
338
+ raise NotImplementedError
339
+
340
+
341
+ def add_chunks(collection, chunks: list[dict], metadatas: list[dict]) -> None:
342
+ raise NotImplementedError
343
+
344
+
345
+ def query(
346
+ collection, text: str, k: int = 4, where: dict | None = None
347
+ ) -> list[dict]:
348
+ raise NotImplementedError
349
+ ```
350
+
351
+ #### `core/criteria_extractor.py`
352
+ ```python
353
+ from pathlib import Path
354
+ from core.schemas import Criterion
355
+
356
+
357
+ def extract_criteria(tender_pdf_path: Path) -> list[Criterion]:
358
+ raise NotImplementedError
359
+ ```
360
+
361
+ #### `core/bidder_processor.py`
362
+ ```python
363
+ from pathlib import Path
364
+ from core.schemas import Criterion, Evidence
365
+
366
+
367
+ def process_bidder(bidder_id: str, files: list[Path]) -> None:
368
+ raise NotImplementedError
369
+
370
+
371
+ def gather_evidence(bidder_id: str, criterion: Criterion, k: int = 4) -> list[Evidence]:
372
+ raise NotImplementedError
373
+ ```
374
+
375
+ #### `core/evaluator.py`
376
+ ```python
377
+ from core.schemas import Criterion, Verdict
378
+
379
+
380
+ def evaluate(bidder_id: str, criterion: Criterion) -> Verdict:
381
+ raise NotImplementedError
382
+
383
+
384
+ def evaluate_bidder(bidder_id: str, criteria: list[Criterion]) -> list[Verdict]:
385
+ raise NotImplementedError
386
+ ```
387
+
388
+ #### `core/audit.py`
389
+ ```python
390
+ def log(action: str, actor: str = "system", **fields) -> int:
391
+ raise NotImplementedError
392
+
393
+
394
+ def query(filters: dict | None = None) -> list[dict]:
395
+ raise NotImplementedError
396
+ ```
397
+
398
+ #### `core/fallback.py`
399
+ ```python
400
+ from core.schemas import Criterion, Verdict
401
+
402
+
403
+ def load_criteria() -> list[Criterion]:
404
+ raise NotImplementedError
405
+
406
+
407
+ def load_evaluation(bidder_id: str, criterion_id: str) -> Verdict:
408
+ raise NotImplementedError
409
+ ```
410
+
411
+ ---
412
+
413
+ ### `ui/` package — all stubs
414
+
415
+ Each tab module exports a single `render()` function that renders a placeholder heading. No logic.
416
+
417
+ #### `ui/__init__.py`
418
+ Empty.
419
+
420
+ #### `ui/tab_overview.py`
421
+ ```python
422
+ import streamlit as st
423
+
424
+ def render() -> None:
425
+ st.header("Overview")
426
+ st.info("Coming soon — architecture diagram, KPIs, and demo CTA.")
427
+ ```
428
+
429
+ #### `ui/tab_tender.py`
430
+ ```python
431
+ import streamlit as st
432
+
433
+ def render() -> None:
434
+ st.header("Tender Analysis")
435
+ st.info("Coming soon — upload tender and extract eligibility criteria.")
436
+ ```
437
+
438
+ #### `ui/tab_bidders.py`
439
+ ```python
440
+ import streamlit as st
441
+
442
+ def render() -> None:
443
+ st.header("Bidder Evaluation")
444
+ st.info("Coming soon — per-bidder, per-criterion verdict table.")
445
+ ```
446
+
447
+ #### `ui/tab_review.py`
448
+ ```python
449
+ import streamlit as st
450
+
451
+ def render() -> None:
452
+ st.header("Human Review Queue")
453
+ st.info("Coming soon — approve / edit / reject flagged verdicts.")
454
+ ```
455
+
456
+ #### `ui/tab_audit.py`
457
+ ```python
458
+ import streamlit as st
459
+
460
+ def render() -> None:
461
+ st.header("Audit Log")
462
+ st.info("Coming soon — sortable audit log with CSV export.")
463
+ ```
464
+
465
+ #### `ui/components.py`
466
+ ```python
467
+ # Shared UI widgets — implemented incrementally as Tab 3 and Tab 4 need them.
468
+ ```
469
+
470
+ ---
471
+
472
+ ### `data/` directory structure (empty folders only)
473
+
474
+ ```
475
+ data/
476
+ tender/
477
+ bidders/
478
+ bidder_a/
479
+ bidder_b/
480
+ bidder_c/
481
+ precomputed/
482
+ ```
483
+
484
+ No files yet — Step 2 (mock data generation) populates these.
485
+
486
+ ---
487
+
488
+ ### `scripts/` directory (empty stubs)
489
+
490
+ #### `scripts/generate_mock_data.py`
491
+ ```python
492
+ """Step 2 — generates mock tender and bidder PDFs + noisy scan PNG."""
493
+ ```
494
+
495
+ #### `scripts/precompute_results.py`
496
+ ```python
497
+ """Step 11 — runs the full pipeline and writes data/precomputed/*.json."""
498
+ ```
499
+
500
+ #### `scripts/smoke_test.py`
501
+ ```python
502
+ """Step 13 — programmatic end-to-end check; exits 0 on success."""
503
+ ```
504
+
505
+ ---
506
+
507
+ ### `assets/` directory (empty, for later)
508
+
509
+ ```
510
+ assets/
511
+ screenshots/
512
+ ```
513
+
514
+ ---
515
+
516
+ ### `deck/` directory (empty, for later)
517
+
518
+ ```
519
+ deck/
520
+ ```
521
+
522
+ ---
523
+
524
+ ## Directory Tree After This Step
525
+
526
+ ```
527
+ TenderIQ/
528
+ ├── app.py
529
+ ├── requirements.txt
530
+ ├── packages.txt
531
+ ├── .env.example
532
+ ├── .gitignore
533
+ ├── specs/
534
+ │ └── 00_skeleton.md ← this file
535
+ ├── core/
536
+ │ ├── __init__.py
537
+ │ ├── config.py
538
+ │ ├── schemas.py
539
+ │ ├── prompts.py
540
+ │ ├── llm_client.py
541
+ │ ├── pdf_utils.py
542
+ │ ├── ocr_pipeline.py
543
+ │ ├── chunker.py
544
+ │ ├── vectorstore.py
545
+ │ ├── criteria_extractor.py
546
+ │ ├── bidder_processor.py
547
+ │ ├── evaluator.py
548
+ │ ├── audit.py
549
+ │ └── fallback.py
550
+ ├── ui/
551
+ │ ├── __init__.py
552
+ │ ├── tab_overview.py
553
+ │ ├── tab_tender.py
554
+ │ ├── tab_bidders.py
555
+ │ ├── tab_review.py
556
+ │ ├── tab_audit.py
557
+ │ └── components.py
558
+ ├── data/
559
+ │ ├── tender/
560
+ │ ├── bidders/
561
+ │ │ ├── bidder_a/
562
+ │ │ ├── bidder_b/
563
+ │ │ └── bidder_c/
564
+ │ └── precomputed/
565
+ ├── scripts/
566
+ │ ├── generate_mock_data.py
567
+ │ ├── precompute_results.py
568
+ │ └── smoke_test.py
569
+ ├── assets/
570
+ │ └── screenshots/
571
+ └── deck/
572
+ ```
573
+
574
+ Runtime artifacts (gitignored, not created here): `.env`, `.chroma/`, `audit.db`, `.ocr_cache/`.
575
+
576
+ ---
577
+
578
+ ## Acceptance Criteria
579
+
580
+ 1. `python -c "import app"` executes without `ImportError` (all stubs importable).
581
+ 2. `streamlit run app.py` opens in the browser without a Python traceback.
582
+ 3. Five tabs are visible: Overview, Tender Analysis, Bidder Evaluation, Human Review, Audit Log.
583
+ 4. Sidebar shows "⚖️ TenderIQ", a caption, a red connection dot placeholder, and a "Reset Session" button.
584
+ 5. Each tab body shows an `st.info(...)` placeholder — no blank white screens.
585
+ 6. `python -c "from core import config, schemas, prompts"` runs without error.
586
+
587
+ ---
588
+
589
+ ## What This Step Does NOT Do
590
+
591
+ - No logic implemented in any `core/` module.
592
+ - No Streamlit secrets or `.env` required to pass the checkpoint.
593
+ - No data files generated (Step 2 does that).
594
+ - No pip install triggered (assumed the environment is set up separately).
submission_requirements.md ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Prototype Phase — Submission Requirements
2
+
3
+ > This is the submission form for Round 2 (Prototype Phase). The idea was already shortlisted in Round 1.
4
+
5
+ ---
6
+
7
+ ## Required Fields
8
+
9
+ | Field | Notes |
10
+ |---|---|
11
+ | **Title** | Clear, descriptive title |
12
+ | **Description** | Project description with formatting and links allowed |
13
+ | **Parent Submission** | Link to the shortlisted Round 1 idea submission |
14
+ | **Theme** | Theme 3: AI-Based Tender Evaluation and Eligibility Analysis |
15
+ | **Snapshots** | Images of the project (JPG/JPEG/PNG, up to 3MB each) |
16
+ | **Video URL** | Demo or pitch video link |
17
+ | **Presentation** | Pitch deck or slides (.key, .odp, .odt, .pdf, .pps, .ppt, .pptx — max 50MB) |
18
+ | **Demo Link** | Link to working demo or prototype |
19
+ | **Repository URL** | GitHub, Bitbucket, or similar code repository |
20
+ | **Source Code** | Zip or APK upload (max 50MB) |
21
+ | **Instructions to Run** | Step-by-step setup and run instructions for reviewers |
22
+ | **Custom Attachment** | Any additional file — PDF, images, spreadsheets (max 50MB) |
23
+
24
+ ---
25
+
26
+ ## Notes
27
+
28
+ - The "Parent Submission" field links this prototype to the previously shortlisted idea.
29
+ - "Which shortlisted idea are you submitting this prototype for?" — confirms the link to the Round 1 submission.
theme.md ADDED
@@ -0,0 +1,89 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Theme 3: AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement by CRPF
2
+
3
+ ## Context
4
+
5
+ Government organisations such as the Central Reserve Police Force (CRPF) issue tenders to procure goods and services. Each tender specifies detailed requirements: technical specifications, financial thresholds, compliance rules, eligibility conditions, document checklists and mandatory certifications. These requirements are typically written in formal, legally careful language and are spread across many pages of the tender document.
6
+
7
+ Private companies respond with bids, each submitting their own set of supporting documents — company profiles, financial statements, experience letters, tax registrations, certifications and more. The documents arrive in many formats: structured text PDFs, scanned copies, Word files, tables and even photographs of physical certificates. The same kind of information is presented in many different ways across bidders.
8
+
9
+ Evaluating whether each bidder meets the stated eligibility criteria is currently a manual process. It is slow, inconsistent across evaluators, prone to oversight, and hard to audit. For a single tender, a committee may spend days cross-checking hundreds of pages against a list of criteria, and two evaluators may reach different conclusions from the same set of documents. There is a clear opportunity to bring modern AI techniques to this problem — to extract structured information from unstructured tender and bid documents, apply the eligibility rules consistently, and produce explainable evaluation reports that a human officer can trust and sign off on.
10
+
11
+ ---
12
+
13
+ ## The Problem
14
+
15
+ Design a technical platform that, given a tender document and a set of bidder submissions, can do the following:
16
+
17
+ ### Understand the Tender
18
+ - Extract the eligibility criteria from the tender document — technical specifications, financial thresholds, compliance conditions, and document and certification requirements.
19
+ - Distinguish between mandatory and optional criteria.
20
+ - Capture each criterion in a form that can be matched against a bidder's submission.
21
+
22
+ ### Understand Each Bidder
23
+ - Parse every bidder submission, regardless of whether the documents are typed PDFs, scanned copies, Word files or photographs.
24
+ - Extract the values and evidence relevant to each criterion from those documents.
25
+ - Handle variation in how bidders present the same information.
26
+
27
+ ### Evaluate and Explain
28
+ - For each bidder, decide whether they are **Eligible**, **Not Eligible**, or **Need Manual Review** against each criterion and overall.
29
+ - Produce an explanation for every verdict that references the specific criterion, the specific document and the specific value that drove the decision.
30
+ - Surface ambiguous or uncertain cases for human review rather than silently disqualifying them.
31
+ - Produce a consolidated evaluation report that a procurement officer can use as the basis for a decision.
32
+
33
+ ---
34
+
35
+ ## Non-Negotiables
36
+
37
+ - Every verdict must be explainable at the criterion level — which criterion was being checked, which document was used, what value was found, and why the bidder passed, failed or needs review.
38
+ - The system must **never silently disqualify** a bidder. Ambiguous or uncertain cases must be surfaced for human review with the reason.
39
+ - The system must handle scanned documents and photographs, not only digital text.
40
+ - The system must be auditable end-to-end and suitable for use in a formal government procurement decision.
41
+ - Real tender and bid data will not be released for Round 1. Any Round 2 implementation will run on representative mock or redacted documents inside a sandbox.
42
+
43
+ ---
44
+
45
+ ## What Success Looks Like
46
+
47
+ A working solution should eventually make the following behaviours possible:
48
+
49
+ 1. A procurement officer uploads a tender document and a set of bidder submissions. The system extracts the eligibility criteria automatically and lists them for review.
50
+ 2. For each bidder, the system produces a criterion-by-criterion evaluation with references back to the source documents.
51
+ 3. Clearly eligible and clearly ineligible bidders are marked as such; genuinely ambiguous cases are flagged for manual review with the reason for the ambiguity.
52
+ 4. A consolidated report can be exported and signed off, with a complete audit trail of every automated decision.
53
+
54
+ ---
55
+
56
+ ## Sample Scenario
57
+
58
+ A government department issues a tender for construction services with the following eligibility criteria: a minimum annual turnover of ₹5 crore, at least 3 similar projects completed in the last 5 years, a valid GST registration, and an ISO 9001 certification. Ten bidders submit responses, each with their own combination of typed and scanned supporting documents.
59
+
60
+ A good solution would extract these four criteria from the tender, parse each bidder's submission, and produce a report:
61
+ - 6 bidders clearly eligible with evidence for each criterion
62
+ - 3 clearly ineligible with the specific criterion they failed and the document that showed it
63
+ - 1 flagged for manual review because the turnover document is a scanned certificate with figures that could not be read with confidence
64
+
65
+ ---
66
+
67
+ ## What Your Solution Should Cover
68
+
69
+ Round 1 of this hackathon is a **written solution submission**. Your solution document should make clear how you would build this platform. At minimum, it should cover:
70
+
71
+ 1. Your understanding of the problem and the realities of government procurement, in your own words.
72
+ 2. Your approach to extracting eligibility criteria from a tender document, including how you separate technical, financial and compliance conditions, and how you distinguish mandatory from optional criteria.
73
+ 3. Your approach to parsing bidder submissions with heterogeneous document types — typed PDFs, scanned documents, tables, photographs — and extracting the values that map to each criterion.
74
+ 4. How you match extracted bidder information against the criteria, and how you handle ambiguity, partial information and variation in legal and technical language.
75
+ 5. How the system produces explainable, criterion-level verdicts, and how ambiguous cases are surfaced for human review instead of being silently rejected.
76
+ 6. How you would guarantee the auditability of every decision, suitable for a formal government procurement context.
77
+ 7. A clear architecture overview, the key technology and model choices you would make, and the reasons behind them.
78
+ 8. The main risks and trade-offs you see, and how you would handle them.
79
+ 9. A rough implementation plan for Round 2, assuming a sandbox with sample tender and bidder documents is provided.
80
+
81
+ ---
82
+
83
+ ## How We Will Evaluate Proposals
84
+
85
+ - Clarity of problem understanding — does the team show they have grasped the realities of government procurement, not just the surface problem?
86
+ - Technical soundness of the proposed approach, including document understanding, criterion matching and explainability.
87
+ - Depth of thinking on edge cases: scanned documents, photographs, ambiguous language, partial information and format inconsistency.
88
+ - Design of the human-in-the-loop path for ambiguous cases, and of the audit trail.
89
+ - Quality of the architecture, the justification of technology and model choices, and the identified risks and trade-offs.
understanding.md ADDED
@@ -0,0 +1,154 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # TenderIQ — Project Understanding
2
+
3
+ ---
4
+
5
+ ## Where We Are
6
+
7
+ The idea phase (Round 1) is **done and shortlisted**. The `idea.md` was the written submission. We are now in the **Prototype Phase (Round 2)**, which requires a working prototype, demo, code repository, pitch deck, and video.
8
+
9
+ ---
10
+
11
+ ## The Problem (from CRPF's perspective)
12
+
13
+ CRPF issues tenders. Companies bid. Someone has to manually read:
14
+ - The tender document (criteria, thresholds, compliance rules)
15
+ - Every bidder's stack of supporting documents (PDFs, scans, photos, Word files)
16
+
17
+ ...and verify that each bidder meets each criterion. For one tender, this takes a committee days. Two evaluators may reach different conclusions from the same documents. There's no consistent audit trail.
18
+
19
+ **The core pain points:**
20
+ 1. Manual, slow, expensive
21
+ 2. Inconsistent across evaluators
22
+ 3. Not auditable / not transparent
23
+ 4. Documents arrive in messy formats (scanned, photographed, mixed)
24
+
25
+ ---
26
+
27
+ ## What TenderIQ Does
28
+
29
+ A four-stage AI pipeline:
30
+
31
+ ```
32
+ Tender Document ──► [Stage 1] Criteria Extraction
33
+
34
+
35
+ Bidder Documents ──► [Stage 2] Document Processing (OCR + entity extraction)
36
+
37
+
38
+ [Stage 3] Evaluation Engine (rule-based + confidence)
39
+
40
+
41
+ [Stage 4] Explainability + Audit Layer
42
+
43
+ ┌─────────┴──────────┐
44
+ ▼ ▼
45
+ Auto-decision Human Review Queue
46
+ (Eligible / Not Eligible) (Needs Manual Review)
47
+ ```
48
+
49
+ ### Stage 1 — Tender Understanding
50
+ - LLM + rule-based hybrid extracts criteria from tender doc
51
+ - Classifies each as mandatory or optional
52
+ - Outputs structured, machine-readable criteria list
53
+
54
+ ### Stage 2 — Bidder Document Processing
55
+ - Handles: typed PDFs, scanned docs, images, Word files
56
+ - OCR for non-digital content
57
+ - Layout-aware parsing (tables, forms, certificates)
58
+ - Entity extraction: turnover figures, cert names, project counts
59
+ - Every extracted value tagged with: source doc, page number, confidence score
60
+
61
+ ### Stage 3 — Evaluation Engine
62
+ - Criterion-by-criterion comparison per bidder
63
+ - Rule-based validation (threshold checks)
64
+ - Confidence-aware: low confidence → "Needs Manual Review", not auto-reject
65
+ - Three outcomes: Eligible / Not Eligible / Needs Manual Review
66
+
67
+ ### Stage 4 — Explainability + Audit
68
+ - Every decision has: criterion checked, value found, source doc, confidence, reason
69
+ - Full audit log: model version, timestamp, reviewer actions
70
+ - Human reviewers can approve / edit / reject flagged cases
71
+ - Reviewer decisions feed back into system improvement
72
+
73
+ ---
74
+
75
+ ## Non-Negotiables (from theme)
76
+
77
+ These are hard constraints, not nice-to-haves:
78
+
79
+ | Constraint | Implication for build |
80
+ |---|---|
81
+ | Every verdict must be explainable at criterion level | No black-box scoring; each criterion decision must be traceable |
82
+ | Never silently disqualify | Low confidence = human review queue, not auto-reject |
83
+ | Must handle scanned docs and photographs | OCR is not optional |
84
+ | End-to-end auditable | Every system action must be logged with immutable records |
85
+
86
+ ---
87
+
88
+ ## What We Need to Deliver (Prototype Phase)
89
+
90
+ | Deliverable | What it means |
91
+ |---|---|
92
+ | Working demo | The pipeline must actually run on mock/sample data |
93
+ | Demo link | Hosted or accessible prototype |
94
+ | Repo URL | Clean, documented code |
95
+ | Source code zip | Packaged for reviewers to run |
96
+ | Run instructions | Step-by-step so reviewers can test it |
97
+ | Presentation | Pitch deck covering the full solution |
98
+ | Video | Demo + pitch walkthrough |
99
+ | Snapshots | Screenshots of the UI/output |
100
+ | Description | Written summary of the project |
101
+
102
+ ---
103
+
104
+ ## Proposed Tech Stack (from idea)
105
+
106
+ | Component | Technology | Why |
107
+ |---|---|---|
108
+ | LLM for criteria extraction | LLM (e.g., Claude, GPT-4, or open-source) | Handles legal language, ambiguity |
109
+ | OCR | Tesseract or PaddleOCR | Open-source, handles scanned docs and images |
110
+ | Document layout understanding | LayoutLM | Understands tables, forms, structured layouts |
111
+ | Backend | Python + FastAPI | Fast to build, good ML ecosystem |
112
+ | Database | PostgreSQL + vector DB | Structured storage + semantic search |
113
+ | Frontend | React | Dashboard for review, reporting |
114
+
115
+ ---
116
+
117
+ ## Key Design Decisions to Think About
118
+
119
+ ### 1. Hybrid extraction (LLM + rules)
120
+ - Pure LLM: flexible but unpredictable on numeric thresholds
121
+ - Pure rules: precise but brittle on varied language
122
+ - Hybrid: LLM for interpretation, rules for validation — best of both
123
+
124
+ ### 2. Confidence threshold design
125
+ - What confidence score triggers "Needs Manual Review"?
126
+ - This is a calibration problem — too low a threshold floods reviewers, too high risks bad auto-decisions
127
+
128
+ ### 3. Vector DB role
129
+ - Enables semantic search over extracted bidder data
130
+ - Useful when a criterion mentions "similar projects" and you need to match against descriptions
131
+
132
+ ### 4. Audit log immutability
133
+ - Government procurement context requires tamper-evident logs
134
+ - Must capture: what AI decided, why, when, which model version, and what the human reviewer did
135
+
136
+ ---
137
+
138
+ ## Gaps / Things Not Yet Defined
139
+
140
+ - **Which LLM?** The idea says "LLMs" but doesn't specify. For a prototype, this matters.
141
+ - **Which vector DB?** Pinecone, Weaviate, ChromaDB, pgvector — not chosen yet.
142
+ - **Criteria schema** — what does the structured criterion object look like exactly?
143
+ - **Confidence score methodology** — how is it calculated and what thresholds are used?
144
+ - **UI scope** — how much of the review interface needs to be built for the prototype?
145
+ - **Mock data** — we need sample tender docs and bidder submissions to demo against.
146
+ - **Evaluation report format** — what does the exported report look like?
147
+
148
+ ---
149
+
150
+ ## Summary
151
+
152
+ The idea is solid and already shortlisted. The core insight is: **don't try to fully automate procurement decisions; build a system that makes human reviewers dramatically faster and more consistent, with a complete audit trail.** The prototype needs to demonstrate this pipeline end-to-end on mock data, with a UI that shows criterion-level explanations.
153
+
154
+ Next step: define the implementation plan — what to build, in what order, and what scope is realistic for the prototype.