JaydeepR/TenderIQ
JaydeepR Claude Sonnet 4.6 committed 14 days ago

Commit

5275508

1 Parent(s): 1b26bd8

Remove internal planning docs from repo, gitignore them

Keep README.md and ARCHITECTURE.md (public-facing).
Remove from tracking: idea.md, IMPLEMENTATION_PLAN.md,
presentation_creation.md, submission_requirements.md,
theme.md, understanding.md, specs/ — files remain locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (19)

.gitignore +9 -0
IMPLEMENTATION_PLAN.md +0 -700
idea.md +0 -157
presentation_creation.md +0 -689
specs/00_skeleton.md +0 -594
specs/01_config_and_schemas.md +0 -145
specs/02_llm_client.md +0 -101
specs/03_pdf_utils_and_chunker.md +0 -80
specs/04_ocr_pipeline.md +0 -97
specs/06_vectorstore_and_bidder_processor.md +0 -97
specs/07_criteria_extractor.md +0 -79
specs/09_evaluator.md +0 -134
specs/10_audit_and_fallback.md +0 -83
specs/11_mock_data.md +0 -211
specs/12_precompute.md +0 -73
specs/13_ui_tabs.md +0 -121
submission_requirements.md +0 -29
theme.md +0 -89
understanding.md +0 -154

.gitignore CHANGED

@@ -39,3 +39,12 @@ Thumbs.db
 # Generated presentations (keep locally, don't track in git)
 deck/*.pptx
 deck/*.pdf

+# Internal planning / session docs — not for the repo
+idea.md
+IMPLEMENTATION_PLAN.md
+presentation_creation.md
+submission_requirements.md
+theme.md
+understanding.md
+specs/

IMPLEMENTATION_PLAN.md DELETED

@@ -1,700 +0,0 @@
-# TenderIQ — Implementation Plan
-> **For:** any contributor or fresh AI context picking up this project.
-> **You do not need any prior conversation context to use this document.**
----
-## 0. How To Use This Plan
-This project follows **spec-driven development**:
-1. **This document** is the master implementation plan. It defines architecture, modules, schemas, and the build order. It does **not** contain final source code.
-2. For **each module or coherent unit of work** listed in this plan, the team will produce a **spec document** (a short markdown file) before writing code. Each spec covers: inputs, outputs, function signatures, error cases, dependencies, and acceptance criteria.
-3. Code is written **only against an approved spec**, not directly from this plan.
-4. Specs live in `specs/` (e.g. `specs/02_llm_client.md`, `specs/04_ocr_pipeline.md`). One spec per module. Number prefixes follow the build order in section 9.
-5. Once a spec is implemented, the spec file is preserved alongside the code as documentation.
-**Sequencing rule:** never skip the spec step. If you find yourself wanting to "just code it," stop and write the spec first — it forces precision and exposes hidden assumptions.
----
-## 1. Background
-### What TenderIQ is
-TenderIQ is an AI-powered platform that automates eligibility evaluation of bidders against government tender criteria. It is being built for the **Central Reserve Police Force (CRPF) hackathon, Theme 3 — AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement**.
-### Why it exists
-Government procurement officers today manually read tender documents (criteria, thresholds, compliance requirements) and bidder submissions (financial statements, certifications, project records — often in mixed formats including scans and photos), and decide whether each bidder meets each criterion. For one tender, a committee may spend days; two evaluators routinely reach different conclusions on the same documents; there is no consistent audit trail.
-TenderIQ does this evaluation automatically while preserving human oversight: extract criteria from the tender, parse bidder documents, evaluate criterion-by-criterion with confidence scoring, surface ambiguous cases for human review, and emit a complete audit log.
-### Where this project sits in the hackathon
-- **Round 1 (Idea Phase)**: written submission — already shortlisted. See `idea.md`.
-- **Round 2 (Prototype Phase)**: working prototype — this is what we are building. Submission requirements are in `submission_requirements.md`.
-### Source documents in this repository
-| File | Purpose |
-|---|---|
-| `theme.md` | Original problem statement from CRPF (the "why" and the hard constraints) |
-| `idea.md` | The shortlisted Round 1 written submission (the "what") |
-| `understanding.md` | Synthesized understanding of the problem space |
-| `submission_requirements.md` | Form fields required for the Round 2 submission |
-| `IMPLEMENTATION_PLAN.md` | **This file** — the build plan |
-| `specs/` | Per-module spec documents (created during build, one per module) |
-Read those four documents (theme, idea, understanding, submission requirements) before drafting the first spec.
----
-## 2. Hard Constraints (from the theme — non-negotiable)
-These are evaluator-facing requirements. Every architectural decision must respect them.
-1. **Every verdict must be explainable at criterion level** — for each (bidder, criterion) pair the system must show: which criterion was checked, which document and page provided the evidence, what value was extracted, what confidence the system had, and why the verdict was assigned.
-2. **Never silently disqualify** — low-confidence or ambiguous cases must be routed to a human review queue with a stated reason, never auto-rejected.
-3. **Must handle scanned documents and photographs** — OCR is mandatory. The system cannot assume digital text.
-4. **End-to-end auditable** — every action (criterion extraction, evaluation, OCR fallback invocation, human review action) must be logged with timestamp, model version, actor, and payload.
-A submission that fails any of these is unlikely to score well. Treat them as acceptance criteria for the system as a whole.
----
-## 3. Operating Constraints (this build)
-- **Time budget:** ~6 hours total — ~5h build + ~1.5h deck/video/screenshots/submission. Do not exceed scope. Compression strategy is documented in section 11.
-- **Platform:** Windows 11 development machine. Streamlit Cloud for hosted demo.
-- **Language:** Python 3.10+.
-- **Starting point:** the project is empty except for the source documents listed in section 1. Everything below is to be created.
-- **API access:** the developer has a **DeepSeek API key**. No other LLM/vision API keys are assumed available.
-- **Storage:** file-based only. SQLite for the audit log; ChromaDB persistent client for vectors. No external services beyond the DeepSeek API and Streamlit Cloud.
-- **Auth/multi-user:** out of scope. A single hardcoded "officer" identity is used in audit entries.
----
-## 4. Confirmed Architectural Decisions
-These were the result of explicit trade-off discussions before the plan was written. Do not relitigate without strong reason.
-### 4.1 UI / Backend
-**Single Streamlit app** (`streamlit==1.39.0`). No separate frontend, no FastAPI service. Streamlit handles UI and orchestration. Deployable free to Streamlit Community Cloud, which satisfies the "Demo Link" submission requirement.
-### 4.2 LLM
-**DeepSeek API**, model `deepseek-v4-pro`, called via the **OpenAI Python SDK** with `base_url="https://api.deepseek.com/v1"` (DeepSeek is OpenAI-compatible). DeepSeek V4-Pro is multimodal — it accepts image inputs, which we exploit for vision-OCR (section 4.4).
-### 4.3 Live-first LLM with cached fallback
-The app **always attempts a live LLM call first**. On any `LLMUnavailable` exception (rate limit, network error, malformed JSON after retries, missing key), it **automatically falls back** to pre-computed JSON shipped with the repo (`data/precomputed/*.json`). When fallback fires, a banner is shown and an audit entry is written. This means: judges see real AI executing during their evaluation; the demo still works if the API is down or the key is missing.
-### 4.4 OCR — three-tier pipeline (the robustness centerpiece)
-Bidder documents arrive in mixed formats (typed PDFs, scanned PDFs, photographs of certificates). The OCR pipeline handles each in increasing order of cost:
-| Tier | Engine | When it runs | Cost |
-|---|---|---|---|
-| 1 | PyMuPDF text extraction | Document is a typed PDF (detected via `is_text_pdf` heuristic) | Free, instant |
-| 2 | Tesseract (`pytesseract` + system binary) | Document is a scanned PDF or image | Free, fast, accuracy varies |
-| 3 | DeepSeek Vision LLM | Tesseract `mean_conf < 0.65` or extracted text suspiciously short | API call, slow, very accurate |
-Each extracted page records which tier produced it, and that provenance is shown in the UI ("Read by Tesseract @ 58% → re-read by Vision-LLM @ 95%"). This is more robust than single-engine OCR and is a real production pattern.
-### 4.5 Vector store
-**ChromaDB** persistent client, embedded in-process, file-backed under `.chroma/`. Default embedding model is `all-MiniLM-L6-v2` from `sentence-transformers` (~80MB, downloaded on first run). Two collections: `tender_chunks`, `bidder_chunks` (filterable by `bidder_id`).
-### 4.6 Audit log
-**SQLite** single-file DB (`audit.db`) with one append-only table `audit_log`.
-### 4.7 Things explicitly cut
-- **LayoutLM** — too heavy for the build window. Robustness comes from the 3-tier OCR (vision LLM tier handles documents LayoutLM would otherwise cover).
-- **easyocr** — would add ~1GB (PyTorch). Vision-LLM tier replaces it.
-- **PostgreSQL** — SQLite is sufficient.
-- **React / Next.js / FastAPI split** — Streamlit alone meets all UI needs.
-- **Authentication / multi-user** — single hardcoded officer identity.
-- **Test infrastructure beyond a smoke test** — explicit time-budget decision.
-- **Map-reduce LLM extraction** — mock tender is ~5 pages, fits comfortably in V4's 1M context window in a single call.
----
-## 5. Project Structure
-```
-TenderIQ/
-├── app.py                              # Streamlit entry point, tabs router
-├── requirements.txt                    # pinned pip deps (section 12)
-├── packages.txt                        # apt packages for Streamlit Cloud
-├── .env.example                        # DEEPSEEK_API_KEY=
-├── .gitignore                          # .env, .chroma/, audit.db, __pycache__, .ocr_cache/
-├── README.md                           # run instructions (local + cloud)
-├── ARCHITECTURE.md                     # diagram + flow (used as Custom Attachment)
-├── IMPLEMENTATION_PLAN.md              # this file
-│
-├── specs/                              # per-module specs (created during build)
-│   ├── 01_config_and_schemas.md
-│   ├── 02_llm_client.md
-│   ├── 03_pdf_utils.md
-│   ├── 04_ocr_pipeline.md
-│   ├── 05_chunker.md
-│   ├── 06_vectorstore.md
-│   ├── 07_criteria_extractor.md
-│   ├── 08_bidder_processor.md
-│   ├── 09_evaluator.md
-│   ├── 10_audit_and_fallback.md
-│   ├── 11_mock_data.md
-│   ├── 12_precompute.md
-│   └── 13_ui_tabs.md
-│
-├── core/
-│   ├── __init__.py
-│   ├── config.py                       # env loading, model name, thresholds, paths
-│   ├── schemas.py                      # pydantic: Criterion, Evidence, Verdict, AuditEntry
-│   ├── prompts.py                      # EXTRACT_CRITERIA_PROMPT, EVALUATE_CRITERION_PROMPT, VISION_OCR_PROMPT
-│   ├── llm_client.py                   # DeepSeek wrapper: chat_json, chat_vision, LLMUnavailable
-│   ├── pdf_utils.py                    # PyMuPDF: extract_pages, is_text_pdf, render_page_to_image
-│   ├── ocr_pipeline.py                 # 3-tier OCR orchestrator
-│   ├── chunker.py                      # tender + bidder docs → chunks with metadata
-│   ├── vectorstore.py                  # ChromaDB persistent client + helpers
-│   ├── criteria_extractor.py           # Stage 1: tender PDF → List[Criterion]
-│   ├── bidder_processor.py             # Stage 2: bidder docs → indexed chunks + evidence retrieval
-│   ├── evaluator.py                    # Stage 3: per-criterion verdict with combined confidence
-│   ├── audit.py                        # SQLite audit log writer/reader
-│   └── fallback.py                     # load pre-computed JSON when live LLM fails
-│
-├── ui/
-│   ├── __init__.py
-│   ├── tab_overview.py                 # hero, architecture image, KPIs
-│   ├── tab_tender.py                   # upload tender → show criteria
-│   ├── tab_bidders.py                  # bidder evaluation table with verdicts + sources
-│   ├── tab_review.py                   # human review queue (Approve / Edit / Reject)
-│   ├── tab_audit.py                    # audit log table + CSV export
-│   └── components.py                   # verdict pill, confidence bar, citation chip, OCR-tier badge
-│
-├── data/
-│   ├── tender/
-│   │   └── crpf_construction_tender.pdf
-│   ├── bidders/
-│   │   ├── bidder_a/                   # all eligible — typed PDFs
-│   │   ├── bidder_b/                   # ineligible — turnover too low
-│   │   └── bidder_c/                   # needs review — scanned turnover cert
-│   │       └── turnover_certificate_scan.png
-│   └── precomputed/                    # fallback if live API fails
-│       ├── criteria.json
-│       ├── eval_bidder_a.json
-│       ├── eval_bidder_b.json
-│       └── eval_bidder_c.json
-│
-├── scripts/
-│   ├── generate_mock_data.py           # reportlab → PDFs + PIL/numpy → noisy scan
-│   ├── precompute_results.py           # run pipeline once, save fallback JSON
-│   └── smoke_test.py                   # programmatic end-to-end check
-│
-├── assets/
-│   ├── logo.png
-│   ├── architecture.png                # for deck + Custom Attachment
-│   └── screenshots/                    # 3-5 PNGs for submission
-│
-└── deck/
-    └── TenderIQ_Pitch.pdf              # 8-slide pitch deck
-```
-Runtime artifacts (gitignored): `.env`, `.chroma/`, `audit.db`, `.ocr_cache/`, `__pycache__/`.
----
-## 6. Module Responsibilities
-This is the contract surface for each module. Each one will get its own spec document; the descriptions here are the seed material for those specs.
-### `core/config.py`
-- Load `DEEPSEEK_API_KEY` from `st.secrets` first, then `.env` via `python-dotenv`.
-- Constants:
-  - `DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"`
-  - `MODEL_NAME = "deepseek-v4-pro"`
-  - `MODEL_VERSION = "deepseek-v4-pro@<build-date>"` — used for audit stamping
-  - `CONFIDENCE_HIGH = 0.80`
-  - `CONFIDENCE_REVIEW = 0.55`
-  - `OCR_TESSERACT_MIN_CONF = 0.65`
-- Paths: `DATA_DIR`, `CHROMA_DIR = ".chroma"`, `AUDIT_DB = "audit.db"`, `PRECOMPUTED_DIR`, `OCR_CACHE_DIR = ".ocr_cache"`.
-### `core/schemas.py`
-Pydantic models matching the JSON shapes in section 7. At minimum: `Criterion`, `Rule`, `Evidence`, `Source`, `Verdict`, `AuditEntry`.
-### `core/prompts.py`
-Three string constants — see section 8.
-### `core/llm_client.py`
-```
-class LLMUnavailable(Exception): ...
-class LLM:
-    def __init__(self, api_key: str | None = None): ...
-    def chat_json(self, system: str, user: str, max_retries: int = 2) -> dict: ...
-    def chat_vision(self, system: str, user_text: str, image: bytes | str | Path,
-                    max_retries: int = 2) -> str: ...
-```
-- `chat_json` uses `response_format={"type": "json_object"}`, `temperature=0`, retries on JSON parse errors and 5xx with exponential backoff. Raises `LLMUnavailable` after `max_retries`.
-- `chat_vision` encodes the image as `data:image/png;base64,...` and sends a multimodal message in OpenAI-compatible format (`{"type": "image_url", "image_url": {"url": "..."}}`). Returns transcribed text. Raises `LLMUnavailable` on failure.
-- Every caller in `core/criteria_extractor.py`, `core/evaluator.py`, `core/ocr_pipeline.py` wraps calls in `try/except LLMUnavailable` and routes to `core/fallback.py` (or to a graceful low-confidence result for the OCR case).
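The retry behaviour described for `chat_json` can be sketched without the SDK. This is a minimal illustration, not the project's client: `call_with_retries`, its `send` callable (standing in for the actual API request and returning the raw response string), `base_delay`, and the injectable `sleep` are all hypothetical names. It retries on JSON parse errors and on built-in connection/timeout errors; mapping HTTP 5xx responses to retryable errors depends on the SDK and is left out.

```python
import json
import time


class LLMUnavailable(Exception):
    """Raised when the LLM cannot produce a usable response after retries."""


def call_with_retries(send, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Call send() and parse its result as JSON, retrying with exponential backoff.

    send is a hypothetical zero-argument callable returning the raw response text.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return json.loads(send())
        except (json.JSONDecodeError, ConnectionError, TimeoutError) as exc:
            last_error = exc
            if attempt < max_retries:
                # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    raise LLMUnavailable(f"gave up after {max_retries + 1} attempts") from last_error
```

Callers would wrap this in `try/except LLMUnavailable` and route to the fallback path, as the last bullet above describes.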
-### `core/pdf_utils.py`
-- `extract_pages(path: Path) -> list[dict]` — returns `[{"page": int, "text": str}]` via `fitz.open`.
-- `is_text_pdf(path: Path) -> bool` — heuristic on average chars per page.
-- `render_page_to_image(path: Path, page_no: int, dpi: int = 200) -> PIL.Image` — for OCR.
-### `core/ocr_pipeline.py`
-The robustness centerpiece. Orchestrates the three tiers described in section 4.4.
-```
-def extract_document(file_path: Path) -> list[ExtractedPage]: ...
-```
-`ExtractedPage` shape: `{"page": int, "text": str, "source_type": "text_pdf" | "tesseract" | "vision_llm", "confidence": float, "raw_tier_results": {"tesseract_conf": float | None, "vision_used": bool}}`.
-Logic:
-1. If file is image (PNG/JPG): treat as 1-page; go straight to tier 2.
-2. If file is PDF and `is_text_pdf == True`: tier 1 (text_pdf, conf=1.0).
-3. Else: for each page render to image, run tier 2 (Tesseract via `pytesseract.image_to_data`), compute mean confidence excluding `-1`s, divided by 100.
-4. If `mean_conf < OCR_TESSERACT_MIN_CONF` or text length absurdly short relative to image size: invoke tier 3 (`llm_client.chat_vision(system, user_text, image)` with the `VISION_OCR_PROMPT` system and user text), set `source_type="vision_llm"`, `confidence=0.95`. Log `vision_ocr_invoked` audit entry.
-5. If tier 3 raises `LLMUnavailable`: keep tier-2 result with `confidence < 0.65` (will trigger `needs_review` downstream).
-6. Cache per-file results in `.ocr_cache/<file_hash>.json` so reruns don't re-OCR.
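Steps 3 and 4 above hinge on one number: the mean Tesseract word confidence. The sketch below shows that calculation and the escalation decision in isolation. It assumes the confidences arrive as the `conf` column of `pytesseract.image_to_data` output (numbers or numeric strings, with `-1` marking non-text boxes). `needs_vision_tier` and its `min_chars` cutoff are hypothetical; the plan only says "absurdly short relative to image size" and does not fix a value.

```python
OCR_TESSERACT_MIN_CONF = 0.65  # threshold listed under core/config.py in the plan


def mean_confidence(raw_confs):
    """Mean word confidence scaled to 0..1, ignoring -1 (non-text) entries."""
    values = [float(c) for c in raw_confs]
    words = [v for v in values if v >= 0]
    if not words:
        return 0.0  # nothing recognised at all
    return sum(words) / len(words) / 100.0


def needs_vision_tier(raw_confs, text, min_chars=40):
    """Decide whether to escalate a page from Tesseract to the vision LLM.

    min_chars is an illustrative stand-in for the plan's
    "text suspiciously short" check.
    """
    too_uncertain = mean_confidence(raw_confs) < OCR_TESSERACT_MIN_CONF
    too_short = len(text.strip()) < min_chars
    return too_uncertain or too_short
```

A page whose words score 58 and 62 averages 0.60, which is below 0.65 and would be escalated.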
-### `core/chunker.py`
-- `chunk_tender(pages: list[dict], tender_id: str) -> list[dict]` — ~500-token chunks per page, regex-detect clause headings (`^\d+(\.\d+)*\s+`).
-- `chunk_bidder(pages: list[ExtractedPage], bidder_id: str, doc_name: str) -> list[dict]` — page-level chunks (one per page; or per-doc if very short). Each chunk's metadata includes `bidder_id`, `doc_name`, `page`, `source_type`, `ocr_confidence`.
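The clause-heading detection in `chunk_tender` can be tried on its own. The sketch below applies the regex from the bullet above with `re.MULTILINE` and cuts the page text at each match; `split_on_clause_headings` is a hypothetical helper, and the ~500-token sizing is not shown.

```python
import re

# Pattern quoted in the plan; MULTILINE lets ^ match at every line start.
CLAUSE_HEADING = re.compile(r"^\d+(\.\d+)*\s+", re.MULTILINE)


def split_on_clause_headings(page_text):
    """Split page text into sections that each begin at a numbered heading."""
    starts = [m.start() for m in CLAUSE_HEADING.finditer(page_text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first heading
    bounds = starts + [len(page_text)]
    sections = (page_text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [s for s in sections if s]
```

Two properties of the pattern are worth knowing before relying on it: a heading written as `3.2(a)` is not matched, because `(` follows the number where whitespace is required, and any line that merely starts with a number followed by a space is treated as a heading.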
-### `core/vectorstore.py`
-- `get_client()` cached with `@st.cache_resource`, returns `chromadb.PersistentClient(path=CHROMA_DIR)`.
-- `get_collection(name: str)` — creates if missing.
-- `add_chunks(collection, chunks: list[dict], metadatas: list[dict])` — ID = `hash(text)[:16]` to dedupe across reruns.
-- `query(collection, text: str, k: int = 4, where: dict | None = None) -> list[dict]` — returns `[{text, metadata, distance}, ...]`.
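For the dedupe IDs, note that Python's built-in `hash()` returns an integer and is salted per process for strings by default, so it cannot be sliced and is not stable across reruns. A minimal sketch of a stable alternative follows, reading `hash(text)[:16]` as "the first 16 hex characters of a content digest", which is an assumption. `chunk_id` and `dedupe_chunks` are hypothetical helper names.

```python
import hashlib


def chunk_id(text):
    """Deterministic 16-character ID derived from the chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def dedupe_chunks(texts):
    """Map ID -> text, keeping the first occurrence of each distinct text."""
    unique = {}
    for text in texts:
        unique.setdefault(chunk_id(text), text)
    return unique
```

Because the ID depends only on the text, re-indexing the same document produces the same IDs, which is what makes reruns idempotent.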
-### `core/criteria_extractor.py`
-```
-def extract_criteria(tender_pdf_path: Path) -> list[Criterion]: ...
-```
-1. `pdf_utils.extract_pages(tender_pdf_path)` → join all page text with `\n--- PAGE N ---\n` markers.
-2. `llm.chat_json(EXTRACT_CRITERIA_PROMPT_SYSTEM, prompt + tender_text)`.
-3. Parse JSON `{"criteria": [...]}`, validate via Pydantic, attach UUIDs if absent.
-4. Index criteria text into the `tender_chunks` collection (for future retrieval / explainability features).
-5. Return list. On `LLMUnavailable` → `fallback.load_criteria()` + audit `precomputed_fallback_used`.
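Step 1 above can be sketched as a small pure function over the `extract_pages` output shape (`[{"page": int, "text": str}]`). `join_pages` is a hypothetical name, and placing each marker before its page's text is an assumption; the plan only gives the marker format.

```python
def join_pages(pages):
    """Concatenate page texts, prefixing each with a PAGE marker line."""
    return "".join(f"\n--- PAGE {p['page']} ---\n{p['text']}" for p in pages)
```

Keeping the markers in the prompt text is what lets the model fill in `source_page` for each extracted criterion.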
-### `core/bidder_processor.py`
-```
-def process_bidder(bidder_id: str, files: list[Path]) -> None:
-    """Extract, chunk, and index every file for this bidder."""
-def gather_evidence(bidder_id: str, criterion: Criterion, k: int = 4) -> list[Evidence]:
-    """Retrieve top-k bidder chunks relevant to this criterion."""
-```
-- Process step: each file → `ocr_pipeline.extract_document` → `chunker.chunk_bidder` → `vectorstore.add_chunks(bidder_chunks, chunks, metadatas)`, with `bidder_id` carried in each chunk's metadata. Audit: `bidder_processed`.
-- Gather step: query string = `criterion.title + " " + " ".join(criterion.query_hints)`; `vectorstore.query(bidder_chunks, q, k=4, where={"bidder_id": bidder_id})`. Map results to `Evidence` objects.
-### `core/evaluator.py`
-```
-def evaluate(bidder_id: str, criterion: Criterion) -> Verdict: ...
-def evaluate_bidder(bidder_id: str, criteria: list[Criterion]) -> list[Verdict]: ...
-```
-Algorithm for `evaluate`:
-1. `evidence = bidder_processor.gather_evidence(bidder_id, criterion)`.
-2. If `evidence` empty: return `Verdict(verdict="needs_review", reason="No matching evidence found in submitted documents.", llm_confidence=0, combined_confidence=0)` and audit. Done.
-3. Call `llm.chat_json(EVALUATE_CRITERION_PROMPT_SYSTEM, render_user(criterion, evidence))`.
-4. Parse: `{verdict, extracted_value, normalized_value, chosen_source, llm_confidence, reason}`.
-5. Compute `combined_confidence` based on `chosen_source.source_type`:
-   - `"text_pdf"`: `combined = llm_confidence`
-   - `"vision_llm"`: `combined = 0.7 * llm_confidence + 0.3 * 0.95`
-   - `"tesseract"`: `combined = 0.6 * llm_confidence + 0.4 * tesseract_conf`
-6. Apply threshold rules (in order):
-   - LLM verdict is `needs_review` → keep.
-   - `combined >= 0.80` → keep LLM verdict.
-   - `0.55 <= combined < 0.80` AND verdict is `not_eligible` → **downgrade to `needs_review`** (never silently disqualify).
-   - `combined < 0.55` → force `needs_review`.
-7. Build `Verdict` object, audit `criterion_evaluated`, return.
-8. On `LLMUnavailable` → `fallback.load_evaluation(bidder_id, criterion.id)` + audit fallback.
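Steps 5 and 6 of the algorithm are pure arithmetic and can be sketched in isolation. The weights and thresholds below are the ones listed above; the function names are hypothetical. One case is not spelled out in the plan: an `eligible` verdict with combined confidence between 0.55 and 0.80. This sketch keeps it, which is an assumption.

```python
CONFIDENCE_HIGH = 0.80
CONFIDENCE_REVIEW = 0.55
VISION_OCR_CONFIDENCE = 0.95  # fixed confidence the plan assigns to the vision tier


def combined_confidence(source_type, llm_confidence, tesseract_conf=None):
    """Blend LLM confidence with OCR confidence according to the source tier."""
    if source_type == "text_pdf":
        return llm_confidence
    if source_type == "vision_llm":
        return 0.7 * llm_confidence + 0.3 * VISION_OCR_CONFIDENCE
    if source_type == "tesseract":
        if tesseract_conf is None:
            raise ValueError("tesseract source requires tesseract_conf")
        return 0.6 * llm_confidence + 0.4 * tesseract_conf
    raise ValueError(f"unknown source_type: {source_type!r}")


def apply_thresholds(llm_verdict, combined):
    """Apply the ordered threshold rules; never auto-reject on weak confidence."""
    if llm_verdict == "needs_review":
        return "needs_review"
    if combined >= CONFIDENCE_HIGH:
        return llm_verdict
    if combined >= CONFIDENCE_REVIEW:
        # Mid band: a rejection is downgraded; an eligible verdict is kept (assumed).
        return "needs_review" if llm_verdict == "not_eligible" else llm_verdict
    return "needs_review"
```

For example, an LLM confidence of 0.9 on a Tesseract page read at 0.58 combines to about 0.772, so a `not_eligible` verdict on that page would be downgraded to `needs_review`.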
-### `core/audit.py`
-- SQLite single table:
-  ```sql
-  CREATE TABLE audit_log (
-    id INTEGER PRIMARY KEY AUTOINCREMENT,
-    ts TEXT NOT NULL,
-    action TEXT NOT NULL,
-    actor TEXT NOT NULL,
-    model_version TEXT,
-    bidder_id TEXT,
-    criterion_id TEXT,
-    payload_json TEXT
-  );
-  ```
-- `log(action: str, actor: str = "system", **fields) -> int` — inserts.
-- `query(filters: dict | None = None) -> list[dict]` — filterable by `bidder_id`, `action`, date range.
-- Action vocabulary: `criteria_extracted`, `bidder_processed`, `criterion_evaluated`, `human_review_action`, `precomputed_fallback_used`, `vision_ocr_invoked`.
-- Connection cached with `@st.cache_resource`.
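A minimal sketch of the append-only writer using the standard-library `sqlite3` module and the table definition above. It departs from the planned signature in one respect: the connection is passed in explicitly instead of being cached with `@st.cache_resource`, so the example runs outside Streamlit. `open_audit_db` is a hypothetical helper.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  ts TEXT NOT NULL,
  action TEXT NOT NULL,
  actor TEXT NOT NULL,
  model_version TEXT,
  bidder_id TEXT,
  criterion_id TEXT,
  payload_json TEXT
)
"""


def open_audit_db(path=":memory:"):
    """Open (or create) the audit database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log(conn, action, actor="system", model_version=None,
        bidder_id=None, criterion_id=None, **payload):
    """Insert one audit row and return its id; extra kwargs go into payload_json."""
    cur = conn.execute(
        "INSERT INTO audit_log "
        "(ts, action, actor, model_version, bidder_id, criterion_id, payload_json) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), action, actor,
         model_version, bidder_id, criterion_id, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid
```

The writer only ever inserts; nothing in this sketch updates or deletes rows, which is what keeps the log append-only at the application level.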
-### `core/fallback.py`
-- `load_criteria() -> list[Criterion]` — reads `data/precomputed/criteria.json`.
-- `load_evaluation(bidder_id: str, criterion_id: str) -> Verdict` — reads `data/precomputed/eval_bidder_<id>.json` and indexes into the `criterion_id` block.
-- Each fallback hit logs `precomputed_fallback_used` and sets `st.session_state["fallback_active"] = True` so the UI can render the banner.
----
-## 7. Data Schemas
-All canonical, all serialized as JSON for storage and inter-module communication.
-### `Criterion`
-```json
-{
-  "id": "C1",
-  "title": "Minimum Annual Turnover",
-  "category": "financial",
-  "mandatory": true,
-  "description": "Average annual turnover during the last three financial years shall not be less than INR 5 Crore.",
-  "rule": {
-    "type": "numeric_threshold",
-    "field": "annual_turnover_inr",
-    "operator": ">=",
-    "value": 50000000,
-    "unit": "INR"
-  },
-  "query_hints": ["annual turnover", "total revenue", "ITR", "audited financials"],
-  "source_page": 3,
-  "source_clause": "3.2(a)"
-}
-```
-Fields:
-- `category`: `"financial" | "technical" | "compliance"`.
-- `rule.type`: `"numeric_threshold" | "count_threshold" | "certification_present" | "document_present"`.
-- `rule.operator`: `">=" | "<=" | "==" | "exists"`.
-- `query_hints`: 3–5 short noun phrases used to build retrieval queries.
-### `Evidence` (one retrieved chunk during evaluation)
-```json
-{
-  "bidder_id": "bidder_a",
-  "doc_name": "audited_financials.pdf",
-  "page": 4,
-  "text": "...annual turnover for FY 2024-25 was INR 6,20,00,000...",
-  "source_type": "text_pdf",
-  "ocr_confidence": null
-}
-```
-- `source_type`: `"text_pdf" | "tesseract" | "vision_llm"`.
-- `ocr_confidence`: 0.0–1.0 if OCR was used; `null` for `text_pdf`.
-### `Verdict`
-```json
-{
-  "verdict_id": "V-uuid",
-  "bidder_id": "bidder_a",
-  "criterion_id": "C1",
-  "verdict": "eligible",
-  "extracted_value": "INR 6.2 Cr",
-  "normalized_value": 62000000,
-  "source": {
-    "doc_name": "audited_financials.pdf",
-    "page": 4,
-    "snippet": "...annual turnover... INR 6,20,00,000...",
-    "source_type": "text_pdf"
-  },
-  "llm_confidence": 0.93,
-  "ocr_confidence": null,
-  "combined_confidence": 0.93,
-  "reason": "Extracted turnover of INR 6.2 Cr exceeds the required threshold of INR 5 Cr.",
-  "model_version": "deepseek-v4-pro@2026-05-07",
-  "timestamp": "2026-05-07T12:34:56Z",
-  "review_status": "pending"
-}
-```
-- `verdict`: `"eligible" | "not_eligible" | "needs_review"`.
-- `review_status`: `"pending" | "approved" | "edited" | "rejected"`.
-### `AuditEntry`
-Maps directly to the SQLite row (see `core/audit.py` description). The `payload_json` field carries the action-specific details (e.g., for `criterion_evaluated`: `{"verdict": "eligible", "combined_confidence": 0.93}`).
----
-## 8. LLM Prompts
-All three prompts must demand strict JSON output where applicable, run at `temperature=0`, and rely on `response_format={"type": "json_object"}` for the JSON ones.
-### `EXTRACT_CRITERIA_PROMPT`
-**System:**
-> You are an expert in Indian government tender analysis (CRPF context). Your job is to extract eligibility criteria from a tender document and return them as STRICT JSON. Never invent criteria not present in the text. Classify each criterion as mandatory or optional based on cue words: "shall", "must", "mandatory", "required", "minimum" → mandatory; "preferred", "desirable", "may", "optionally" → optional. For each criterion, generate 3–5 short noun-phrase query_hints that an evaluator would search for in bidder documents.
-**User template:** the full tender text + a JSON schema example + the instruction:
-> Return `{"criteria": [Criterion, ...]}`. Each Criterion must include id (C1, C2, ...), title, category (financial / technical / compliance), mandatory (bool), description (verbatim or close paraphrase), rule (typed per the schema), query_hints, source_page (int), source_clause (string).
-### `EVALUATE_CRITERION_PROMPT`
-**System:**
-> You are a procurement evaluator. Given ONE criterion and a list of retrieved evidence chunks from a bidder's documents, decide eligible / not_eligible / needs_review. Always cite the strongest single source. NEVER guess values not present in the evidence. If evidence is missing or ambiguous, return needs_review with reason. Output STRICT JSON.
-**User template** (variables substituted):
-```
-CRITERION:
-{ ...criterion JSON... }
-RETRIEVED EVIDENCE (top-k chunks from this bidder, with source + OCR confidence):
-[
-  { "doc_name": "...", "page": 4, "ocr_confidence": null, "source_type": "text_pdf",
-    "text": "..." },
-  ...
-]
-Return JSON:
-{
-  "verdict": "eligible" | "not_eligible" | "needs_review",
-  "extracted_value": "<short string as found>",
-  "normalized_value": <number or null>,
-  "chosen_source": {"doc_name": "...", "page": <int>, "snippet": "<<= 200 chars>", "source_type": "..."},
-  "llm_confidence": <0..1>,
-  "reason": "<one or two sentences>"
-}
-Rules:
-- If evidence directly contains a value satisfying the rule, verdict=eligible with high llm_confidence.
-- If evidence directly contradicts the rule, verdict=not_eligible.
-- If no relevant evidence retrieved, verdict=needs_review, llm_confidence<=0.4.
-- If the source is OCR with low confidence and the value is borderline, lean to needs_review.
-```
-### `VISION_OCR_PROMPT`
-**System:**
-> You are an OCR engine for Indian government procurement documents. Transcribe the image text faithfully, preserving numeric values, dates, certificate IDs, and tabular structure (use markdown tables). Do NOT summarize, interpret, or omit anything. Output transcribed text only — no commentary.
-**User text:** "Transcribe this document page completely. Pay special attention to numeric values like turnover figures (INR / Crore / Lakh), dates, and registration numbers." (Image attached.)
----
-## 9. Build Order
-The order is chosen so that the system is **demoable after every major step**. Each numbered item is also the spec sequence — write the spec, get it reviewed, then implement.
-### Step 1 — Skeleton (≈ 15 min)
-Folder structure, `requirements.txt`, `packages.txt`, `.env.example`, `.gitignore`, stub `app.py` with 5 empty Streamlit tabs and sidebar.
-**Spec:** `specs/00_skeleton.md` (light — mostly file list and stub contents).
-**Checkpoint:** `streamlit run app.py` shows the empty shell.
-### Step 2 — Mock data generation (≈ 25 min)
-`scripts/generate_mock_data.py` produces tender PDF, three bidders' PDFs, and the noisy scan PNG (per section 10).
-**Spec:** `specs/11_mock_data.md`.
-**Checkpoint:** `data/` directory populated; `turnover_certificate_scan.png` is a visibly noisy scan that Tesseract reads with low confidence.
-### Step 3 — Config + schemas + prompts (≈ 25 min)
-`core/config.py`, `core/schemas.py`, `core/prompts.py`.
-**Spec:** `specs/01_config_and_schemas.md`.
-### Step 4 — LLM client (≈ 25 min)
-`core/llm_client.py` with both `chat_json` and `chat_vision`. Smoke-test with a one-line script that calls each.
-**Spec:** `specs/02_llm_client.md`.
-**Checkpoint:** ad-hoc REPL call to `chat_json("hi", "respond with {\"ok\": true}")` returns `{"ok": True}`.
-### Step 5 — PDF utils + chunker (≈ 15 min)
-`core/pdf_utils.py`, `core/chunker.py`.
-**Spec:** `specs/03_pdf_utils.md`, `specs/05_chunker.md` (can be combined).
-### Step 6 — Criteria extractor + Tab 2 wiring (≈ 30 min)
-`core/criteria_extractor.py` + minimal `ui/tab_tender.py`.
-**Spec:** `specs/07_criteria_extractor.md`.
-**Checkpoint:** Tab 2 in the running app shows 5 criteria extracted from the mock tender.
-### Step 7 — OCR pipeline (≈ 30 min)
-`core/ocr_pipeline.py`. Verify on `turnover_certificate_scan.png`.
-**Spec:** `specs/04_ocr_pipeline.md`.
-**Checkpoint:** running `extract_document(turnover_certificate_scan.png)` first attempts Tesseract (low conf), then falls through to vision-LLM, returns `source_type="vision_llm"` with the correct turnover figure.
-### Step 8 — Vector store + bidder processor (≈ 25 min)
-`core/vectorstore.py`, `core/bidder_processor.py`.
-**Spec:** `specs/06_vectorstore.md`, `specs/08_bidder_processor.md`.
-**Checkpoint:** `process_bidder("bidder_a", ...)` indexes all five docs; `gather_evidence("bidder_a", turnover_criterion)` returns top-4 chunks, the strongest mentioning "INR 6,20,00,000".
-### Step 9 — Evaluator + threshold logic (≈ 25 min)
-`core/evaluator.py`.
-**Spec:** `specs/09_evaluator.md`.
-**Checkpoint:** `evaluate("bidder_a", turnover_criterion)` returns verdict=eligible, combined_confidence ≥ 0.8; `evaluate("bidder_b", turnover_criterion)` returns verdict=not_eligible.
-### Step 10 — Audit + fallback (≈ 20 min)
-`core/audit.py`, `core/fallback.py`.
-**Spec:** `specs/10_audit_and_fallback.md`.
-### Step 11 — Pre-compute results (≈ 15 min)
-`scripts/precompute_results.py` runs the full pipeline, dumps `criteria.json` + `eval_bidder_*.json`. Commit results.
-**Spec:** `specs/12_precompute.md`.
-**Checkpoint:** four JSON files exist and validate against the schemas.
-### Step 12 — UI tabs (≈ 80 min total)
-- Tab 3 — Bidder evaluation (35 min): rows with verdict pills, source chips, OCR-tier badges, confidence bars, expandable Reason and Source Snippet.
-- Tab 4 — Review queue (15 min): filtered list of `needs_review` rows with Approve/Edit/Reject.
-- Tab 5 — Audit log (15 min): sortable table + CSV export.
-- Tab 1 — Overview (15 min): hero, architecture image, KPIs, "Use Pre-loaded Demo" CTA.
-`ui/components.py` is built incrementally as Tabs 3 and 4 need it.
-**Spec:** `specs/13_ui_tabs.md` (covers all five tabs and `components.py`).
-### Step 13 — Smoke test + README (≈ 15 min)
-`scripts/smoke_test.py` (programmatic full flow), `README.md`.
-### Step 14 — Streamlit Cloud deploy (≈ 25 min)
-Push to GitHub, connect Streamlit Cloud, set `DEEPSEEK_API_KEY` in app secrets, verify deployed URL works in incognito with API and again with the key removed (precomputed mode).
-### Step 15 — Submission package (≈ 90 min)
-Architecture diagram, 8-slide deck, 4 screenshots, 2-min demo video (OBS / Win+G), zip source, fill submission form.
----
-## 10. Mock Data Strategy
-Single deterministic script `scripts/generate_mock_data.py`, runs in <30 seconds.
-### Tender PDF — `data/tender/crpf_construction_tender.pdf`
-`reportlab` SimpleDocTemplate, 5–6 pages with these sections: (1) Introduction, (2) Scope of Work, (3) Eligibility Criteria, (4) Submission Procedure, (5) Evaluation Methodology, (6) Annexures. Section 3 contains five criteria phrased in formal tender language (this is the theme's sample scenario verbatim, so judges will recognize it):
-| ID | Clause | Text | Mandatory? | Category |
-|---|---|---|---|---|
-| C1 | 3.2(a) | "...minimum average annual turnover of INR 5 Crore (Rupees Five Crore only) during the last three financial years..." | Yes | financial |
-| C2 | 3.2(b) | "...successfully completed at least three (3) similar construction projects in the last five (5) financial years..." | Yes | technical |
-| C3 | 3.2(c) | "...shall possess a valid Goods and Services Tax (GST) registration..." | Yes | compliance |
-| C4 | 3.2(d) | "...shall hold a valid ISO 9001:2015 Quality Management System certification..." | Yes | compliance |
-| C5 | 3.2(e) | "...preferably, the bidder may have prior experience with paramilitary infrastructure..." | **No** | technical |
-C5 tests the mandatory-vs-optional classification.
-### Bidder A (clearly eligible) — typed PDFs only
-`company_profile.pdf`, `audited_financials.pdf` (FY 22-23: ₹5.8 Cr, 23-24: ₹6.2 Cr, 24-25: ₹7.1 Cr), `project_experience.pdf` (5 projects in 5 years), `gst_certificate.pdf` (GSTIN, valid 2027), `iso_9001.pdf` (valid 2027).
-### Bidder B (clearly ineligible — turnover too low) — typed PDFs only
-Same docs as A but `audited_financials.pdf` shows ₹1.2 / ₹1.5 / ₹1.8 Cr (all below threshold). Other criteria pass.
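The turnover figures for Bidders A and B can be checked against criterion C1 with simple arithmetic. The sketch below is illustrative only: the function name `meets_turnover` and the dictionary layout are assumptions, not part of the planned modules, and the threshold is read as "average of the three years is at least INR 5 Crore".

```python
from statistics import mean

THRESHOLD_CR = 5.0  # criterion C1: minimum average annual turnover, INR Crore

# Mock turnover figures (INR Crore) taken from the bidder descriptions above.
TURNOVER_CR = {
    "bidder_a": [5.8, 6.2, 7.1],
    "bidder_b": [1.2, 1.5, 1.8],
}


def meets_turnover(figures_cr, threshold_cr=THRESHOLD_CR):
    """Return (average, passed) for a list of annual turnover figures."""
    average = mean(figures_cr)
    return average, average >= threshold_cr


for bidder, figures in TURNOVER_CR.items():
    average, passed = meets_turnover(figures)
    print(f"{bidder}: avg {average:.2f} Cr -> {'pass' if passed else 'fail'}")
```

Bidder A averages about 6.37 Cr and passes; Bidder B averages 1.5 Cr and fails, which is the contrast the mock data is designed to show.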
-### Bidder C (needs review — scanned turnover certificate) — typed + one scan
-Typed `company_profile.pdf`, `project_experience.pdf` (3 projects — borderline meets count), `gst_certificate.pdf`, `iso_9001.pdf`.
-**`turnover_certificate_scan.png`** generation:
-1. Render a `reportlab` page with the CA's turnover statement.
-2. Convert to `PIL.Image` via `pillow`.
-3. Apply: `ImageFilter.GaussianBlur(radius=1.5)`, salt-and-pepper noise via `numpy`, `image.rotate(-2, fillcolor="white")`, JPEG-compress at quality=40, save as PNG.
-4. Outcome: Tesseract reads it with mean confidence ~50–65% → triggers Tier-3 vision LLM. Vision LLM transcribes correctly; combined-confidence rule still routes Bidder C to `needs_review` (this is intended — it demonstrates the safety rule).
-### Pre-computed fallback files — `data/precomputed/`
-After the pipeline modules are working, run `scripts/precompute_results.py` once to produce:
-- `criteria.json` — output of `extract_criteria(tender_pdf)`.
-- `eval_bidder_a.json`, `eval_bidder_b.json`, `eval_bidder_c.json` — per-bidder verdicts for all criteria.
-Commit these four files to the repo. They are the safety net for live demos.
----
-## 11. Streamlit UI
-5 tabs, left-to-right narrative order:
-### Tab 1 — Overview
-Hero text ("TenderIQ — explainable AI for tender evaluation"), architecture image (`assets/architecture.png`), 4 KPI cards (criteria extracted, bidders evaluated, hours saved, audit entries). "Use Pre-loaded Demo Data" (default) and "Upload Your Own" CTA.
-### Tab 2 — Tender Analysis
-File uploader (defaults to mock tender preview). Button **"Extract Criteria (Live LLM)"** runs `criteria_extractor`. Results render as cards with category badge (color-coded), mandatory pill, description, source-page chip. Cached to `st.session_state["criteria"]`.
-### Tab 3 — Bidder Evaluation
-Bidder multi-select (defaults all 3). Button **"Run Evaluation"** processes each bidder × each criterion. Output: rows with verdict pill (green/red/amber), extracted value, source chip (doc + page + **OCR-tier badge** showing `text_pdf` / `tesseract` / `vision_llm`), confidence bar, expandable Reason and Source Snippet. Per-bidder summary header: "X / 4 mandatory criteria met — Overall: Eligible / Not Eligible / Needs Review".
-### Tab 4 — Human Review Queue
-Filtered to verdicts where `review_status == "pending"` AND `verdict == "needs_review"`. Each row: criterion, bidder, extracted value (editable), confidence, reason, source snippet, image preview if OCR'd. Buttons: Approve / Edit & Approve / Reject — each writes audit entry and updates `review_status`.
-### Tab 5 — Audit Log
-Sortable table from `audit.query()`. Filter by bidder, action type. CSV export.
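A minimal sketch of the CSV export, assuming the audit log lives in a SQLite table. The table name `audit_log` and its column names are hypothetical; the plan only says that `audit.query()` feeds the table and that entries can be exported as CSV.

```python
import csv
import io
import sqlite3


def export_audit_csv(conn):
    """Return all audit rows as CSV text, oldest first.

    Assumes a hypothetical `audit_log` table; the real schema may differ.
    """
    cursor = conn.execute(
        "SELECT timestamp, action, model_version, actor, payload "
        "FROM audit_log ORDER BY timestamp"
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)
    return buffer.getvalue()


# Demo against an in-memory database with one made-up entry.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log "
    "(timestamp TEXT, action TEXT, model_version TEXT, actor TEXT, payload TEXT)"
)
conn.execute(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?)",
    ("2026-01-01T00:00:00Z", "criterion_evaluated", "demo-model", "system",
     '{"bidder": "bidder_a"}'),
)
csv_text = export_audit_csv(conn)
print(csv_text)
```

In a Streamlit tab, the returned text could be handed to a download button; that wiring is left out here.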
-### Sidebar (always visible)
-Logo, project name, **DeepSeek connection status dot**:
-- Green: live connection, no fallback fired this session.
-- Amber: fallback fired at least once this session.
-- Red: probe at startup failed.
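The three dot states reduce to a small pure function. This is a sketch: the function name and its two boolean inputs are assumptions chosen for illustration, not names taken from the codebase.

```python
def connection_status(probe_ok: bool, fallback_fired: bool) -> str:
    """Map session state to the sidebar dot colour.

    probe_ok:       the startup probe of the LLM API succeeded.
    fallback_fired: precomputed results were used at least once this session.
    """
    if not probe_ok:
        return "red"
    if fallback_fired:
        return "amber"
    return "green"


print(connection_status(probe_ok=True, fallback_fired=False))  # green
```

Checking the probe first means a failed startup probe always shows red, even if fallbacks fire later in the session.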
-"Reset Session" button. If `st.session_state["fallback_active"]`, show banner: "⚠ Live API unavailable — showing pre-computed results."
----
-## 12. requirements.txt and packages.txt
-`requirements.txt` (pinned):
-```
-streamlit==1.39.0
-openai==1.51.0
-pymupdf==1.24.10
-pytesseract==0.3.13
-Pillow==10.4.0
-numpy==1.26.4
-chromadb==0.5.5
-sentence-transformers==3.1.1
-pydantic==2.9.2
-python-dotenv==1.0.1
-reportlab==4.2.5
-pandas==2.2.3
-```
-`packages.txt` (apt packages for Streamlit Cloud):
-```
-tesseract-ocr
-poppler-utils
-```
----
-## 13. Risks and Mitigations
-| Risk | Mitigation |
-|---|---|
-| **DeepSeek API down or rate-limited mid-demo.** | Live-first with silent fallback to `data/precomputed/*.json`. Sidebar dot turns amber. App keeps working. |
-| **Tesseract install on Streamlit Cloud.** | `packages.txt` with `tesseract-ocr`. If it still fails: Tier-3 vision LLM works on raw image input, and `data/precomputed/eval_bidder_c.json` is the final safety net. |
-| **DeepSeek vision call (Tier 3) fails.** | Tesseract result accepted with `confidence < 0.65` → flows to `needs_review`. Demo still works. |
-| **ChromaDB first-run sentence-transformers download (~80 MB).** | `@st.cache_resource` on the client. README warns "first cloud load may take ~30s". Pre-warm by visiting deployed URL once before submission. |
-| **LLM returns malformed JSON.** | `response_format={"type":"json_object"}` + 2 retries with stricter system prompt → fall back to precomputed for that item. |
-| **PyMuPDF licensing.** | AGPL but allowed for hackathon use; pin `pymupdf==1.24.10`; mention in README. |
-| **API key leak in repo.** | `.env` gitignored; `.env.example` ships with placeholder; Streamlit Cloud secrets used in deploy; pre-commit visual diff check. |
-| **Time overrun.** | Compression order: skip Tab 1 KPIs → skip optional 5th criterion → skip CSV export → keep core flow (Tabs 2–4) intact for the video. |
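The malformed-JSON mitigation in the table above (two retries, then the precomputed result) could look roughly like this. `call_llm` is a hypothetical callable standing in for the real client; it receives the attempt number so the caller can tighten the system prompt on retries.

```python
import json


def parse_with_retries(call_llm, fallback, max_retries=2):
    """Call the LLM until it returns a JSON object, else use the fallback.

    call_llm(attempt) -> str is a hypothetical hook; attempt starts at 0.
    Returns (parsed_dict, used_fallback).
    """
    for attempt in range(max_retries + 1):  # first try plus the retries
        raw = call_llm(attempt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed reply: try again
        if isinstance(parsed, dict):
            return parsed, False
    return fallback, True


# Demo with a stub that fails once, then returns valid JSON.
replies = iter(["not json", '{"verdict": "eligible"}'])
result, used_fallback = parse_with_retries(
    lambda attempt: next(replies),
    fallback={"verdict": "needs_review"},  # stand-in for a precomputed item
)
print(result, used_fallback)
```

Returning the `used_fallback` flag alongside the data lets the caller log a fallback entry and update the sidebar state.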
----
-## 14. Verification (run before recording the demo video)
-Treat this as the acceptance test. The demo video should walk through these steps in order.
-1. **Cold start.** Delete `.chroma/`, `audit.db`. Run `streamlit run app.py`. App opens in <10s; Tab 1 renders.
-2. **Live extraction.** Tab 2 → "Extract Criteria" → 5 criteria appear within 10–20s. Sidebar dot green.
-3. **Live evaluation, Bidder A.** Tab 3 → select Bidder A → "Run Evaluation". All 4 mandatory criteria → `eligible` with combined confidence ≥ 0.80.
-4. **Live evaluation, Bidder B.** Turnover criterion → `not_eligible` with reason citing low turnover figure and source page.
-5. **Live evaluation, Bidder C — the OCR demo path.** Turnover criterion → triggers Tier 2 (Tesseract low conf) → triggers Tier 3 (DeepSeek Vision). UI shows "Read by Tesseract @ ~58% → Vision-LLM @ 95%". Final verdict: `needs_review`. Audit log gains a `vision_ocr_invoked` entry.
-6. **Review action.** Tab 4 → click "Approve" on Bidder C's turnover row → audit log gains `human_review_action` entry within 1 second; `review_status` updates.
-7. **Audit export.** Tab 5 → "Export CSV" → CSV downloads with all entries.
-8. **No-API run.** Rename `.env` (or unset secret), restart app → all "Run Live" buttons silently fall back to precomputed, banner shown, sidebar dot amber, audit gets `precomputed_fallback_used` entries.
-9. **Smoke test.** `python scripts/smoke_test.py` exits 0.
-10. **Deployed URL.** Open Streamlit Cloud URL in incognito; repeat steps 1–6.
----
-## 15. Submission Deliverables (Round 2 form fields)
-Mapping of submission requirements to artifacts:
-| Form field | Artifact |
-|---|---|
-| Title | "TenderIQ — Explainable AI for Tender Evaluation" |
-| Description | Adapted from `idea.md` |
-| Parent Submission | The shortlisted Round 1 idea |
-| Theme | Theme 3 |
-| Snapshots | `assets/screenshots/*.png` |
-| Video URL | YouTube unlisted link to 2-min demo |
-| Presentation | `deck/TenderIQ_Pitch.pdf` |
-| Demo Link | Streamlit Cloud URL |
-| Repository URL | GitHub URL |
-| Source Code | Zip of repo (excluding `.env`, `.chroma/`, `audit.db`) |
-| Instructions to Run | `README.md` quickstart |
-| Custom Attachment | `ARCHITECTURE.md` exported as PDF (with the architecture diagram embedded) |
----
-## 16. Definition of Done
-The build is done when **all** of the following are true:
-- [ ] All 10 verification steps in section 14 pass.
-- [ ] Streamlit Cloud URL is live and reachable.
-- [ ] GitHub repo is public, with `.env` not committed.
-- [ ] `README.md` quickstart works on a fresh clone with no API key (precomputed mode).
-- [ ] Pitch deck, demo video, screenshots, and architecture PDF are produced.
-- [ ] Submission form is filled and submitted.
-- [ ] Memory note saved with deployment URL and submission timestamp.

idea.md DELETED

@@ -1,157 +0,0 @@
-# TenderIQ: Explainable AI Platform for Automated Tender Evaluation & Eligibility Analysis
-**Phase:** Idea Phase (Shortlisted)
-**Last updated:** Apr 30, 2026
-**Theme:** Theme 3 — AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement by CRPF
----
-## Problem Understanding
-Government tender evaluation today is a manual, time-consuming, and error-prone process. Procurement officers must review large volumes of unstructured documents — including PDFs, scanned files, and images — to verify whether bidders meet eligibility criteria such as financial thresholds, technical experience, and compliance certifications.
-This results in:
-- Inconsistent evaluations across reviewers
-- High turnaround time (often days per tender)
-- Lack of transparency and auditability
-- Risk of oversight in critical compliance checks
-Our solution addresses these challenges by transforming unstructured tender and bidder data into structured, explainable, and auditable decisions.
----
-## Proposed Solution: TenderIQ
-TenderIQ is an AI-powered platform designed to automate tender evaluation while ensuring human trust, explainability, and audit readiness. The system follows a four-stage pipeline:
-### Stage 1: Tender Understanding (Criteria Extraction)
-The platform extracts eligibility criteria from tender documents using a hybrid approach combining LLMs and rule-based parsing. It identifies:
-- Financial conditions (e.g., turnover ≥ ₹5 Cr)
-- Technical requirements (e.g., project experience)
-- Compliance rules (e.g., GST registration, ISO certifications)
-Each criterion is:
-- Classified as mandatory or optional
-- Converted into a structured, machine-readable format
-### Stage 2: Bidder Document Processing
-The system processes heterogeneous bidder submissions, including:
-- Typed PDFs
-- Scanned documents
-- Images
-- Word files
-The processing pipeline includes:
-- OCR for scanned documents and images
-- Layout-aware parsing for tables, forms, and certificates
-- Entity extraction for key values such as turnover, certifications, and project count
-All extracted information is stored along with:
-- Source reference (document and page number)
-- Confidence score
-### Stage 3: Evaluation and Decision Engine
-Each bidder is evaluated on a criterion-by-criterion basis using:
-- Rule-based validation (e.g., threshold checks)
-- Confidence-aware scoring
-The system produces three possible outcomes:
-- **Eligible**
-- **Not Eligible**
-- **Needs Manual Review**
-Ambiguous or low-confidence cases are never automatically rejected. Instead, they are flagged for human review to ensure fairness and compliance.
-### Stage 4: Explainability and Audit Layer (Key Differentiator)
-Every decision is fully explainable and traceable. Each evaluation includes:
-- The criterion being checked
-- The extracted value
-- Source document reference
-- Confidence score
-- Reason for the decision
-**Example:**
-```
-Criterion:       Minimum Turnover ≥ ₹5 Cr
-Extracted Value: ₹6.2 Cr
-Source:          Financial Statement (Page 4)
-Confidence:      92%
-Verdict:         Eligible
-```
-All system actions are logged with:
-- Model version
-- Timestamp
-- Reviewer actions
-This ensures complete end-to-end auditability suitable for government procurement processes.
----
-## Human-in-the-Loop Workflow
-The system incorporates a mandatory human review layer:
-- Low-confidence or conflicting cases are routed to reviewers
-- The interface highlights extracted data directly within documents
-- Reviewers can: Approve, Edit, or Reject decisions
-- All reviewer decisions are captured and used to improve system performance over time
----
-## Key Features
-- Handles scanned and unstructured documents effectively
-- Provides criterion-level explainability for every decision
-- Ensures no silent disqualification of bidders
-- Maintains a fully auditable decision pipeline
-- Scales across departments and tender types
----
-## Technology Stack
-| Layer | Technology |
-|---|---|
-| AI/ML | LLMs for extraction, OCR (Tesseract or PaddleOCR), LayoutLM for document understanding |
-| Backend | Python (FastAPI) with rule-based evaluation engine |
-| Storage | PostgreSQL and vector database for document retrieval |
-| Frontend | React-based dashboard |
----
-## Risks and Mitigation
-| Risk | Mitigation |
-|---|---|
-| OCR inaccuracies | Confidence scoring and human review |
-| Legal language ambiguity | Hybrid LLM and rule-based parsing |
-| Data inconsistency across documents | Conflict detection and validation logic |
-| Over-automation risk | Human-in-the-loop validation |
----
-## Why This Solution Stands Out
-- Balances automation with accountability
-- Designed specifically for government procurement constraints
-- Focuses on trust, explainability, and auditability
-- Works effectively with real-world, messy data formats
----
-## Future Scope (Round 2)
-- Integration with existing procurement systems
-- Model improvement through feedback loops
-- Multi-language document support
-- Advanced fraud detection in bidder submissions
----
-## Core Philosophy
-The system prioritizes **assistive intelligence over full automation**, ensuring that every decision is explainable, reviewable, and compliant with government procurement standards.

presentation_creation.md DELETED

@@ -1,689 +0,0 @@
-# TenderIQ — Presentation Creation Brief
-> **Purpose of this file:** Give a fresh Claude context everything it needs to generate
-> six distinct, high-quality presentations (PPT and/or PDF) for the CRPF Hackathon
-> submission. The creator should produce all variants in one session so the user can
-> pick the best one.
----
-## 0. How to use this file
-1. Read sections 1–5 carefully — they contain all project context, slide content,
-   and data.
-2. Read section 6 — it defines exactly 6 visual styles to produce.
-3. Read section 7 — it gives technical guidance for python-pptx and reportlab.
-4. Produce all 6 variants, saving them to `deck/` as:
-   - `deck/TenderIQ_v1_dark_professional.pptx`
-   - `deck/TenderIQ_v2_clean_minimal.pptx`
-   - `deck/TenderIQ_v3_government_official.pptx`
-   - `deck/TenderIQ_v4_modern_gradient.pdf`
-   - `deck/TenderIQ_v5_data_forward.pptx`
-   - `deck/TenderIQ_v6_infographic.pdf`
-5. Each variant uses the **same slide content** (section 4) but different visual
-   treatment (section 6). Do not cut content between variants.
-**DO NOT** reuse the existing `deck/TenderIQ_Pitch.pdf` — it was generated by a
-previous low-quality script and should be ignored entirely.
----
-## 1. Project Summary
-**Name:** TenderIQ
-**Tagline:** Explainable AI for Government Tender Evaluation
-**Event:** CRPF Hackathon — Theme 3: AI-Based Tender Evaluation and Eligibility
-Analysis for Government Procurement
-**Organisation:** Central Reserve Police Force, Ministry of Home Affairs,
-Government of India
-**One paragraph description:**
-TenderIQ automates the eligibility evaluation of bidders against government tender
-criteria. A procurement officer uploads a tender PDF; the system extracts each
-eligibility criterion using an LLM, processes bidder documents through a three-tier
-OCR pipeline (handling everything from typed PDFs to blurry scanned certificates),
-evaluates each bidder against each criterion with combined confidence scoring, and
-surfaces ambiguous cases for human review — all with a complete, exportable audit
-trail. The app is built on Streamlit and is deployable to a public URL in minutes.
----
-## 2. The Problem (use these facts on the problem slide)
-- A procurement committee manually reading tender documents and bidder submissions
-  can spend **3–5 days per tender**
-- Two evaluators reviewing the same bid **regularly reach different conclusions**
-- Documents arrive in **mixed formats**: typed PDFs, scanned certificates,
-  photographs of documents taken on phones
-- There is **no consistent audit trail** — decisions cannot be traced to specific
-  evidence
-- Government procurement is worth **₹50 lakh crore+ annually** in India
-- Manual evaluation is a bottleneck that **delays project execution**
----
-## 3. Key Differentiators (highlight these)
-1. **Three-tier OCR robustness** — most systems assume digital text; TenderIQ
-   handles scanned and photographed documents via a progressive pipeline:
-   PyMuPDF (typed PDF, instant) → Tesseract OCR (scans) → DeepSeek Vision LLM
-   (low-confidence scans, ~95% accuracy). Every page records which tier read it.
-2. **Never silent disqualification** — the safety rule: if combined confidence is
-   between 0.55 and 0.80 and the verdict is `not_eligible`, it is automatically
-   downgraded to `needs_review`. A bidder is never automatically disqualified at
-   medium confidence.
-3. **Criterion-level explainability** — every verdict is traceable to a specific
-   document, page number, OCR tier, extracted value, and plain-English reason.
-   Not just "pass/fail" — the officer can see exactly why.
-4. **Complete audit trail** — every action (extraction, OCR invocation, evaluation,
-   human review) is logged with timestamp, model version, actor, and payload to
-   SQLite. Exportable as CSV.
-5. **Works without the live API** — pre-computed fallback JSON is shipped with the
-   repo. If the API goes down during a demo, the app continues seamlessly. Sidebar
-   turns amber to indicate fallback mode.
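The safety rule in differentiator 2 can be written as a single guard. This is a sketch under two assumptions that the brief does not settle: both bounds are treated as inclusive, and verdicts outside the band pass through unchanged.

```python
def apply_safety_rule(verdict: str, combined_confidence: float) -> str:
    """Downgrade a medium-confidence rejection to a human review.

    Assumption: the 0.55 and 0.80 bounds are inclusive.
    """
    if verdict == "not_eligible" and 0.55 <= combined_confidence <= 0.80:
        return "needs_review"
    return verdict


print(apply_safety_rule("not_eligible", 0.58))  # needs_review
print(apply_safety_rule("not_eligible", 0.95))  # not_eligible
```

Keeping the rule as a pure function makes it easy to unit-test separately from the LLM call that produces the raw verdict.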
----
-## 4. Slide Content (use exactly this for all 6 variants)
-### SLIDE 1 — Title Slide
-- **Main title:** TenderIQ
-- **Subtitle:** Explainable AI for Government Tender Evaluation
-- **Event line:** CRPF Hackathon · Theme 3
-- **Tagline (small):** From days to minutes. Every decision traceable.
-- **Visual suggestion:** The ⚖️ scales emoji large, or an abstract representation
-  of documents flowing through a pipeline
----
-### SLIDE 2 — The Problem
-- **Title:** The Problem with Manual Tender Evaluation
-- **Three pain points (use as large visual callouts or icon+text cards):**
-  1. **3–5 Days** per tender evaluation by committee
-  2. **Inconsistent** — two evaluators, two different conclusions
-  3. **No audit trail** — decisions cannot be traced to evidence
-- **Supporting points (smaller text or bullets):**
-  - Mixed document formats: typed PDFs, scans, phone photographs
-  - Government procurement worth ₹50 lakh crore+ annually in India
-  - Project delays traced directly to procurement bottlenecks
----
-### SLIDE 3 — Our Solution
-- **Title:** TenderIQ — Four Stages, End to End
-- **Four stage cards (equal weight, horizontal or 2×2 layout):**
-  **Stage 1 — Extract**
-  DeepSeek LLM reads the tender PDF and returns each criterion as structured JSON:
-  category, mandatory flag, threshold rule, source clause, query hints.
-  **Stage 2 — OCR & Index**
-  Three-tier pipeline handles any document format.
-  All text chunked and indexed for semantic retrieval.
-  **Stage 3 — Evaluate**
-  Vector search finds relevant evidence. LLM produces a verdict with confidence.
-  Safety rule prevents silent disqualification.
-  **Stage 4 — Review & Audit**
-  Borderline cases go to a human review queue. Every action logged.
-  Full audit trail exportable as CSV.
-- **Bottom line (callout box):**
-  "Minutes, not days. Every verdict traceable to a document and page."
----
-### SLIDE 4 — Architecture
-- **Title:** System Architecture
-- **Diagram description (reproduce this as a visual flow diagram):**
-```
-  Tender PDF                    Bidder Documents
-      │                         (PDFs · scans · photos)
-      ▼                                  │
-  DeepSeek LLM                   3-Tier OCR Pipeline
-  (Extract Criteria)             ① PyMuPDF   (typed PDF)
-      │                          ② Tesseract (scans)
-      ▼                          ③ Vision LLM (low conf.)
-  Criteria JSON                          │
-  (C1–C5 structured)             Vector Index (in-memory)
-      │                          all-MiniLM-L6-v2 embeddings
-      └──────────────────────────────────┘
-                          │
-                    DeepSeek LLM
-                    (Evaluate each criterion)
-                    combined confidence score
-                          │
-            ┌─────────────┴──────────────┐
-        eligible /                 needs_review
-       not_eligible             Human Review Queue
-            │                          │
-            └───────── SQLite Audit Log ────────┘
-```
-- **Key technical facts (sidebar or footnotes on slide):**
-  - Single-process Streamlit app — no separate backend
-  - Deployable to Streamlit Cloud or HuggingFace Spaces
-  - All storage is local: SQLite + in-memory vector index
-  - Only external dependency: DeepSeek API
----
-### SLIDE 5 — The OCR Demo (the centrepiece)
-- **Title:** Three-Tier OCR — Handling Any Document Format
-- **Three tier cards with visual progression:**
-  **Tier 1 — PyMuPDF**
-  - Trigger: Document is a typed/digital PDF
-  - Cost: Free, instant
-  - Confidence: 1.0 (lossless text extraction)
-  - Source label in UI: 📄 Typed PDF
-  **Tier 2 — Tesseract**
-  - Trigger: Scanned PDF or image file
-  - Cost: Free, local, fast
-  - Confidence: Mean of per-word OCR scores
-  - Source label in UI: 🔍 Tesseract
-  **Tier 3 — DeepSeek Vision LLM**
-  - Trigger: Tesseract confidence < 65%
-  - Cost: One API call
-  - Confidence: 0.95
-  - Source label in UI: 👁 Vision LLM
-  - Action: `vision_ocr_invoked` logged to audit
-- **Demo scenario callout (use a highlighted box):**
-  > **Bidder C submits a blurry, rotated CA certificate scan.**
-  > Tesseract reads it at ~55% confidence.
-  > Vision LLM transcribes the turnover figure correctly.
-  > Combined confidence = 0.58 → routed to human review.
-  > This is intentional — borderline evidence requires a human.
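The tier progression on this slide can be sketched as a routing function. The function name and arguments are assumptions; the thresholds and fixed confidences (65%, 1.0, 0.95) come from the tier cards, and the tier labels are illustrative identifiers.

```python
TESSERACT_MIN_CONFIDENCE = 0.65  # below this, escalate to the vision LLM
VISION_CONFIDENCE = 0.95         # fixed confidence assigned to Tier 3


def choose_tier(is_typed_pdf, tesseract_confidence=None):
    """Return (tier_label, confidence) for one page.

    tesseract_confidence is the mean per-word OCR score in [0, 1],
    or None if Tesseract produced no usable result.
    """
    if is_typed_pdf:
        return "text_pdf", 1.0  # Tier 1: lossless text extraction
    if (tesseract_confidence is not None
            and tesseract_confidence >= TESSERACT_MIN_CONFIDENCE):
        return "tesseract", tesseract_confidence  # Tier 2 is good enough
    return "vision_llm", VISION_CONFIDENCE  # Tier 3 fallback


print(choose_tier(False, 0.55))  # Bidder C's scan escalates to Tier 3
```

The sketch only decides which tier's result is kept. How the OCR confidence is combined with the LLM's verdict confidence (the 0.58 figure in the callout) is not specified here and is left out.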
----
-### SLIDE 6 — Explainability & Compliance
-- **Title:** Every Decision is Explainable and Auditable
-- **Two columns:**
-  **Left — Criterion-Level Verdicts**
-  Each (bidder × criterion) pair shows:
-  - Which criterion was checked
-  - Which document and page provided the evidence
-  - What value was extracted (e.g. "INR 6.2 Cr")
-  - Which OCR tier read the document
-  - Combined confidence score (0–100%)
-  - Plain-English reason
-  **Right — Audit Trail**
-  Every action logged with:
-  - UTC timestamp
-  - Action type (criteria_extracted / bidder_processed / criterion_evaluated /
-    human_review_action / vision_ocr_invoked / precomputed_fallback_used)
-  - Model version
-  - Actor (system / officer)
-  - Full payload JSON
-  - Exportable as CSV
-- **Safety rule callout (prominent, in a coloured box):**
-  > **The Safety Rule:**
-  > If combined confidence is 0.55–0.80 AND verdict is `not_eligible`,
-  > the verdict is automatically downgraded to `needs_review`.
-  > A bidder is **never silently disqualified** at medium confidence.
----
-### SLIDE 7 — Demo Results
-- **Title:** Demo: Three Bidders, Three Outcomes
-- **Three bidder cards side by side:**
-  **Bidder A — Apex Constructions Pvt. Ltd.**
-  Result: ✅ ELIGIBLE
-  - C1 Turnover: INR 6.37 Cr avg (threshold: 5 Cr) — PASS
-  - C2 Projects: 5 completed including CRPF barracks — PASS
-  - C3 GST: GSTIN 27AABCA1234F1Z5, Active — PASS
-  - C4 ISO 9001:2015: Valid June 2027 — PASS
-  - All typed PDFs, confidence ≥ 93% on all criteria
-  **Bidder B — BuildRight Enterprises**
-  Result: ❌ NOT ELIGIBLE
-  - C1 Turnover: INR 1.5 Cr avg (threshold: 5 Cr) — FAIL
-    "Average annual turnover of INR 1.5 Cr is below the required
-    minimum of INR 5 Cr."
-  - C2–C4: All pass
-  - Automatically disqualified with high confidence (95%)
-  **Bidder C — Shree Constructions & Services**
-  Result: ⚠️ NEEDS REVIEW
-  - C1 Turnover: Submitted as blurry scan
-    Tesseract ~55% → Vision LLM transcribes INR 5.4 Cr
-    Combined confidence 0.58 → needs review (safety rule)
-  - C2: Exactly 3 projects (borderline)
-  - C3–C4: Pass
-- **Bottom metric strip:**
-  | Metric | Value |
-  |--------|-------|
-  | Criteria extracted | 5 |
-  | Bidder documents processed | 15 |
-  | LLM evaluation calls | 15 |
-  | Vision OCR invocations | 1 |
-  | Human review items | 1 |
-  | Total audit entries | 20+ |
----
-### SLIDE 8 — Tech Stack & Future Work
-- **Title:** Stack, Impact & What's Next
-- **Left side — Tech Stack (as a clean table):**
-  | Component | Technology |
-  |-----------|------------|
-  | UI & orchestration | Streamlit 1.39 |
-  | LLM | DeepSeek API (OpenAI-compatible) |
-  | OCR Tier 1 | PyMuPDF 1.24 |
-  | OCR Tier 2 | Tesseract |
-  | OCR Tier 3 | DeepSeek Vision LLM |
-  | Semantic retrieval | sentence-transformers all-MiniLM-L6-v2 |
-  | Data validation | Pydantic v2 |
-  | Audit log | SQLite |
-  | Deployment | Streamlit Cloud / HuggingFace Spaces |
-- **Right side — Future Work (as bullets):**
-  - Multi-tender workspace — same bidder pool, multiple tenders
-  - GeM portal API integration — live tender ingestion
-  - Automated bidder ranking with weighted scoring
-  - LayoutLM for complex financial tables in scanned statements
-  - Multi-evaluator workflow with role-based approval
-  - Review queue email/SMS notifications
-  - Audit PDF export for procurement oversight submissions
-- **Bottom — Impact callout:**
-  > **3–5 days → minutes.**
-  > Every verdict traceable to a document, page, and model version.
-  > Built in one hackathon session. Deployable today.
----
-## 5. Narrative Arc (how the slides tell a story)
-The deck should flow as:
-1. **Hook** (Slide 1) — big, confident title
-2. **Pain** (Slide 2) — make the problem visceral with the 3 numbers
-3. **Solution** (Slide 3) — 4 clean stages, not overwhelming
-4. **Credibility** (Slide 4) — architecture shows it's real engineering
-5. **Differentiator** (Slide 5) — the OCR story is unique and concrete
-6. **Trust** (Slide 6) — explainability + audit builds confidence with judges
-7. **Proof** (Slide 7) — real outcomes, real numbers, real bidder scenarios
-8. **Vision** (Slide 8) — grounded stack + forward-looking future work
-Every slide should have **one dominant visual** and **limited text**. Judges skim.
-The most important information should be readable in 3 seconds.
----
-## 6. The Six Visual Styles
-Produce one presentation per style. All use the same slide content from section 4.
----
-### Style 1 — Dark Professional (PPTX)
-**File:** `deck/TenderIQ_v1_dark_professional.pptx`
-**Palette:**
-- Slide background: `#0D1B2A` (deep navy)
-- Primary text: `#F1F5F9` (near white)
-- Secondary text: `#94A3B8` (muted blue-grey)
-- Accent / headings: `#F0A500` (gold)
-- Eligible green: `#22C55E`
-- Not eligible red: `#EF4444`
-- Needs review amber: `#F59E0B`
-- Card backgrounds: `#1E3A5F` (lighter navy)
-- Borders: `#2D4A6B`
-**Typography:**
-- Headings: Calibri Bold or Arial Bold, 28–32pt, gold
-- Body: Calibri or Arial, 16–18pt, near-white
-- Captions/labels: 12–13pt, muted blue-grey
-**Style rules:**
-- Dark background on every slide
-- Title slide: large gold ⚖️ emoji, gold title text on navy
-- Section headings have a thin gold left border or underline
-- Cards/boxes: slightly lighter navy background (#1E3A5F) with gold border
-- Verdict chips: coloured filled rectangles (green/red/amber) with white text
-- Progress/confidence: horizontal bar in gold on dark track
-- The OCR tier cards (Slide 5): three columns, each with a different accent colour
-  (blue for Tier 1, purple for Tier 2, orange for Tier 3)
-- Architecture diagram (Slide 4): use white-on-dark text boxes connected with
-  gold arrows
----
-### Style 2 — Clean Minimal (PPTX)
-**File:** `deck/TenderIQ_v2_clean_minimal.pptx`
-**Palette:**
-- Slide background: `#FFFFFF`
-- Primary text: `#111827`
-- Secondary text: `#6B7280`
-- Accent: `#2563EB` (blue)
-- Light accent background: `#EFF6FF`
-- Eligible: `#059669`
-- Not eligible: `#DC2626`
-- Needs review: `#D97706`
-- Dividers/borders: `#E5E7EB`
-**Typography:**
-- Headings: Inter or Calibri Light Bold, 28–32pt, #111827
-- Body: Inter or Calibri, 15–16pt, #374151
-- Captions: 11–12pt, #9CA3AF
-**Style rules:**
-- White background throughout
-- Large amounts of whitespace — never fill the slide
-- Title slide: small ⚖️ followed by large "TenderIQ" in #111827, subtitle in grey
-- Section headings: simple left-aligned text with a 3px blue left border
-- Cards: white with a 1px #E5E7EB border and very subtle shadow (simulate with
-  slightly off-white fill)
-- No gradients, no heavy fills — colour used sparingly as accent only
-- Verdict chips: light fill (green/red/amber at 15% opacity) with bold coloured text
-- Numbers/stats (Slide 2): very large (80–96pt), blue accent colour, minimal
-  surrounding text
-- Architecture diagram (Slide 4): use grey boxes with blue connector arrows,
-  clean and uncluttered
----
-### Style 3 — Government Official (PPTX)
-**File:** `deck/TenderIQ_v3_government_official.pptx`
-**Palette:**
-- Primary: `#003580` (deep government blue, similar to NIC / India.gov.in)
-- Secondary: `#FFFFFF`
-- Accent: `#FF9933` (saffron, from the Indian tricolour)
-- Third accent: `#138808` (India green)
-- Background: `#F5F5F0` (off-white, like a government document)
-- Text: `#1A1A1A`
-- Borders: `#003580`
-**Typography:**
-- Headings: Times New Roman Bold or Cambria Bold, 26–30pt, #003580
-- Body: Arial or Calibri, 14–15pt, #1A1A1A
-- Official labels: small caps, 11pt, #003580
-**Style rules:**
-- Header bar on every slide: deep blue (#003580) band at top with white text for
-  slide title; thin saffron line below the header band
-- Footer on every slide: "TenderIQ · CRPF Hackathon · Theme 3" in small text
-  on the blue header colour
-- Title slide: formal layout — emblem/logo area top left, large title centred,
-  "Ministry of Home Affairs" sub-line
-- Slide content area: off-white background, clean margins
-- Tables: blue header row (#003580, white text), alternating white/#F0F4FF rows
-- Callout boxes: thin blue border, very light blue fill (#EBF0FF)
-- Verdict indicators: use formal language labels ("ELIGIBLE", "NOT ELIGIBLE",
-  "UNDER REVIEW") in coloured text, no emoji
-- This style should feel like an official government presentation, not a startup deck
----
-### Style 4 — Modern Gradient (PDF via reportlab)
-**File:** `deck/TenderIQ_v4_modern_gradient.pdf`
-**Palette:**
-- Gradient 1 (title slide): `#667EEA` → `#764BA2` (purple-blue)
-- Gradient 2 (content slides background strip): `#0EA5E9` → `#2563EB`
-- Card fills: `#FFFFFF` with coloured top accent border
-- Text on gradient: `#FFFFFF`
-- Text on white: `#0F172A`
-- Eligible: `#10B981`
-- Not eligible: `#F43F5E`
-- Needs review: `#FBBF24`
-**Typography:**
-- Headings on gradient: white, bold, 24–28pt
-- Body on white cards: dark, 12–14pt
-- Stat numbers: 48–56pt, gradient-coloured
-**Style rules (reportlab specific):**
-- Title slide: full-page gradient background (use `canvas.linearGradient` if
-  available, or approximate with filled rectangles stepping from #667EEA to #764BA2)
-- Content slides: white background with a gradient-filled header band (top 20% of
-  slide) for the slide title
-- Cards: white rectangles with a 4px top border in a theme colour, subtle grey
-  border on other sides
-- Stat numbers on Slide 2: very large, rendered in the gradient colours
-- OCR tiers on Slide 5: three cards with top borders in blue, purple, orange
-- Arrows in architecture diagram: use curved lines in gradient blue
-- Avoid heavy outlines — use fill and spacing instead
-- Page numbers bottom right in muted colour
----
-### Style 5 — Data Forward (PPTX)
-**File:** `deck/TenderIQ_v5_data_forward.pptx`
-**Palette:**
-- Background: `#FAFAFA`
-- Primary: `#1E293B`
-- Accent: `#6366F1` (indigo)
-- Chart colours: `#22C55E`, `#EF4444`, `#F59E0B`, `#3B82F6`, `#8B5CF6`
-- Grid lines: `#E2E8F0`
-- Text: `#334155`
-**Typography:**
-- Data labels: 14–16pt bold, #1E293B
-- Axis labels / captions: 10–11pt, #64748B
-- Slide titles: 24pt bold, indigo
-**Style rules:**
-- This variant leads with data visualisation on every slide where possible
-- Slide 2 (Problem): use a simple bar chart showing "Days per tender" comparison
-  (manual: 3–5 days, TenderIQ: minutes represented as <0.1 days)
-- Slide 3 (Solution): use a horizontal process flow with numbered circles
-- Slide 5 (OCR): use a stacked bar or table showing accuracy by tier
-- Slide 7 (Demo results): use a verdicts breakdown chart — 3 bidders × 5 criteria
-  as a colour-coded matrix (green/red/amber cells)
-- Slide 8 (Stack): use a visual table with technology icons (text-based approximation)
-- Charts should be built with python-pptx chart objects (not images) where possible,
-  or use matplotlib to embed PNG charts
-**Key chart specs:**
-- Demo results matrix (Slide 7): 3 rows (bidders) × 5 columns (criteria), each cell
-  filled green/red/amber with a 1-letter code (E/N/R)
-- OCR confidence comparison (Slide 5): simple bar chart showing
-  Tier 1: 100%, Tier 2: ~55–65%, Tier 3: ~95%
-- Problem scale (Slide 2): two-bar chart, Manual vs TenderIQ, logarithmic scale
-  or just text-anchored bars
----
-### Style 6 — Infographic (PDF via reportlab)
-**File:** `deck/TenderIQ_v6_infographic.pdf`
-**Palette:**
-- Background: `#FFFFFF`
-- Section stripe: `#F8FAFC`
-- Primary icon colour: `#2563EB`
-- Icon accents: `#22C55E`, `#EF4444`, `#F59E0B`, `#8B5CF6`
-- Text: `#0F172A`
-- Subtext: `#64748B`
-**Typography:**
-- Large numbers: 48–60pt, bold, primary colour
-- Section labels: 10pt, all-caps, letter-spaced, muted
-- Body: 12pt, dark
-**Style rules:**
-- Every slide is built around a large central icon or number
-- Slide 2 (Problem): three large numbers (3–5, ✗, ?) each with a one-line label
-- Slide 3 (Solution): four large icons (📄 → 🔍 → ⚖️ → 📋) with stage labels
-- Slide 4 (Architecture): a vertical flow infographic, not a box diagram —
-  icon per stage, connecting lines, short labels
-- Slide 5 (OCR): three large tier icons stacked with an arrow between them,
-  confidence % shown as a circular progress indicator (drawn with arc)
-- Slide 7 (Demo): three large outcome icons (✅ ❌ ⚠️) each with 3 bullet points
-- Slide 8 (Future): icon grid of 6 future directions, each with a 1-line label
-- No heavy borders — whitespace is the separator
-- Use reportlab's `canvas.drawString`, arcs for circular indicators, and
-  filled rectangles for bars
----
-## 7. Technical Implementation Notes
-### python-pptx (for Styles 1, 2, 3, 5)
-```python
-from pptx import Presentation
-from pptx.util import Inches, Pt, Emu
-from pptx.dml.color import RGBColor
-from pptx.enum.text import PP_ALIGN
-# Slide size — widescreen 16:9
-prs = Presentation()
-prs.slide_width = Inches(13.33)
-prs.slide_height = Inches(7.5)
-# Add a blank slide
-slide_layout = prs.slide_layouts[6]  # blank
-slide = prs.slides.add_slide(slide_layout)
-# Add a filled rectangle
-from pptx.enum.shapes import MSO_SHAPE
-shape = slide.shapes.add_shape(
-    MSO_SHAPE.RECTANGLE,  # autoshape type; integer value 1
-    Inches(0), Inches(0), Inches(13.33), Inches(7.5)
-)
-shape.fill.solid()
-shape.fill.fore_color.rgb = RGBColor(0x0D, 0x1B, 0x2A)
-shape.line.fill.background()  # no border
-# Add text box
-txBox = slide.shapes.add_textbox(Inches(1), Inches(2), Inches(11), Inches(2))
-tf = txBox.text_frame
-tf.word_wrap = True
-p = tf.paragraphs[0]
-p.text = "TenderIQ"
-p.alignment = PP_ALIGN.CENTER
-run = p.runs[0]
-run.font.size = Pt(54)
-run.font.bold = True
-run.font.color.rgb = RGBColor(0xF0, 0xA5, 0x00)  # gold
-# Save
-prs.save("deck/TenderIQ_v1_dark_professional.pptx")
-```
-**Key python-pptx patterns to use:**
-- `slide.shapes.add_shape(1, ...)` — adds a rectangle (`MSO_SHAPE.RECTANGLE` = 1, from `pptx.enum.shapes`)
-- `shape.fill.solid()` + `shape.fill.fore_color.rgb = RGBColor(r, g, b)` — fill colour
-- `shape.line.fill.background()` — remove border
-- `slide.shapes.add_textbox(left, top, width, height)` — text box
-- `tf.paragraphs[0].runs[0].font.color.rgb` — font colour
-- `slide.shapes.add_picture(image_path, left, top, width, height)` — embed image
-- For tables: `slide.shapes.add_table(rows, cols, left, top, width, height)`
-- All measurements: use `Inches()` or `Pt()` or raw `Emu` (1 inch = 914400 Emu)
-**Avoid:**
-- `pptx.chart` (complex, often renders poorly) — use coloured shapes instead
-- Embedded images from URLs — use only local files or draw shapes
----
-### reportlab (for Styles 4 and 6)
-```python
-from reportlab.pdfgen.canvas import Canvas
-from reportlab.lib.pagesizes import A4, landscape
-from reportlab.lib import colors
-from reportlab.lib.units import cm, mm
-W, H = landscape(A4)  # 841.89 x 595.28 points (29.7 x 21 cm landscape)
-c = Canvas("deck/TenderIQ_v4_modern_gradient.pdf", pagesize=landscape(A4))
-# Filled rectangle
-c.setFillColor(colors.HexColor("#0D1B2A"))
-c.rect(0, 0, W, H, fill=1, stroke=0)
-# Text
-c.setFillColor(colors.white)
-c.setFont("Helvetica-Bold", 48)
-c.drawCentredString(W/2, H/2, "TenderIQ")
-# Line
-c.setStrokeColor(colors.HexColor("#F0A500"))
-c.setLineWidth(3)
-c.line(2*cm, H - 3*cm, W - 2*cm, H - 3*cm)
-# New page
-c.showPage()
-# Save
-c.save()
-```
-**Key reportlab patterns:**
-- `c.rect(x, y, w, h, fill=1, stroke=0)` — filled rectangle, no border
-- `c.roundRect(x, y, w, h, radius, fill=1, stroke=0)` — rounded rectangle
-- `c.drawCentredString(x, y, text)` — centred text at point
-- `c.drawString(x, y, text)` — left-aligned text
-- `c.drawRightString(x, y, text)` — right-aligned text
-- `c.setFont("Helvetica-Bold", size)` — font (built-in: Helvetica, Times-Roman, Courier)
-- `c.arc(x1, y1, x2, y2, startAng, extent)` — arc (for circular indicators)
-- `c.line(x1, y1, x2, y2)` — line
-- `c.showPage()` — new slide/page
-- **Coordinate system:** origin is bottom-left; y increases upward
-  - To position from top: use `H - y_from_top`
-**Text wrapping in reportlab:**
-```python
-from reportlab.platypus import Paragraph
-from reportlab.lib.styles import ParagraphStyle
-style = ParagraphStyle('body', fontSize=12, leading=16, textColor=colors.white)
-p = Paragraph("Long text that wraps automatically.", style)
-p.wrapOn(c, width, height)
-p.drawOn(c, x, y)
-```
----
-## 8. Quality Checklist
-Before saving each variant, verify:
-- [ ] All 8 slides are present with content from section 4
-- [ ] Title is readable in 2 seconds on slide 1
-- [ ] The three pain point numbers are prominent on slide 2
-- [ ] The safety rule callout is visually distinct on slide 6
-- [ ] The three bidder outcomes are clearly colour-coded on slide 7
-- [ ] No slide has walls of text — maximum ~6 bullet points per slide
-- [ ] Font sizes: headings 24–32pt, body 14–16pt minimum (readable when projected)
-- [ ] Consistent margin — at least 1 inch (reportlab: 2.5cm) from all edges
-- [ ] Consistent colour palette within each variant (no accidental colour mixing)
-- [ ] File saves without error and opens cleanly
----
-## 9. Output Summary
-| File | Format | Style | Tool |
-|------|--------|-------|------|
-| `deck/TenderIQ_v1_dark_professional.pptx` | PPTX | Dark navy + gold | python-pptx |
-| `deck/TenderIQ_v2_clean_minimal.pptx` | PPTX | White + blue, minimal | python-pptx |
-| `deck/TenderIQ_v3_government_official.pptx` | PPTX | Government blue + saffron | python-pptx |
-| `deck/TenderIQ_v4_modern_gradient.pdf` | PDF | Purple-blue gradient | reportlab |
-| `deck/TenderIQ_v5_data_forward.pptx` | PPTX | Charts + data viz | python-pptx |
-| `deck/TenderIQ_v6_infographic.pdf` | PDF | Large icons + numbers | reportlab |
-All output to `deck/`. Delete `TenderIQ_Pitch.pdf` (the old bad one) after creating the new files.
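Note on the Style 4 gradient fallback in the deleted file above: the style rules suggest approximating the title-slide gradient with filled rectangles stepping from `#667EEA` to `#764BA2`. A minimal sketch of the colour stepping, assuming plain linear interpolation in RGB space, might look like this. The helper names `hex_to_rgb` and `gradient_steps` are hypothetical and do not appear in the original plan.

```python
def hex_to_rgb(hex_colour: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple of 0-255 integers."""
    h = hex_colour.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def gradient_steps(start_hex: str, end_hex: str, steps: int) -> list[str]:
    """Return `steps` hex colours linearly interpolated from start to end.

    Assumption: interpolation is done per RGB channel; the plan does not
    say which colour space to use.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    start, end = hex_to_rgb(start_hex), hex_to_rgb(end_hex)
    result = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first strip, 1.0 at the last
        r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
        result.append(f"#{r:02X}{g:02X}{b:02X}")
    return result


if __name__ == "__main__":
    for colour in gradient_steps("#667EEA", "#764BA2", 5):
        print(colour)
```

Each returned colour could then fill one strip via `c.setFillColor(colors.HexColor(colour))` followed by `c.rect(..., fill=1, stroke=0)`, using the reportlab patterns listed in section 7.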

specs/00_skeleton.md DELETED Viewed

@@ -1,594 +0,0 @@
-# Spec 00 — Project Skeleton
-**Step:** 1 of 15
-**Time budget:** ~15 min
-**Checkpoint:** `streamlit run app.py` opens in the browser showing 5 named tabs and a sidebar with logo placeholder, project name, and connection status dot. No errors in the terminal.
----
-## Goal
-Create every file and directory that Step 2 onward will write into. All Python modules are stubs (importable but empty of logic). The running app must render without crashing.
----
-## Files to Create
-### Root-level files
-#### `requirements.txt`
-```
-streamlit==1.39.0
-openai==1.51.0
-pymupdf==1.24.10
-pytesseract==0.3.13
-Pillow==10.4.0
-numpy==1.26.4
-chromadb==0.5.5
-sentence-transformers==3.1.1
-pydantic==2.9.2
-python-dotenv==1.0.1
-reportlab==4.2.5
-pandas==2.2.3
-```
-#### `packages.txt`
-```
-tesseract-ocr
-poppler-utils
-```
-#### `.env.example`
-```
-DEEPSEEK_API_KEY=your_key_here
-```
-#### `.gitignore`
-```
-.env
-.chroma/
-audit.db
-__pycache__/
-*.pyc
-.ocr_cache/
-*.egg-info/
-dist/
-build/
-.DS_Store
-Thumbs.db
-```
-#### `app.py` — Streamlit entry point (stub)
-Exact stub content:
-```python
-import streamlit as st
-from ui.tab_overview import render as render_overview
-from ui.tab_tender import render as render_tender
-from ui.tab_bidders import render as render_bidders
-from ui.tab_review import render as render_review
-from ui.tab_audit import render as render_audit
-st.set_page_config(
-    page_title="TenderIQ",
-    page_icon="⚖️",
-    layout="wide",
-)
-# ── Sidebar ──────────────────────────────────────────────────────────────────
-with st.sidebar:
-    st.markdown("## ⚖️ TenderIQ")
-    st.caption("Explainable AI for Tender Evaluation")
-    st.divider()
-    # Connection status — placeholder until core/llm_client.py is wired
-    st.markdown("🔴 **DeepSeek:** not connected")
-    st.divider()
-    if st.button("Reset Session", use_container_width=True):
-        for key in list(st.session_state.keys()):
-            del st.session_state[key]
-        st.rerun()
-# ── Tabs ─────────────────────────────────────────────────────────────────────
-tab1, tab2, tab3, tab4, tab5 = st.tabs([
-    "Overview",
-    "Tender Analysis",
-    "Bidder Evaluation",
-    "Human Review",
-    "Audit Log",
-])
-with tab1:
-    render_overview()
-with tab2:
-    render_tender()
-with tab3:
-    render_bidders()
-with tab4:
-    render_review()
-with tab5:
-    render_audit()
-```
----
-### `core/` package — all stubs
-Every file in `core/` must be importable and expose the names that `app.py` or other modules reference at import time. No logic yet — just `pass` stubs and placeholder class/function signatures.
-#### `core/__init__.py`
-Empty.
-#### `core/config.py`
-```python
-import os
-from pathlib import Path
-from dotenv import load_dotenv
-load_dotenv()
-DEEPSEEK_API_KEY: str | None = os.getenv("DEEPSEEK_API_KEY")
-DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"
-MODEL_NAME = "deepseek-chat"
-MODEL_VERSION = f"{MODEL_NAME}@2026-05-07"
-CONFIDENCE_HIGH = 0.80
-CONFIDENCE_REVIEW = 0.55
-OCR_TESSERACT_MIN_CONF = 0.65
-BASE_DIR = Path(__file__).resolve().parent.parent
-DATA_DIR = BASE_DIR / "data"
-CHROMA_DIR = str(BASE_DIR / ".chroma")
-AUDIT_DB = str(BASE_DIR / "audit.db")
-PRECOMPUTED_DIR = DATA_DIR / "precomputed"
-OCR_CACHE_DIR = BASE_DIR / ".ocr_cache"
-```
-#### `core/schemas.py`
-```python
-from __future__ import annotations
-from typing import Literal, Optional
-from pydantic import BaseModel, Field
-import uuid
-class Rule(BaseModel):
-    type: Literal["numeric_threshold", "count_threshold", "certification_present", "document_present"]
-    field: str
-    operator: Literal[">=", "<=", "==", "exists"]
-    value: float | int | None = None
-    unit: str | None = None
-class Criterion(BaseModel):
-    id: str
-    title: str
-    category: Literal["financial", "technical", "compliance"]
-    mandatory: bool
-    description: str
-    rule: Rule
-    query_hints: list[str]
-    source_page: int
-    source_clause: str
-class Evidence(BaseModel):
-    bidder_id: str
-    doc_name: str
-    page: int
-    text: str
-    source_type: Literal["text_pdf", "tesseract", "vision_llm"]
-    ocr_confidence: float | None = None
-class Source(BaseModel):
-    doc_name: str
-    page: int
-    snippet: str
-    source_type: Literal["text_pdf", "tesseract", "vision_llm"]
-class Verdict(BaseModel):
-    verdict_id: str = Field(default_factory=lambda: f"V-{uuid.uuid4().hex[:8]}")
-    bidder_id: str
-    criterion_id: str
-    verdict: Literal["eligible", "not_eligible", "needs_review"]
-    extracted_value: str | None = None
-    normalized_value: float | int | None = None
-    source: Source | None = None
-    llm_confidence: float = 0.0
-    ocr_confidence: float | None = None
-    combined_confidence: float = 0.0
-    reason: str = ""
-    model_version: str = ""
-    timestamp: str = ""
-    review_status: Literal["pending", "approved", "edited", "rejected"] = "pending"
-class AuditEntry(BaseModel):
-    id: int | None = None
-    ts: str
-    action: str
-    actor: str
-    model_version: str | None = None
-    bidder_id: str | None = None
-    criterion_id: str | None = None
-    payload_json: str | None = None
-```
-#### `core/prompts.py`
-```python
-EXTRACT_CRITERIA_PROMPT_SYSTEM = """\
-You are an expert in Indian government tender analysis (CRPF context). Your job is to extract \
-eligibility criteria from a tender document and return them as STRICT JSON. Never invent criteria \
-not present in the text. Classify each criterion as mandatory or optional based on cue words: \
-"shall", "must", "mandatory", "required", "minimum" → mandatory; "preferred", "desirable", \
-"may", "optionally" → optional. For each criterion, generate 3–5 short noun-phrase query_hints \
-that an evaluator would search for in bidder documents.\
-"""
-EVALUATE_CRITERION_PROMPT_SYSTEM = """\
-You are a procurement evaluator. Given ONE criterion and a list of retrieved evidence chunks from \
-a bidder's documents, decide eligible / not_eligible / needs_review. Always cite the strongest \
-single source. NEVER guess values not present in the evidence. If evidence is missing or \
-ambiguous, return needs_review with reason. Output STRICT JSON.\
-"""
-VISION_OCR_PROMPT_SYSTEM = """\
-You are an OCR engine for Indian government procurement documents. Transcribe the image text \
-faithfully, preserving numeric values, dates, certificate IDs, and tabular structure (use \
-markdown tables). Do NOT summarize, interpret, or omit anything. Output transcribed text only — \
-no commentary.\
-"""
-VISION_OCR_USER = (
-    "Transcribe this document page completely. Pay special attention to numeric values like "
-    "turnover figures (INR / Crore / Lakh), dates, and registration numbers."
-)
-```
-#### `core/llm_client.py`
-```python
-from pathlib import Path
-class LLMUnavailable(Exception):
-    pass
-class LLM:
-    def __init__(self, api_key: str | None = None):
-        pass
-    def chat_json(self, system: str, user: str, max_retries: int = 2) -> dict:
-        raise NotImplementedError
-    def chat_vision(
-        self,
-        system: str,
-        user_text: str,
-        image: bytes | str | Path,
-        max_retries: int = 2,
-    ) -> str:
-        raise NotImplementedError
-```
-#### `core/pdf_utils.py`
-```python
-from pathlib import Path
-import PIL.Image
-def extract_pages(path: Path) -> list[dict]:
-    raise NotImplementedError
-def is_text_pdf(path: Path) -> bool:
-    raise NotImplementedError
-def render_page_to_image(path: Path, page_no: int, dpi: int = 200) -> PIL.Image.Image:
-    raise NotImplementedError
-```
-#### `core/ocr_pipeline.py`
-```python
-from pathlib import Path
-class ExtractedPage:
-    page: int
-    text: str
-    source_type: str  # "text_pdf" | "tesseract" | "vision_llm"
-    confidence: float
-    raw_tier_results: dict
-def extract_document(file_path: Path) -> list[ExtractedPage]:
-    raise NotImplementedError
-```
-#### `core/chunker.py`
-```python
-from core.ocr_pipeline import ExtractedPage
-def chunk_tender(pages: list[dict], tender_id: str) -> list[dict]:
-    raise NotImplementedError
-def chunk_bidder(
-    pages: list[ExtractedPage], bidder_id: str, doc_name: str
-) -> list[dict]:
-    raise NotImplementedError
-```
-#### `core/vectorstore.py`
-```python
-def get_client():
-    raise NotImplementedError
-def get_collection(name: str):
-    raise NotImplementedError
-def add_chunks(collection, chunks: list[dict], metadatas: list[dict]) -> None:
-    raise NotImplementedError
-def query(
-    collection, text: str, k: int = 4, where: dict | None = None
-) -> list[dict]:
-    raise NotImplementedError
-```
-#### `core/criteria_extractor.py`
-```python
-from pathlib import Path
-from core.schemas import Criterion
-def extract_criteria(tender_pdf_path: Path) -> list[Criterion]:
-    raise NotImplementedError
-```
-#### `core/bidder_processor.py`
-```python
-from pathlib import Path
-from core.schemas import Criterion, Evidence
-def process_bidder(bidder_id: str, files: list[Path]) -> None:
-    raise NotImplementedError
-def gather_evidence(bidder_id: str, criterion: Criterion, k: int = 4) -> list[Evidence]:
-    raise NotImplementedError
-```
-#### `core/evaluator.py`
-```python
-from core.schemas import Criterion, Verdict
-def evaluate(bidder_id: str, criterion: Criterion) -> Verdict:
-    raise NotImplementedError
-def evaluate_bidder(bidder_id: str, criteria: list[Criterion]) -> list[Verdict]:
-    raise NotImplementedError
-```
-#### `core/audit.py`
-```python
-def log(action: str, actor: str = "system", **fields) -> int:
-    raise NotImplementedError
-def query(filters: dict | None = None) -> list[dict]:
-    raise NotImplementedError
-```
-#### `core/fallback.py`
-```python
-from core.schemas import Criterion, Verdict
-def load_criteria() -> list[Criterion]:
-    raise NotImplementedError
-def load_evaluation(bidder_id: str, criterion_id: str) -> Verdict:
-    raise NotImplementedError
-```
----
-### `ui/` package — all stubs
-Each tab module exports a single `render()` function that renders a placeholder heading. No logic.
-#### `ui/__init__.py`
-Empty.
-#### `ui/tab_overview.py`
-```python
-import streamlit as st
-def render() -> None:
-    st.header("Overview")
-    st.info("Coming soon — architecture diagram, KPIs, and demo CTA.")
-```
-#### `ui/tab_tender.py`
-```python
-import streamlit as st
-def render() -> None:
-    st.header("Tender Analysis")
-    st.info("Coming soon — upload tender and extract eligibility criteria.")
-```
-#### `ui/tab_bidders.py`
-```python
-import streamlit as st
-def render() -> None:
-    st.header("Bidder Evaluation")
-    st.info("Coming soon — per-bidder, per-criterion verdict table.")
-```
-#### `ui/tab_review.py`
-```python
-import streamlit as st
-def render() -> None:
-    st.header("Human Review Queue")
-    st.info("Coming soon — approve / edit / reject flagged verdicts.")
-```
-#### `ui/tab_audit.py`
-```python
-import streamlit as st
-def render() -> None:
-    st.header("Audit Log")
-    st.info("Coming soon — sortable audit log with CSV export.")
-```
-#### `ui/components.py`
-```python
-# Shared UI widgets — implemented incrementally as Tab 3 and Tab 4 need them.
-```
----
-### `data/` directory structure (empty folders only)
-```
-data/
-  tender/
-  bidders/
-    bidder_a/
-    bidder_b/
-    bidder_c/
-  precomputed/
-```
-No files yet — Step 2 (mock data generation) populates these.
----
-### `scripts/` directory (empty stubs)
-#### `scripts/generate_mock_data.py`
-```python
-"""Step 2 — generates mock tender and bidder PDFs + noisy scan PNG."""
-```
-#### `scripts/precompute_results.py`
-```python
-"""Step 11 — runs the full pipeline and writes data/precomputed/*.json."""
-```
-#### `scripts/smoke_test.py`
-```python
-"""Step 13 — programmatic end-to-end check; exits 0 on success."""
-```
----
-### `assets/` directory (empty, for later)
-```
-assets/
-  screenshots/
-```
----
-### `deck/` directory (empty, for later)
-```
-deck/
-```
----
-## Directory Tree After This Step
-```
-TenderIQ/
-├── app.py
-├── requirements.txt
-├── packages.txt
-├── .env.example
-├── .gitignore
-├── specs/
-│   └── 00_skeleton.md          ← this file
-├── core/
-│   ├── __init__.py
-│   ├── config.py
-│   ├── schemas.py
-│   ├── prompts.py
-│   ├── llm_client.py
-│   ├── pdf_utils.py
-│   ├── ocr_pipeline.py
-│   ├── chunker.py
-│   ├── vectorstore.py
-│   ├── criteria_extractor.py
-│   ├── bidder_processor.py
-│   ├── evaluator.py
-│   ├── audit.py
-│   └── fallback.py
-├── ui/
-│   ├── __init__.py
-│   ├── tab_overview.py
-│   ├── tab_tender.py
-│   ├── tab_bidders.py
-│   ├── tab_review.py
-│   ├── tab_audit.py
-│   └── components.py
-├── data/
-│   ├── tender/
-│   ├── bidders/
-│   │   ├── bidder_a/
-│   │   ├── bidder_b/
-│   │   └── bidder_c/
-│   └── precomputed/
-├── scripts/
-│   ├── generate_mock_data.py
-│   ├── precompute_results.py
-│   └── smoke_test.py
-├── assets/
-│   └── screenshots/
-└── deck/
-```
-Runtime artifacts (gitignored, not created here): `.env`, `.chroma/`, `audit.db`, `.ocr_cache/`.
----
-## Acceptance Criteria
-1. `python -c "import app"` executes without `ImportError` (all stubs importable).
-2. `streamlit run app.py` opens in the browser without a Python traceback.
-3. Five tabs are visible: Overview, Tender Analysis, Bidder Evaluation, Human Review, Audit Log.
-4. Sidebar shows "⚖️ TenderIQ", a caption, a red connection dot placeholder, and a "Reset Session" button.
-5. Each tab body shows an `st.info(...)` placeholder — no blank white screens.
-6. `python -c "from core import config, schemas, prompts"` runs without error.
----
-## What This Step Does NOT Do
-- No logic implemented in any `core/` module.
-- No Streamlit secrets or `.env` required to pass the checkpoint.
-- No data files generated (Step 2 does that).
-- No pip install triggered (assumed the environment is set up separately).
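Note on the cue-word rule in `EXTRACT_CRITERIA_PROMPT_SYSTEM` above: the prompt asks the model to mark a criterion as mandatory or optional from cue words. In the spec this decision is delegated to the LLM, but the rule itself can be illustrated deterministically. The sketch below is only an illustration; the function name `classify_clause`, the precedence of mandatory cues over optional ones, and the `None` result for clauses with no cue word are all assumptions.

```python
from __future__ import annotations

import re

# Cue words copied from the prompt text in core/prompts.py.
MANDATORY_CUES = ("shall", "must", "mandatory", "required", "minimum")
OPTIONAL_CUES = ("preferred", "desirable", "may", "optionally")


def _has_cue(text: str, cues: tuple[str, ...]) -> bool:
    """True if any cue appears as a whole word (case-insensitive)."""
    return any(
        re.search(rf"\b{re.escape(cue)}\b", text, re.IGNORECASE) for cue in cues
    )


def classify_clause(text: str) -> bool | None:
    """Return True for mandatory, False for optional, None if no cue is found.

    Assumption: a mandatory cue wins when both kinds of cue are present.
    """
    if _has_cue(text, MANDATORY_CUES):
        return True
    if _has_cue(text, OPTIONAL_CUES):
        return False
    return None


if __name__ == "__main__":
    print(classify_clause("The bidder shall have a minimum turnover of INR 5 Crore"))
    print(classify_clause("ISO 9001 certification is preferred"))
```

A rule like this could serve as a cheap sanity check on the `mandatory` flag the model returns, rather than as a replacement for it.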

specs/01_config_and_schemas.md DELETED Viewed

@@ -1,145 +0,0 @@
-# Spec 01 — Config, Schemas, and Prompts
-**Step:** 3 of 15
-**Time budget:** ~25 min
-**Checkpoint:** `python -c "from core import config, schemas, prompts"` runs without error. All Pydantic models validate sample JSON correctly.
----
-## Goal
-Finalize `core/config.py`, `core/schemas.py`, and `core/prompts.py` with full working implementations (the skeleton stubs already have the correct content — this step validates and documents them).
----
-## `core/config.py`
-Loads environment variables. All values are module-level constants.
-| Constant | Type | Value / Source |
-|---|---|---|
-| `DEEPSEEK_API_KEY` | `str | None` | `os.getenv("DEEPSEEK_API_KEY")` |
-| `DEEPSEEK_BASE_URL` | `str` | `"https://api.deepseek.com/v1"` |
-| `MODEL_NAME` | `str` | `"deepseek-chat"` |
-| `MODEL_VERSION` | `str` | `f"{MODEL_NAME}@2026-05-07"` |
-| `CONFIDENCE_HIGH` | `float` | `0.80` |
-| `CONFIDENCE_REVIEW` | `float` | `0.55` |
-| `OCR_TESSERACT_MIN_CONF` | `float` | `0.65` |
-| `BASE_DIR` | `Path` | parent of `core/` |
-| `DATA_DIR` | `Path` | `BASE_DIR / "data"` |
-| `CHROMA_DIR` | `str` | `str(BASE_DIR / ".chroma")` |
-| `AUDIT_DB` | `str` | `str(BASE_DIR / "audit.db")` |
-| `PRECOMPUTED_DIR` | `Path` | `DATA_DIR / "precomputed"` |
-| `OCR_CACHE_DIR` | `Path` | `BASE_DIR / ".ocr_cache"` |
-`load_dotenv()` is called at module level so `.env` is sourced before `os.getenv`.
----
-## `core/schemas.py`
-Pydantic v2 models. All fields have type annotations. Use `from __future__ import annotations`.
-### `Rule`
-```python
-class Rule(BaseModel):
-    type: Literal["numeric_threshold", "count_threshold", "certification_present", "document_present"]
-    field: str
-    operator: Literal[">=", "<=", "==", "exists"]
-    value: float | int | None = None
-    unit: str | None = None
-```
-### `Criterion`
-```python
-class Criterion(BaseModel):
-    id: str
-    title: str
-    category: Literal["financial", "technical", "compliance"]
-    mandatory: bool
-    description: str
-    rule: Rule
-    query_hints: list[str]
-    source_page: int
-    source_clause: str
-```
-### `Evidence`
-```python
-class Evidence(BaseModel):
-    bidder_id: str
-    doc_name: str
-    page: int
-    text: str
-    source_type: Literal["text_pdf", "tesseract", "vision_llm"]
-    ocr_confidence: float | None = None
-```
-### `Source`
-```python
-class Source(BaseModel):
-    doc_name: str
-    page: int
-    snippet: str
-    source_type: Literal["text_pdf", "tesseract", "vision_llm"]
-```
-### `Verdict`
-```python
-class Verdict(BaseModel):
-    verdict_id: str = Field(default_factory=lambda: f"V-{uuid.uuid4().hex[:8]}")
-    bidder_id: str
-    criterion_id: str
-    verdict: Literal["eligible", "not_eligible", "needs_review"]
-    extracted_value: str | None = None
-    normalized_value: float | int | None = None
-    source: Source | None = None
-    llm_confidence: float = 0.0
-    ocr_confidence: float | None = None
-    combined_confidence: float = 0.0
-    reason: str = ""
-    model_version: str = ""
-    timestamp: str = ""
-    review_status: Literal["pending", "approved", "edited", "rejected"] = "pending"
-```
-### `AuditEntry`
-```python
-class AuditEntry(BaseModel):
-    id: int | None = None
-    ts: str
-    action: str
-    actor: str
-    model_version: str | None = None
-    bidder_id: str | None = None
-    criterion_id: str | None = None
-    payload_json: str | None = None
-```
----
-## `core/prompts.py`
-Four string constants are already defined in the skeleton; no changes needed.
-- `EXTRACT_CRITERIA_PROMPT_SYSTEM`
-- `EVALUATE_CRITERION_PROMPT_SYSTEM`
-- `VISION_OCR_PROMPT_SYSTEM`
-- `VISION_OCR_USER`
----
-## Acceptance Criteria
-1. `python -c "from core import config, schemas, prompts"` exits 0.
-2. `python -c "from core.schemas import Criterion, Verdict, Evidence, AuditEntry; print('OK')"` prints OK.
-3. Sample Criterion JSON validates without error:
-   ```python
-   from core.schemas import Criterion
-   c = Criterion(**{"id":"C1","title":"Turnover","category":"financial",
-     "mandatory":True,"description":"INR 5Cr","rule":{"type":"numeric_threshold",
-     "field":"turnover","operator":">=","value":50000000,"unit":"INR"},
-     "query_hints":["turnover"],"source_page":3,"source_clause":"3.2(a)"})
-   assert c.mandatory is True
-   ```
-4. `config.MODEL_VERSION` contains `"deepseek-chat@2026-05-07"`.
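Note on `extracted_value` and `normalized_value` in the `Verdict` schema above: the sample criterion pairs the description "INR 5Cr" with the numeric value `50000000`, which implies a normalisation step from Indian-style amounts to plain rupees. The spec shown here does not define that step, so the following is only a sketch under assumptions: the function name `normalise_inr` and the accepted unit spellings are hypothetical. The multipliers themselves are standard (1 Crore = 10,000,000 and 1 Lakh = 100,000).

```python
from __future__ import annotations

import re

# Multipliers for Indian numbering units; spellings accepted here are assumptions.
_UNIT_MULTIPLIERS = {
    "crore": 10_000_000,
    "cr": 10_000_000,
    "lakh": 100_000,
    "lac": 100_000,
}

_AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(crore|cr|lakh|lac)?\b", re.IGNORECASE)


def normalise_inr(raw: str) -> float | None:
    """Return the first amount in `raw` as rupees, or None if none is found."""
    match = _AMOUNT_RE.search(raw.replace(",", ""))  # drop digit-group commas
    if match is None:
        return None
    amount = float(match.group(1))
    unit = (match.group(2) or "").lower()
    return amount * _UNIT_MULTIPLIERS.get(unit, 1)


if __name__ == "__main__":
    print(normalise_inr("INR 5Cr"))
    print(normalise_inr("Rs. 75 Lakh"))
```

Returning `None` when nothing parses leaves room for the evaluator to fall back to `needs_review` instead of guessing a value.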

specs/02_llm_client.md DELETED Viewed

@@ -1,101 +0,0 @@
-# Spec 02 — LLM Client
-**Step:** 4 of 15
-**Time budget:** ~25 min
-**Checkpoint:** `LLM().chat_json(system, user)` returns a dict when the API key is valid; raises `LLMUnavailable` when the key is missing.
----
-## Goal
-Implement `core/llm_client.py` — a thin wrapper around the OpenAI Python SDK pointed at the DeepSeek API. Provides `chat_json` (JSON-mode responses) and `chat_vision` (multimodal image input). Both methods retry on transient failures and raise `LLMUnavailable` after `max_retries`.
----
-## Dependencies
-- `openai` Python SDK (OpenAI-compatible, pointed at DeepSeek base URL)
-- `core.config` for `DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, `MODEL_NAME`, `MODEL_VERSION`
-- `core.prompts` for prompt constants (used by callers, not by this module directly)
----
-## Class: `LLMUnavailable`
-```python
-class LLMUnavailable(Exception):
-    pass
-```
-Raised whenever the LLM call cannot be completed after all retries. Callers should catch this and route to `fallback.py`.
----
-## Class: `LLM`
-### `__init__(self, api_key: str | None = None)`
-- If `api_key` is `None`, use `config.DEEPSEEK_API_KEY`.
-- If the resolved key is `None` or empty: do NOT raise immediately — defer to call time so the app can start without a key (precomputed mode).
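-- If the key is non-empty, create an `openai.OpenAI(api_key=key, base_url=DEEPSEEK_BASE_URL)` client and store it as `self._client`; otherwise store `None`, because `openai.OpenAI` raises at construction when no key is available.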
-### `chat_json(self, system: str, user: str, max_retries: int = 2) -> dict`
-Calls the chat completions API with `response_format={"type": "json_object"}`, `temperature=0`.
-Messages: `[{"role": "system", "content": system}, {"role": "user", "content": user}]`
-Retry logic:
-1. Try the API call.
-2. On success: parse `response.choices[0].message.content` as JSON. If `json.loads` fails, retry once with a stricter system postscript `" Respond ONLY with valid JSON, no prose."`. If it fails again, raise `LLMUnavailable("Malformed JSON after retries")`.
-3. On `openai.APIStatusError` (5xx) or `openai.APIConnectionError`: exponential backoff (`2 ** attempt` seconds, max 2 attempts), then raise `LLMUnavailable`.
-4. On `openai.AuthenticationError` (401): raise `LLMUnavailable("Invalid API key")` immediately (no retry). Check this before step 3, since `AuthenticationError` is a subclass of `APIStatusError`.
-5. If `api_key` is None/empty at call time: raise `LLMUnavailable("No API key configured")`.
-Returns `dict`.
-### `chat_vision(self, system: str, user_text: str, image: bytes | str | Path, max_retries: int = 2) -> str`
-Sends a multimodal message using the OpenAI vision format.
-Image encoding:
-- If `image` is `bytes`: base64-encode directly.
-- If `image` is `Path` or `str`: read the file as bytes, then base64-encode.
-- Build data URI: `f"data:image/png;base64,{b64_str}"`.
-Message format:
-```python
-[
-  {"role": "system", "content": system},
-  {"role": "user", "content": [
-    {"type": "text", "text": user_text},
-    {"type": "image_url", "image_url": {"url": data_uri}},
-  ]},
-]
-```
-Call at `temperature=0`, no `response_format` (vision endpoint returns plain text).
-Retry logic: same as `chat_json`, except that content errors are retried with the same prompt. Returns `response.choices[0].message.content` as a string.
-On any failure after retries: raise `LLMUnavailable`.
----
-## Error handling summary
-| Condition | Behaviour |
-|---|---|
-| Missing/empty API key | `LLMUnavailable("No API key configured")` |
-| 401 AuthenticationError | `LLMUnavailable("Invalid API key")` |
-| 5xx / ConnectionError | Retry with backoff, then `LLMUnavailable` |
-| Malformed JSON (chat_json) | Retry once with stricter prompt, then `LLMUnavailable` |
----
-## Acceptance Criteria
-1. `from core.llm_client import LLM, LLMUnavailable` imports cleanly.
-2. `LLM(api_key=None)` with no `.env` → calling `chat_json(...)` raises `LLMUnavailable` (not an unhandled exception).
-3. With a valid key: `LLM().chat_json("respond with valid json", '{"ok": true}')` returns `{"ok": True}` (or similar).
-4. `LLMUnavailable` is a subclass of `Exception`.
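Note on the retry logic in Spec 02 above: the backoff rule (`2 ** attempt` seconds, then `LLMUnavailable`) can be sketched without the OpenAI SDK. In the sketch below, `TransientError` is a hypothetical stand-in for the 5xx and connection errors named in the spec, `call_with_backoff` is a hypothetical helper, and reading `max_retries` as "retries after the first attempt" is an assumption.

```python
import time


class LLMUnavailable(Exception):
    """Raised when a call cannot be completed after all retries."""


class TransientError(Exception):
    """Hypothetical stand-in for a 5xx response or a connection error."""


def call_with_backoff(fn, max_retries: int = 2, sleep=time.sleep):
    """Call `fn`, retrying on TransientError with exponential backoff.

    Waits 2 ** attempt seconds between attempts (1s, 2s, ...). The `sleep`
    parameter is injectable so tests do not have to wait.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError as exc:
            if attempt == max_retries:
                raise LLMUnavailable(
                    f"Gave up after {attempt + 1} attempts"
                ) from exc
            sleep(2 ** attempt)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("temporary failure")
        return {"ok": True}

    print(call_with_backoff(flaky, sleep=lambda s: None))
```

Non-retryable conditions such as a missing key or a 401 would be raised as `LLMUnavailable` before entering a loop like this, matching the error handling summary.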

specs/03_pdf_utils_and_chunker.md DELETED Viewed

@@ -1,80 +0,0 @@
-# Spec 03 — PDF Utils and Chunker
-**Step:** 5 of 15
-**Time budget:** ~15 min
----
-## Goal
-Implement `core/pdf_utils.py` (PyMuPDF text extraction and page rendering) and `core/chunker.py` (text → chunks with metadata).
----
-## `core/pdf_utils.py`
-### `extract_pages(path: Path) -> list[dict]`
-- Opens the PDF with `fitz.open(str(path))`.
-- For each page `i`: extracts text via `page.get_text("text")`.
-- Returns `[{"page": i+1, "text": text}, ...]` (1-indexed pages).
-### `is_text_pdf(path: Path) -> bool`
-- Opens the PDF.
-- Computes average characters per page across all pages.
-- Returns `True` if average ≥ 50 characters per page (a heuristic that separates typed PDFs from scanned pages with little or no extractable text).
-### `render_page_to_image(path: Path, page_no: int, dpi: int = 200) -> PIL.Image.Image`
-- Opens the PDF.
-- Gets page at index `page_no - 1` (0-indexed).
-- Creates `fitz.Matrix(dpi/72, dpi/72)` and renders via `page.get_pixmap(matrix=mat, colorspace=fitz.csRGB)`.
-- Converts pixmap to PIL Image via `Image.frombytes("RGB", [pix.width, pix.height], pix.samples)`.
-- Returns the PIL Image.
----
-## `core/chunker.py`
-### `chunk_tender(pages: list[dict], tender_id: str) -> list[dict]`
-Input: list of `{"page": int, "text": str}` dicts.
-Strategy:
-- Join page text. Split on clause headings detected by regex `r'^\d+(\.\d+)*\s+'` (multiline).
-- Each chunk: up to ~500 tokens (~2000 chars). If a section is longer, split on `\n\n` boundaries.
-- Each chunk dict: `{"text": str, "tender_id": str, "page": int, "chunk_id": str}`.
-- `chunk_id` = `f"{tender_id}_p{page}_c{i}"`.
-Simpler implementation (sufficient for 5-page mock tender):
-- One chunk per page section: for each page, if text > 2000 chars split into ~2000-char pieces; else one chunk.
-### `chunk_bidder(pages: list[ExtractedPage], bidder_id: str, doc_name: str) -> list[dict]`
-Input: list of `ExtractedPage` objects.
-Strategy: one chunk per page.
-Each chunk dict:
-```python
-{
-    "text": page.text,
-    "bidder_id": bidder_id,
-    "doc_name": doc_name,
-    "page": page.page,
-    "source_type": page.source_type,
-    "ocr_confidence": page.confidence,
-    "chunk_id": f"{bidder_id}_{doc_name}_p{page.page}",
-}
-```
----
-## Acceptance Criteria
-1. `extract_pages(Path("data/tender/crpf_construction_tender.pdf"))` returns a list of dicts with non-empty text on most pages.
-2. `is_text_pdf(Path("data/tender/crpf_construction_tender.pdf"))` returns `True`.
-3. `render_page_to_image(Path("data/tender/crpf_construction_tender.pdf"), 1)` returns a PIL Image with width > 0.
-4. `chunk_tender(pages, "tender_001")` returns a non-empty list of dicts each having a "text" key.
-5. Each bidder chunk has all required metadata keys.
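
The "simpler implementation" of `chunk_tender` in Spec 03 (one chunk per page, split only when a page exceeds about 2000 characters) might look like the sketch below. The `max_chars` parameter is an assumption added for illustration; the chunk dict keys and the `chunk_id` format follow the spec.

```python
def chunk_tender(pages: list[dict], tender_id: str, max_chars: int = 2000) -> list[dict]:
    """Split each page into pieces of at most max_chars characters."""
    chunks = []
    for page in pages:
        text = page["text"]
        # A short page yields exactly one piece; an empty page yields none.
        pieces = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
        for i, piece in enumerate(pieces):
            chunks.append({
                "text": piece,
                "tender_id": tender_id,
                "page": page["page"],
                "chunk_id": f"{tender_id}_p{page['page']}_c{i}",
            })
    return chunks
```

Fixed-width slicing can cut a clause in half; the clause-heading strategy described first in the spec avoids that at the cost of more code.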

specs/04_ocr_pipeline.md DELETED Viewed

@@ -1,97 +0,0 @@
-# Spec 04 — OCR Pipeline
-**Step:** 7 of 15
-**Time budget:** ~30 min
-**Checkpoint:** `extract_document(Path("data/bidders/bidder_c/turnover_certificate_scan.png"))` returns a list with `source_type` reflecting the OCR tier used.
----
-## Goal
-Implement `core/ocr_pipeline.py` — the three-tier OCR orchestrator. For each document/image, determines the best extraction method: PyMuPDF text (Tier 1), Tesseract (Tier 2), or DeepSeek Vision LLM (Tier 3). Caches results per file to avoid re-OCR on re-runs.
----
-## `ExtractedPage` dataclass
-```python
-@dataclasses.dataclass
-class ExtractedPage:
-    page: int
-    text: str
-    source_type: str  # "text_pdf" | "tesseract" | "vision_llm"
-    confidence: float
-    raw_tier_results: dict
-```
----
-## `extract_document(file_path: Path) -> list[ExtractedPage]`
-### Cache check
-- Compute `file_hash = hashlib.md5(file_path.read_bytes()).hexdigest()`.
-- Cache path: `OCR_CACHE_DIR / f"{file_hash}.json"`.
-- If cache exists: deserialize and return `list[ExtractedPage]`.
-### Routing
-**Case A — Image file (PNG/JPG/JPEG/BMP/TIFF):**
-- Treat as single page (page=1).
-- Go directly to Tier 2 (Tesseract).
-- If Tier 2 confidence < `OCR_TESSERACT_MIN_CONF`: try Tier 3.
-**Case B — PDF file:**
-- Call `pdf_utils.is_text_pdf(file_path)`.
-- If `True`: Tier 1 — call `pdf_utils.extract_pages(file_path)`, set `source_type="text_pdf"`, `confidence=1.0`.
-- If `False`: for each page, render to image via `pdf_utils.render_page_to_image`, then Tier 2.
-### Tier 2 — Tesseract
-```python
-import pytesseract
-data = pytesseract.image_to_data(pil_image, output_type=pytesseract.Output.DATAFRAME)
-# Keep word rows only: drop layout rows (conf == -1) and rows whose text parsed as NaN
-valid = data[(data["conf"] != -1) & data["text"].notna()]
-mean_conf = float(valid["conf"].mean()) / 100 if len(valid) > 0 else 0.0
-text = " ".join(str(w) for w in valid["text"] if str(w).strip())
-```
-If `mean_conf < OCR_TESSERACT_MIN_CONF` OR `len(text.strip()) < 20`: attempt Tier 3.
-### Tier 3 — DeepSeek Vision LLM
-- Convert PIL Image to PNG bytes via `io.BytesIO`.
-- Call `LLM().chat_vision(VISION_OCR_PROMPT_SYSTEM, VISION_OCR_USER, image_bytes)`.
-- On success: `source_type="vision_llm"`, `confidence=0.95`.
-- Log `vision_ocr_invoked` audit entry.
-- On `LLMUnavailable`: keep Tier 2 result with its `confidence` (will trigger `needs_review` downstream).
-### Cache write
-After processing all pages, serialize to JSON and save to cache file.
----
-## Serialization format for cache
-```json
-[
-  {
-    "page": 1,
-    "text": "...",
-    "source_type": "text_pdf",
-    "confidence": 1.0,
-    "raw_tier_results": {"tesseract_conf": null, "vision_used": false}
-  }
-]
-```
----
-## Acceptance Criteria
-1. `extract_document(Path("data/bidders/bidder_a/audited_financials.pdf"))` returns pages with `source_type="text_pdf"`.
-2. `extract_document(Path("data/bidders/bidder_c/turnover_certificate_scan.png"))` — if Tesseract is available and confidence < 0.65, attempts vision LLM (or returns tesseract result with low confidence when LLM unavailable).
-3. Second call to `extract_document` on same file returns cached result (no re-processing).
-4. Each returned `ExtractedPage` has non-empty `text`.
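
The cache check, cache write and Tier 3 trigger from Spec 04 can be sketched with the standard library alone. The helper names (`cache_path_for`, `save_pages`, `load_pages`, `needs_tier3`) are hypothetical, and the default `min_conf=0.65` is taken from the acceptance criteria rather than from the actual `OCR_TESSERACT_MIN_CONF` setting.

```python
import dataclasses
import hashlib
import json
from pathlib import Path


@dataclasses.dataclass
class ExtractedPage:
    page: int
    text: str
    source_type: str  # "text_pdf" | "tesseract" | "vision_llm"
    confidence: float
    raw_tier_results: dict


def cache_path_for(file_path: Path, cache_dir: Path) -> Path:
    """Derive the cache file name from the MD5 of the file's bytes."""
    file_hash = hashlib.md5(file_path.read_bytes()).hexdigest()
    return cache_dir / f"{file_hash}.json"


def save_pages(pages: list[ExtractedPage], path: Path) -> None:
    """Serialize pages to the JSON list format shown in the spec."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = [dataclasses.asdict(p) for p in pages]
    path.write_text(json.dumps(payload), encoding="utf-8")


def load_pages(path: Path) -> list[ExtractedPage]:
    """Rebuild ExtractedPage objects from a cache file."""
    return [ExtractedPage(**item) for item in json.loads(path.read_text(encoding="utf-8"))]


def needs_tier3(mean_conf: float, text: str, min_conf: float = 0.65) -> bool:
    """Escalate when Tesseract is unsure or returned almost no text."""
    return mean_conf < min_conf or len(text.strip()) < 20
```

Because the key is a content hash, renaming a file does not invalidate its cache entry, while any change to the bytes does.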

specs/06_vectorstore_and_bidder_processor.md DELETED Viewed

@@ -1,97 +0,0 @@
-# Spec 06 — Vector Store and Bidder Processor
-**Step:** 8 of 15
-**Time budget:** ~25 min
-**Checkpoint:** `process_bidder("bidder_a", ...)` indexes all docs; `gather_evidence("bidder_a", turnover_criterion)` returns chunks mentioning the turnover figure.
----
-## Goal
-Implement `core/vectorstore.py` (ChromaDB persistent client helpers) and `core/bidder_processor.py` (document ingestion + evidence retrieval per criterion).
----
-## `core/vectorstore.py`
-Uses ChromaDB persistent client with `sentence-transformers/all-MiniLM-L6-v2` embeddings.
-### `get_client()`
-```python
-import streamlit as st
-@st.cache_resource
-def get_client():
-    import chromadb
-    from core.config import CHROMA_DIR
-    return chromadb.PersistentClient(path=CHROMA_DIR)
-```
-### `get_collection(name: str)`
-```python
-def get_collection(name: str):
-    client = get_client()
-    return client.get_or_create_collection(
-        name=name,
-        metadata={"hnsw:space": "cosine"},
-    )
-```
-Note: ChromaDB default embedding function uses `all-MiniLM-L6-v2` (~80 MB, downloaded on first run).
-### `add_chunks(collection, chunks: list[dict], metadatas: list[dict]) -> None`
-- IDs: `hashlib.sha256(chunk["text"].encode()).hexdigest()[:16]` — deduplicates across reruns.
-- Calls `collection.upsert(documents=[c["text"] for c in chunks], ids=ids, metadatas=metadatas)`.
-### `query(collection, text: str, k: int = 4, where: dict | None = None) -> list[dict]`
-- Calls `collection.query(query_texts=[text], n_results=k, where=where)` (omit `where` if None).
-- Returns `[{"text": doc, "metadata": meta, "distance": dist}, ...]` from the first result set.
-- Handle the case where fewer than `k` documents are in the collection: some ChromaDB versions raise if `n_results` exceeds the number of stored items, so clamp `k` to `collection.count()` before querying.
----
-## `core/bidder_processor.py`
-### `process_bidder(bidder_id: str, files: list[Path]) -> None`
-For each file in `files`:
-1. `pages = ocr_pipeline.extract_document(file)`.
-2. `chunks = chunker.chunk_bidder(pages, bidder_id, file.name)`.
-3. Build metadatas list — one per chunk:
-   ```python
-   {"bidder_id": bidder_id, "doc_name": file.name,
-    "page": chunk["page"], "source_type": chunk["source_type"],
-    "ocr_confidence": chunk["ocr_confidence"]}
-   ```
-4. `collection = vectorstore.get_collection("bidder_chunks")`.
-5. `vectorstore.add_chunks(collection, chunks, metadatas)`.
-6. `audit.log("bidder_processed", bidder_id=bidder_id, doc_name=file.name, chunk_count=len(chunks))`.
-### `gather_evidence(bidder_id: str, criterion: Criterion, k: int = 4) -> list[Evidence]`
-1. Build query string: `f"{criterion.title} {' '.join(criterion.query_hints)}"`.
-2. `collection = vectorstore.get_collection("bidder_chunks")`.
-3. `results = vectorstore.query(collection, query, k=k, where={"bidder_id": bidder_id})`.
-4. Map each result to `Evidence`:
-   ```python
-   Evidence(
-       bidder_id=bidder_id,
-       doc_name=meta["doc_name"],
-       page=meta["page"],
-       text=result["text"],
-       source_type=meta["source_type"],
-       ocr_confidence=meta.get("ocr_confidence"),
-   )
-   ```
-5. Return list.
----
-## Acceptance Criteria
-1. `process_bidder("bidder_a", [path1, path2, ...])` completes without error and logs audit entries.
-2. `gather_evidence("bidder_a", c1_criterion)` returns at least 1 `Evidence` object.
-3. The strongest evidence for Bidder A's turnover mentions "6,20,00,000" or "INR".
-4. Calling `process_bidder` twice on the same files does not duplicate chunks (upsert).
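
The content-derived IDs and the `k` clamp from Spec 06 can be illustrated without ChromaDB. In this sketch `chunk_id` follows the spec, while `dedupe_for_upsert` and `clamp_k` are hypothetical helpers. Dropping repeated texts within one batch is an added assumption: two chunks with identical text hash to the same ID, and a store may reject or silently merge such a batch.

```python
import hashlib


def chunk_id(text: str) -> str:
    """First 16 hex characters of the SHA-256 of the chunk text."""
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def dedupe_for_upsert(chunks: list[dict], metadatas: list[dict]):
    """Build parallel id/document/metadata lists, keeping the first copy of each text."""
    seen = set()
    ids, documents, metas = [], [], []
    for chunk, meta in zip(chunks, metadatas):
        cid = chunk_id(chunk["text"])
        if cid in seen:
            continue  # same text already queued in this batch
        seen.add(cid)
        ids.append(cid)
        documents.append(chunk["text"])
        metas.append(meta)
    return ids, documents, metas


def clamp_k(k: int, collection_size: int) -> int:
    """Never request more results than the collection holds."""
    return max(0, min(k, collection_size))
```

One design consequence worth noting: because the ID depends on text alone, identical boilerplate submitted by two bidders would share an ID, and the later upsert would overwrite the earlier metadata. Hashing `bidder_id`, `doc_name` and `page` together with the text would avoid that, at the cost of weaker cross-document deduplication.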

specs/07_criteria_extractor.md DELETED Viewed

@@ -1,79 +0,0 @@
-# Spec 07 — Criteria Extractor
-**Step:** 6 of 15
-**Time budget:** ~30 min
-**Checkpoint:** Tab 2 in the running app shows 5 criteria extracted from the mock tender.
----
-## Goal
-Implement `core/criteria_extractor.py` and wire up `ui/tab_tender.py` to call it. On `LLMUnavailable`, fall back to `fallback.load_criteria()`. Cache result in `st.session_state["criteria"]`.
----
-## `core/criteria_extractor.py`
-### `extract_criteria(tender_pdf_path: Path) -> list[Criterion]`
-1. Call `pdf_utils.extract_pages(tender_pdf_path)` → list of `{"page": int, "text": str}`.
-2. Join pages, prefixing each with its page marker: `tender_text = "\n\n".join(f"--- PAGE {p['page']} ---\n\n{p['text']}" for p in pages)`.
-3. Build user prompt:
-   ```
-   {tender_text}
-   ---
-   Return JSON in this exact format:
-   {"criteria": [
-     {"id": "C1", "title": "...", "category": "financial|technical|compliance",
-      "mandatory": true|false, "description": "...",
-      "rule": {"type": "numeric_threshold|count_threshold|certification_present|document_present",
-               "field": "...", "operator": ">=|<=|==|exists", "value": null_or_number, "unit": null_or_string},
-      "query_hints": ["...", "..."],
-      "source_page": <int>, "source_clause": "..."},
-     ...
-   ]}
-   ```
-4. Call `llm.chat_json(EXTRACT_CRITERIA_PROMPT_SYSTEM, user_prompt)`.
-5. Parse `result["criteria"]` → validate each item as `Criterion(**item)`.
-6. Log `criteria_extracted` to audit with `count=len(criteria)`; `audit.log` serializes the remaining keyword fields into `payload_json` itself.
-7. Return `list[Criterion]`.
-On `LLMUnavailable`:
-- Log `precomputed_fallback_used` to audit.
-- Set `st.session_state["fallback_active"] = True`.
-- Return `fallback.load_criteria()`.
-LLM singleton: use `@st.cache_resource` on a getter `_get_llm()`. `st.cache_resource` caches globally, so the client is created once per server process and shared across sessions.
----
-## `ui/tab_tender.py`
-Renders the Tender Analysis tab. Replaces the stub.
-Layout:
-1. `st.header("Tender Analysis")`
-2. File uploader: `uploaded = st.file_uploader("Upload tender PDF", type=["pdf"])`. If nothing uploaded, use the preloaded mock: `data/tender/crpf_construction_tender.pdf`.
-3. Show the filename being used.
-4. Button **"Extract Criteria (Live LLM)"**:
-   - Save uploaded bytes to a temp file (or use the mock path directly).
-   - Call `criteria_extractor.extract_criteria(path)`.
-   - Store in `st.session_state["criteria"]`.
-5. If `st.session_state.get("criteria")`:
-   - Show `st.success(f"Extracted {len(criteria)} criteria")`.
-   - For each criterion, render a card using `st.expander`:
-     - Title + mandatory/optional badge (🔴 Mandatory / 🟡 Optional).
-     - Category badge (color-coded: financial=blue, technical=green, compliance=orange).
-     - Description text.
-     - Source: page + clause.
-     - Rule details (type, operator, value, unit).
----
-## Acceptance Criteria
-1. `extract_criteria(Path("data/tender/crpf_construction_tender.pdf"))` returns a list of 5 `Criterion` objects (when LLM is available) or the precomputed fallback (when not).
-2. Tab 2 renders without error in both modes.
-3. Each extracted criterion shows title, mandatory status, category, and source clause.
-4. `st.session_state["criteria"]` is populated after the button is clicked.
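
Steps 2 and 5 of `extract_criteria` in Spec 07 (building the page-marked tender text and validating the LLM's JSON) might be sketched as below. The `Criterion` dataclass here is a simplified stand-in for the project's schema, with fields taken from the prompt format in the spec; `join_pages`, `parse_criteria` and the category check are illustrative assumptions.

```python
import json
from dataclasses import dataclass

ALLOWED_CATEGORIES = {"financial", "technical", "compliance"}


@dataclass
class Criterion:
    """Simplified stand-in for the project's Criterion schema."""
    id: str
    title: str
    category: str
    mandatory: bool
    description: str
    rule: dict
    query_hints: list
    source_page: int
    source_clause: str

    def __post_init__(self):
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def join_pages(pages: list[dict]) -> str:
    """Prefix each page's text with a marker so the LLM can cite source pages."""
    return "\n\n".join(f"--- PAGE {p['page']} ---\n\n{p['text']}" for p in pages)


def parse_criteria(raw_json: str) -> list[Criterion]:
    """Validate each item; a missing or unexpected key raises TypeError."""
    payload = json.loads(raw_json)
    return [Criterion(**item) for item in payload["criteria"]]
```

Formatting the marker per page matters: a plain `str.join` with a literal `{n}` separator would leave the placeholder unfilled and give the model no usable page numbers.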

specs/09_evaluator.md DELETED Viewed

@@ -1,134 +0,0 @@
-# Spec 09 — Evaluator
-**Step:** 9 of 15
-**Time budget:** ~25 min
-**Checkpoint:** `evaluate("bidder_a", c1)` returns eligible with high confidence; `evaluate("bidder_b", c1)` returns not_eligible.
----
-## Goal
-Implement `core/evaluator.py` — per-criterion verdict generation with combined confidence scoring and threshold-based safety rules.
----
-## `evaluate(bidder_id: str, criterion: Criterion) -> Verdict`
-### Step 1 — Gather evidence
-`evidence = bidder_processor.gather_evidence(bidder_id, criterion)`
-If empty: return immediately:
-```python
-Verdict(
-    bidder_id=bidder_id,
-    criterion_id=criterion.id,
-    verdict="needs_review",
-    reason="No matching evidence found in submitted documents.",
-    llm_confidence=0.0,
-    combined_confidence=0.0,
-    model_version=MODEL_VERSION,
-    timestamp=now_iso(),
-)
-```
-Log `criterion_evaluated` with verdict=needs_review.
-### Step 2 — Build LLM prompt
-User message template:
-```
-CRITERION:
-{criterion.model_dump_json(indent=2)}
-RETRIEVED EVIDENCE (top-k chunks from bidder {bidder_id}):
-{json list of evidence dicts with doc_name, page, ocr_confidence, source_type, text}
-Return JSON:
-{
-  "verdict": "eligible" | "not_eligible" | "needs_review",
-  "extracted_value": "<short string as found in evidence>",
-  "normalized_value": <number or null>,
-  "chosen_source": {"doc_name": "...", "page": <int>, "snippet": "<= 200 chars", "source_type": "..."},
-  "llm_confidence": <0.0 to 1.0>,
-  "reason": "<one or two sentences>"
-}
-Rules:
-- If evidence directly contains a value satisfying the rule, verdict=eligible with high llm_confidence.
-- If evidence directly contradicts the rule, verdict=not_eligible.
-- If no relevant evidence retrieved, verdict=needs_review, llm_confidence<=0.4.
-- If the source is OCR with low confidence and the value is borderline, lean to needs_review.
-```
-### Step 3 — Call LLM
-`result = llm.chat_json(EVALUATE_CRITERION_PROMPT_SYSTEM, user_prompt)`
-On `LLMUnavailable`: return `fallback.load_evaluation(bidder_id, criterion.id)`.
-### Step 4 — Parse result
-Extract: `verdict`, `extracted_value`, `normalized_value`, `chosen_source`, `llm_confidence`, `reason`.
-Build `Source` object from `chosen_source`.
-### Step 5 — Combined confidence
-Find the evidence chunk matching `chosen_source` to get `ocr_confidence` and `source_type`:
-```python
-if source_type == "text_pdf":
-    combined = llm_confidence
-elif source_type == "vision_llm":
-    combined = 0.7 * llm_confidence + 0.3 * 0.95
-elif source_type == "tesseract":
-    tc = ocr_confidence if ocr_confidence and ocr_confidence >= 0 else 0.3
-    combined = 0.6 * llm_confidence + 0.4 * tc
-else:
-    combined = llm_confidence
-```
-### Step 6 — Apply threshold safety rules (in order)
-1. If LLM verdict is `needs_review` → keep.
-2. If `combined >= CONFIDENCE_HIGH` → keep LLM verdict.
-3. If `CONFIDENCE_REVIEW <= combined < CONFIDENCE_HIGH` AND verdict is `not_eligible` → downgrade to `needs_review` (NEVER silently disqualify at medium confidence).
-4. If `combined < CONFIDENCE_REVIEW` → force `needs_review`.
-### Step 7 — Build and return Verdict
-```python
-Verdict(
-    bidder_id=bidder_id,
-    criterion_id=criterion.id,
-    verdict=final_verdict,
-    extracted_value=extracted_value,
-    normalized_value=normalized_value,
-    source=source,
-    llm_confidence=llm_confidence,
-    ocr_confidence=ocr_confidence_from_best_evidence,
-    combined_confidence=combined,
-    reason=reason,
-    model_version=MODEL_VERSION,
-    timestamp=now_iso(),
-    review_status="pending",
-)
-```
-Log `criterion_evaluated` to audit.
----
-## `evaluate_bidder(bidder_id: str, criteria: list[Criterion]) -> list[Verdict]`
-Calls `evaluate(bidder_id, c)` for each criterion in sequence. Returns list.
----
-## Acceptance Criteria
-1. `evaluate("bidder_a", c1)` → `verdict="eligible"`, `combined_confidence >= 0.8` (or fallback eligible).
-2. `evaluate("bidder_b", c1)` → `verdict="not_eligible"` or `"needs_review"` (never silently eligible when turnover is below threshold).
-3. `evaluate_bidder("bidder_a", criteria)` returns 5 verdicts.
-4. All verdicts are `Verdict` instances with valid `review_status="pending"`.
-5. Audit log gains `criterion_evaluated` entries.

specs/10_audit_and_fallback.md DELETED Viewed

@@ -1,83 +0,0 @@
-# Spec 10 — Audit and Fallback
-**Step:** 10 of 15
-**Time budget:** ~20 min
----
-## Goal
-Document and finalize `core/audit.py` and `core/fallback.py`. Both were implemented early (Step 6) to unblock the criteria extractor. This spec records their contracts.
----
-## `core/audit.py`
-### SQLite schema
-```sql
-CREATE TABLE IF NOT EXISTS audit_log (
-    id INTEGER PRIMARY KEY AUTOINCREMENT,
-    ts TEXT NOT NULL,
-    action TEXT NOT NULL,
-    actor TEXT NOT NULL,
-    model_version TEXT,
-    bidder_id TEXT,
-    criterion_id TEXT,
-    payload_json TEXT
-);
-```
-Single file: `AUDIT_DB = str(BASE_DIR / "audit.db")`.
-### `log(action: str, actor: str = "system", **fields) -> int`
-- Writes one row. Returns the inserted `rowid`.
-- `ts`: UTC ISO timestamp.
-- `model_version`: from `fields` if present, else `config.MODEL_VERSION`.
-- `bidder_id`, `criterion_id`: extracted from `fields` if present.
-- Remaining `fields` → `payload_json = json.dumps(fields)`.
-### `query(filters: dict | None = None) -> list[dict]`
-- Returns rows from `audit_log` ordered by `id DESC`.
-- Supports filters: `bidder_id`, `action`, `date_from` (ts >=), `date_to` (ts <=).
-### Action vocabulary
-| Action | When logged |
-|---|---|
-| `criteria_extracted` | After successful LLM criteria extraction |
-| `bidder_processed` | After each document is indexed |
-| `criterion_evaluated` | After each (bidder, criterion) verdict |
-| `human_review_action` | When evaluator approves/edits/rejects a verdict |
-| `precomputed_fallback_used` | When LLM is unavailable and fallback fires |
-| `vision_ocr_invoked` | When Tier 3 vision LLM is called |
----
-## `core/fallback.py`
-### `load_criteria() -> list[Criterion]`
-- Reads `data/precomputed/criteria.json` if it exists, parses `{"criteria": [...]}`.
-- Falls back to `_HARDCODED_CRITERIA` (5 hardcoded criteria matching the mock tender exactly) if file is missing.
-### `load_evaluation(bidder_id: str, criterion_id: str) -> Verdict`
-- Reads `data/precomputed/eval_{bidder_id}.json` if it exists.
-- Finds the dict where `criterion_id` matches.
-- Falls back to a `needs_review` Verdict with reason "Pre-computed evaluation not available."
-### `_HARDCODED_CRITERIA`
-Five criteria matching the mock tender (C1–C5), with correct rules and query_hints. These are the ultimate safety net if `precompute_results.py` has not been run.
----
-## Acceptance Criteria
-1. `audit.log("test")` inserts a row; `audit.query()` returns it.
-2. `audit.query({"action": "criteria_extracted"})` filters correctly.
-3. `fallback.load_criteria()` returns 5 criteria even with no precomputed file.
-4. `fallback.load_evaluation("bidder_a", "C1")` returns a `Verdict` with `verdict_id` set.

specs/11_mock_data.md DELETED Viewed

@@ -1,211 +0,0 @@
-# Spec 11 — Mock Data Generation
-**Step:** 2 of 15
-**Time budget:** ~25 min
-**Checkpoint:** `data/` directory populated; `turnover_certificate_scan.png` is a visibly noisy scan that Tesseract reads with low confidence (~50–65%).
----
-## Goal
-`scripts/generate_mock_data.py` is a single deterministic script that produces:
-1. One tender PDF (`data/tender/crpf_construction_tender.pdf`)
-2. Five PDFs for Bidder A (clearly eligible)
-3. Five PDFs for Bidder B (clearly ineligible — turnover too low)
-4. Four PDFs + one noisy scan PNG for Bidder C (needs review)
-All files are entirely synthetic and self-contained — no external assets required. The script must run in under 30 seconds.
----
-## Dependencies
-- `reportlab` — PDF generation
-- `Pillow` — image manipulation
-- `numpy` — salt-and-pepper noise
----
-## Output Files
-```
-data/
-  tender/
-    crpf_construction_tender.pdf
-  bidders/
-    bidder_a/
-      company_profile.pdf
-      audited_financials.pdf
-      project_experience.pdf
-      gst_certificate.pdf
-      iso_9001.pdf
-    bidder_b/
-      company_profile.pdf
-      audited_financials.pdf
-      project_experience.pdf
-      gst_certificate.pdf
-      iso_9001.pdf
-    bidder_c/
-      company_profile.pdf
-      project_experience.pdf
-      gst_certificate.pdf
-      iso_9001.pdf
-      turnover_certificate_scan.png
-```
----
-## Tender PDF — `crpf_construction_tender.pdf`
-`reportlab` SimpleDocTemplate, 5–6 pages with formal government tender language.
-### Sections
-1. **Introduction** — "Central Reserve Police Force, Ministry of Home Affairs, Government of India. Tender for Construction of Residential Quarters."
-2. **Scope of Work** — brief description of construction project.
-3. **Eligibility Criteria** — Section 3.2, contains five criteria (see table below).
-4. **Submission Procedure** — dates, contact details.
-5. **Evaluation Methodology** — how bids will be scored.
-6. **Annexures** — supporting forms.
-### Five Criteria (exact text in Section 3.2)
-| ID | Clause | Verbatim Text | Mandatory | Category |
-|---|---|---|---|---|
-| C1 | 3.2(a) | "The bidder shall have a minimum average annual turnover of INR 5 Crore (Rupees Five Crore only) during the last three financial years (2022-23, 2023-24, 2024-25), as certified by a Chartered Accountant." | Yes | financial |
-| C2 | 3.2(b) | "The bidder must have successfully completed at least three (3) similar construction projects of value not less than INR 1 Crore each in the last five (5) financial years. Completion certificates from clients shall be submitted." | Yes | technical |
-| C3 | 3.2(c) | "The bidder shall possess a valid Goods and Services Tax (GST) registration certificate. The GSTIN must be active as on the date of submission." | Yes | compliance |
-| C4 | 3.2(d) | "The bidder shall hold a valid ISO 9001:2015 Quality Management System certification issued by an accredited certification body, valid as on the date of bid submission." | Yes | compliance |
-| C5 | 3.2(e) | "Preferably, the bidder may have prior experience with construction or maintenance of paramilitary or defence infrastructure. This is a desirable criterion and shall not affect mandatory eligibility." | No | technical |
-C5 uses "preferably" and "desirable" → tests the mandatory-vs-optional classifier.
----
-## Bidder A — Clearly Eligible
-### `company_profile.pdf`
-- Company: "Apex Constructions Pvt. Ltd."
-- GSTIN: 27AABCA1234F1Z5
-- Registered: 2010
-- ISO 9001:2015 certified: Yes
-### `audited_financials.pdf`
-- FY 2022-23: Annual Turnover INR 5,80,00,000 (Rupees Five Crore Eighty Lakh)
-- FY 2023-24: Annual Turnover INR 6,20,00,000 (Rupees Six Crore Twenty Lakh)
-- FY 2024-25: Annual Turnover INR 7,10,00,000 (Rupees Seven Crore Ten Lakh)
-- Average: INR 6,36,66,667 — exceeds INR 5 Crore threshold
-- Certified by: CA Ramesh Kumar, M. No. 123456
-### `project_experience.pdf`
-- 5 projects listed (2020–2025), each ≥ INR 1 Crore
-- Includes one CRPF project (2023): "Construction of barracks, CRPF Camp, Pune, INR 3.5 Crore"
-### `gst_certificate.pdf`
-- GSTIN: 27AABCA1234F1Z5
-- Valid through: 31-03-2027
-- Status: Active
-### `iso_9001.pdf`
-- Certificate No: ISO-2021-9001-APEX
-- Valid through: 15-06-2027
-- Issued by: Bureau Veritas
----
-## Bidder B — Clearly Ineligible (turnover too low)
-Same structure as Bidder A, but financials are below threshold.
-### `company_profile.pdf`
-- Company: "BuildRight Enterprises"
-- GSTIN: 29AABCB5678G1Z3
-### `audited_financials.pdf`
-- FY 2022-23: Annual Turnover INR 1,20,00,000 (Rupees One Crore Twenty Lakh)
-- FY 2023-24: Annual Turnover INR 1,50,00,000 (Rupees One Crore Fifty Lakh)
-- FY 2024-25: Annual Turnover INR 1,80,00,000 (Rupees One Crore Eighty Lakh)
-- Average: INR 1,50,00,000 — **below** INR 5 Crore threshold
-- Certified by: CA Suresh Patel, M. No. 654321
-### `project_experience.pdf`
-- 4 projects listed (2021–2025), each ≥ INR 1 Crore — passes C2
-### `gst_certificate.pdf`
-- GSTIN: 29AABCB5678G1Z3, valid through 2027, Active
-### `iso_9001.pdf`
-- Certificate No: ISO-2022-9001-BR
-- Valid through: 20-08-2027
----
-## Bidder C — Needs Review (scanned turnover certificate)
-No typed `audited_financials.pdf`. Instead: a deliberately noisy scan PNG.
-### `company_profile.pdf`
-- Company: "Shree Constructions & Services"
-- GSTIN: 24AABCC9012H1Z1
-### `project_experience.pdf`
-- Exactly 3 projects (borderline meets count threshold for C2)
-- Values: INR 1.2 Cr, INR 1.5 Cr, INR 2.1 Cr
-### `gst_certificate.pdf`
-- GSTIN: 24AABCC9012H1Z1, valid through 2027, Active
-### `iso_9001.pdf`
-- Certificate No: ISO-2023-9001-SCS
-- Valid through: 10-09-2027
-### `turnover_certificate_scan.png` — noisy scan generation
-This is the OCR demo centerpiece. Steps:
-1. Render a `reportlab` page to an in-memory PDF with a CA's turnover certificate:
-   - "This is to certify that M/s Shree Constructions & Services ... average annual turnover of INR 5,40,00,000 (Rupees Five Crore Forty Lakh only) for the financial years 2022-23, 2023-24, and 2024-25."
-   - Include year-wise breakdown table.
-2. Convert that PDF page to a PIL Image at 150 DPI using `fitz` (PyMuPDF).
-3. Apply degradation:
-   - `ImageFilter.GaussianBlur(radius=1.5)`
-   - Salt-and-pepper noise via numpy: randomly set ~5% of pixels to 0 or 255
-   - `image.rotate(-2, expand=True, fillcolor=(255,255,255))`
-   - Re-save with JPEG compression at quality=40 then reload as PNG
-4. Save as `data/bidders/bidder_c/turnover_certificate_scan.png`
-**Expected outcome:** Tesseract reads this at mean confidence ~50–65% → triggers Tier-3 vision LLM. The turnover figure (INR 5,40,00,000) is present but partially degraded, making it a realistic "needs human review" case given combined-confidence rules.
----
-## Script Design
-```python
-# scripts/generate_mock_data.py
-def make_tender_pdf(out_path: Path) -> None: ...
-def make_company_profile(out_path: Path, name: str, gstin: str, year: int) -> None: ...
-def make_financials(out_path: Path, rows: list[tuple[str, str, int]]) -> None: ...
-def make_project_experience(out_path: Path, projects: list[dict]) -> None: ...
-def make_gst_certificate(out_path: Path, gstin: str, valid_through: str) -> None: ...
-def make_iso_certificate(out_path: Path, cert_no: str, valid_through: str, company: str) -> None: ...
-def make_noisy_scan(out_path: Path) -> None: ...
-if __name__ == "__main__":
-    # Ensure output dirs exist
-    # Generate all files
-    print("Mock data generated successfully.")
-```
-Each helper creates one PDF/PNG. The script is idempotent (re-running overwrites files). No command-line arguments needed.
----
-## Acceptance Criteria
-1. Running `python scripts/generate_mock_data.py` exits 0 and prints "Mock data generated successfully."
-2. All 16 files listed above exist after the run.
-3. Each PDF opens in a viewer without errors and contains the text described.
-4. `turnover_certificate_scan.png` is visibly degraded (blurry, rotated, noisy).
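-5. Running `pytesseract.image_to_data(Image.open("data/bidders/bidder_c/turnover_certificate_scan.png"), output_type=pytesseract.Output.DATAFRAME)` returns a dataframe where the filtered mean confidence is between 30 and 70. Only a value below `OCR_TESSERACT_MIN_CONF` (0.65 on the 0–1 scale, per Spec 04) actually triggers Tier 3.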
-6. Script completes in under 30 seconds on any modern machine.
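
The turnover averages quoted in Spec 11 can be checked with a few lines of arithmetic. `format_inr` and `average_turnover` are hypothetical helpers written for this check; the yearly figures and the INR 5 Crore threshold are the ones in the spec.

```python
CRORE = 10_000_000
THRESHOLD = 5 * CRORE  # C1: minimum average annual turnover


def format_inr(amount: int) -> str:
    """Group digits Indian style: last three digits, then pairs."""
    s = str(amount)
    if len(s) <= 3:
        return s
    head, tail = s[:-3], s[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return ",".join(groups + [tail])


def average_turnover(yearly: list[int]) -> int:
    """Mean of the yearly figures, rounded to the nearest rupee."""
    return round(sum(yearly) / len(yearly))


bidder_a = average_turnover([58_000_000, 62_000_000, 71_000_000])
bidder_b = average_turnover([12_000_000, 15_000_000, 18_000_000])

print(format_inr(bidder_a), bidder_a >= THRESHOLD)  # 6,36,66,667 True
print(format_inr(bidder_b), bidder_b >= THRESHOLD)  # 1,50,00,000 False
```

Bidder A's average works out to (5.8 + 6.2 + 7.1) / 3 ≈ 6.37 Crore and Bidder B's to 1.5 Crore, matching the figures in the mock financials.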

specs/12_precompute.md DELETED Viewed

@@ -1,73 +0,0 @@
-# Spec 12 — Pre-compute Results
-**Step:** 11 of 15
-**Time budget:** ~15 min
-**Checkpoint:** Four JSON files exist in `data/precomputed/` and validate against the schemas.
----
-## Goal
-`scripts/precompute_results.py` runs the full pipeline once (requires a valid API key) and saves the results as JSON fallback files, which are then committed to the repo. When the API is unavailable during a demo, `fallback.py` reads these files instead.
----
-## Script: `scripts/precompute_results.py`
-```python
-"""Step 11 — runs the full pipeline and writes data/precomputed/*.json."""
-```
-### Steps
-1. Ensure `data/precomputed/` exists.
-2. Extract criteria from mock tender → save `data/precomputed/criteria.json`:
-   ```json
-   {"criteria": [<Criterion.model_dump()>, ...]}
-   ```
-3. For each bidder (`bidder_a`, `bidder_b`, `bidder_c`):
-   a. Process all bidder docs (`process_bidder`).
-   b. Evaluate all criteria (`evaluate_bidder`).
-   c. Save `data/precomputed/eval_{bidder_id}.json`:
-      ```json
-      [<Verdict.model_dump()>, ...]
-      ```
-4. Print summary and exit 0.
-### Error handling
-If the LLM fails for any criterion: catch `LLMUnavailable`, log a warning, and skip that criterion (don't crash). A criteria file with partial evals is better than nothing.
-If no API key: print instructions and exit 1.
----
-## Fallback file format
-### `criteria.json`
-```json
-{
-  "criteria": [
-    {"id": "C1", "title": "...", ...},
-    ...
-  ]
-}
-```
-### `eval_bidder_a.json`
-```json
-[
-  {"verdict_id": "V-abc123", "bidder_id": "bidder_a", "criterion_id": "C1", "verdict": "eligible", ...},
-  ...
-]
-```
----
-## Acceptance Criteria
-1. Running `python scripts/precompute_results.py` exits 0 when API key is set.
-2. `data/precomputed/criteria.json` exists and contains `{"criteria": [...]}` with 5 items.
-3. Each `eval_bidder_*.json` contains a list of 5 `Verdict` dicts.
-4. `core.fallback.load_criteria()` returns 5 `Criterion` objects from the file.
-5. `core.fallback.load_evaluation("bidder_a", "C1")` returns the correct `Verdict`.
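
The skip-on-failure loop from Spec 12 and the matching lookup from Spec 10 can be sketched together. This is an illustration only: `precompute_bidder` is a hypothetical helper, verdicts are plain dicts rather than `Verdict` models, and `evaluate` and `out_dir` are passed in so the example does not depend on the real pipeline.

```python
import json
from pathlib import Path


class LLMUnavailable(Exception):
    """Stand-in for the exception defined in core.llm_client."""


def precompute_bidder(bidder_id, criterion_ids, evaluate, out_dir: Path) -> Path:
    """Evaluate every criterion, skipping those the LLM cannot answer."""
    verdicts = []
    for criterion_id in criterion_ids:
        try:
            verdicts.append(evaluate(bidder_id, criterion_id))
        except LLMUnavailable as exc:
            print(f"warning: skipping {bidder_id}/{criterion_id}: {exc}")
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"eval_{bidder_id}.json"
    out_path.write_text(json.dumps(verdicts, indent=2), encoding="utf-8")
    return out_path


def load_evaluation(bidder_id: str, criterion_id: str, out_dir: Path) -> dict:
    """Return the stored verdict, or a needs_review placeholder if none exists."""
    path = out_dir / f"eval_{bidder_id}.json"
    if path.exists():
        for item in json.loads(path.read_text(encoding="utf-8")):
            if item.get("criterion_id") == criterion_id:
                return item
    return {
        "bidder_id": bidder_id,
        "criterion_id": criterion_id,
        "verdict": "needs_review",
        "reason": "Pre-computed evaluation not available.",
    }
```

A skipped criterion therefore surfaces later as `needs_review` rather than as a missing row, which keeps the fallback path safe by default.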

specs/13_ui_tabs.md DELETED Viewed

@@ -1,121 +0,0 @@
-# Spec 13 — UI Tabs
-**Step:** 12 of 15
-**Time budget:** ~80 min total
----
-## Goal
-Implement all five Streamlit tabs and `ui/components.py`. The app must render the full demo flow without an API key (using precomputed data), and with one (calling the live LLM).
----
-## `ui/components.py` — Shared widgets
-### `verdict_pill(verdict: str) -> str`
-Returns a markdown-formatted colored badge string:
-- `eligible` → `":green[✅ Eligible]"`
-- `not_eligible` → `":red[❌ Not Eligible]"`
-- `needs_review` → `":orange[⚠ Needs Review]"`
-### `confidence_bar(value: float, label: str = "Confidence") -> None`
-Renders `st.progress(value, text=f"{label}: {value:.0%}")`.
-### `ocr_tier_badge(source_type: str) -> str`
-Returns a short badge string:
-- `text_pdf` → "`📄 text_pdf`"
-- `tesseract` → "`🔍 tesseract`"
-- `vision_llm` → "`👁 vision_llm`"
-### `category_badge(category: str) -> str`
-Returns `":blue[financial]"`, `":green[technical]"`, or `":orange[compliance]"`.
----
-## Tab 1 — Overview (`ui/tab_overview.py`)
-Layout:
-1. Hero text + tagline.
-2. Two-column KPI cards: Criteria Extracted, Bidders Evaluated, Mandatory Criteria Checked, Audit Entries Logged.
-3. Architecture summary (text description since no image file yet).
-4. "Use Pre-loaded Demo Data" CTA that sets `st.session_state["use_demo"] = True` and shows the criteria count from the fallback file.
-KPI values: count from `st.session_state` data and `audit.query()`.
----
-## Tab 2 — Tender Analysis (`ui/tab_tender.py`)
-Already implemented in Step 6. No changes needed beyond what's there.
----
-## Tab 3 — Bidder Evaluation (`ui/tab_bidders.py`)
-Layout:
-1. `st.header("Bidder Evaluation")`
-2. Multi-select for bidders: `["bidder_a", "bidder_b", "bidder_c"]`, default all.
-3. Button **"Run Evaluation"** (type=primary).
-4. On click:
-   a. Ensure criteria are loaded (from session_state or fallback).
-   b. For each selected bidder: `process_bidder(...)`, then `evaluate_bidder(...)`.
-   c. Store verdicts in `st.session_state["verdicts"]` as `{bidder_id: [Verdict.model_dump(), ...]}`.
-5. If verdicts in session:
-   - For each bidder: show per-bidder summary header.
-   - Show a table of criteria rows using `st.columns`.
-   - Each row: criterion title, verdict pill, extracted value, source chip (doc + page), OCR-tier badge, confidence bar.
-   - Expandable "Reason" and "Source Snippet" per row.
-Per-bidder summary: count eligible/not_eligible/needs_review among mandatory criteria. Overall: Eligible only if all mandatory are eligible; Not Eligible if any are not_eligible; Needs Review otherwise.
----
-## Tab 4 — Human Review Queue (`ui/tab_review.py`)
-Layout:
-1. `st.header("Human Review Queue")`
-2. Shows all verdicts where `review_status == "pending"` AND `verdict == "needs_review"`.
-3. For each such verdict:
-   - Show: bidder_id, criterion title, extracted value, confidence, reason, source snippet.
-   - Three buttons: **Approve**, **Edit & Approve**, **Reject**.
-   - **Approve**: set `review_status = "approved"`, log `human_review_action` to audit.
-   - **Edit & Approve**: show `st.text_input` for edited value, set `review_status = "edited"`, log audit.
-   - **Reject**: set `review_status = "rejected"`, log audit.
-4. If no pending items: `st.success("No items pending review.")`.
-State: verdicts stored in `st.session_state["verdicts"]` as nested dicts. Updates write back to the same structure.
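The queue filter and the three reviewer actions could look roughly like the sketch below, operating on the nested `{bidder_id: [verdict_dict, ...]}` structure. The helper names and the `extracted_value` key are assumptions for illustration, and the audit logging call is left out because its signature is not given here.

```python
from typing import Optional

# Maps a reviewer action to the review_status values named in the spec.
_STATUS_FOR_ACTION = {
    "approve": "approved",
    "edit": "edited",
    "reject": "rejected",
}


def pending_reviews(verdicts: dict) -> list:
    """Return (bidder_id, index, verdict) for every item awaiting review."""
    return [
        (bidder_id, index, verdict)
        for bidder_id, items in verdicts.items()
        for index, verdict in enumerate(items)
        if verdict.get("review_status") == "pending"
        and verdict.get("verdict") == "needs_review"
    ]


def apply_review(verdicts: dict, bidder_id: str, index: int,
                 action: str, edited_value: Optional[str] = None) -> dict:
    """Write the reviewer's decision back into the same nested structure."""
    item = verdicts[bidder_id][index]
    item["review_status"] = _STATUS_FOR_ACTION[action]
    if action == "edit":
        # "extracted_value" is an assumed field name for the edited value.
        item["extracted_value"] = edited_value
    return item
```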
----
-## Tab 5 — Audit Log (`ui/tab_audit.py`)
-Layout:
-1. `st.header("Audit Log")`
-2. Filter row: bidder dropdown, action dropdown, date range.
-3. Table: `st.dataframe` with columns: ts, action, actor, bidder_id, criterion_id, payload_json.
-4. **"Export CSV"** button: `st.download_button` with CSV data from filtered rows.
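One way to build the CSV payload for the export button, using only the standard library. This is a sketch that assumes the filtered rows are dictionaries keyed by the column names listed above; the function name is hypothetical.

```python
import csv
import io

AUDIT_COLUMNS = ["ts", "action", "actor", "bidder_id", "criterion_id", "payload_json"]


def audit_rows_to_csv(rows: list) -> str:
    """Serialise filtered audit rows to CSV text with a header line."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not audit columns;
    # missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buffer, fieldnames=AUDIT_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned string can be passed as the `data` argument of `st.download_button`, with a `file_name` and `mime="text/csv"`.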
----
-## Sidebar update (`app.py`)
-Replace the hardcoded "🔴 **DeepSeek:** not connected" with a live probe:
-- Try `LLM().chat_json("ping", '{"ping": true}')` at startup (cached with session_state).
-- Green: live and no fallback fired.
-- Amber: fallback has fired this session.
-- Red: probe failed.
-If `st.session_state.get("fallback_active")`: show `st.sidebar.warning("⚠ Pre-computed mode active.")`.
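The green/amber/red rule can be kept separate from the probe itself so that it is easy to test. In this sketch the function name is hypothetical, and letting a failed probe take precedence over a fired fallback is an assumption, since the spec does not say which wins when both are true.

```python
def connection_status(probe_ok: bool, fallback_fired: bool) -> str:
    """Map the probe result and fallback flag to a sidebar colour."""
    if not probe_ok:
        return "red"    # probe failed (assumed to outrank amber)
    if fallback_fired:
        return "amber"  # API reachable, but fallback has fired this session
    return "green"      # live and no fallback fired
```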
----
-## Acceptance Criteria
-1. Tab 1 renders without error and shows KPI cards.
-2. Tab 3 "Run Evaluation" populates the verdict table for all 3 bidders.
-3. Bidder A shows all mandatory criteria eligible. Bidder B shows C1 not_eligible.
-4. Tab 4 shows at least one pending review item for Bidder C.
-5. Tab 4 Approve button updates `review_status` and adds an audit entry.
-6. Tab 5 shows audit entries and CSV download works.
-7. Sidebar connection dot is green/amber/red based on API availability.

submission_requirements.md DELETED

@@ -1,29 +0,0 @@
-# Prototype Phase — Submission Requirements
-> This is the submission form for Round 2 (Prototype Phase). The idea was already shortlisted in Round 1.
----
-## Required Fields
-| Field | Notes |
-|---|---|
-| **Title** | Clear, descriptive title |
-| **Description** | Project description with formatting and links allowed |
-| **Parent Submission** | Link to the shortlisted Round 1 idea submission |
-| **Theme** | Theme 3: AI-Based Tender Evaluation and Eligibility Analysis |
-| **Snapshots** | Images of the project (JPG/JPEG/PNG, up to 3MB each) |
-| **Video URL** | Demo or pitch video link |
-| **Presentation** | Pitch deck or slides (.key, .odp, .odt, .pdf, .pps, .ppt, .pptx — max 50MB) |
-| **Demo Link** | Link to working demo or prototype |
-| **Repository URL** | GitHub, Bitbucket, or similar code repository |
-| **Source Code** | Zip or APK upload (max 50MB) |
-| **Instructions to Run** | Step-by-step setup and run instructions for reviewers |
-| **Custom Attachment** | Any additional file — PDF, images, spreadsheets (max 50MB) |
----
-## Notes
-- The "Parent Submission" field links this prototype to the previously shortlisted idea.
-- "Which shortlisted idea are you submitting this prototype for?" — confirms the link to the Round 1 submission.

theme.md DELETED

@@ -1,89 +0,0 @@
-# Theme 3: AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement by CRPF
-## Context
-Government organisations such as the Central Reserve Police Force (CRPF) issue tenders to procure goods and services. Each tender specifies detailed requirements: technical specifications, financial thresholds, compliance rules, eligibility conditions, document checklists and mandatory certifications. These requirements are typically written in formal, legally careful language and are spread across many pages of the tender document.
-Private companies respond with bids, each submitting their own set of supporting documents — company profiles, financial statements, experience letters, tax registrations, certifications and more. The documents arrive in many formats: structured text PDFs, scanned copies, Word files, tables and even photographs of physical certificates. The same kind of information is presented in many different ways across bidders.
-Evaluating whether each bidder meets the stated eligibility criteria is currently a manual process. It is slow, inconsistent across evaluators, prone to oversight, and hard to audit. For a single tender, a committee may spend days cross-checking hundreds of pages against a list of criteria, and two evaluators may reach different conclusions from the same set of documents. There is a clear opportunity to bring modern AI techniques to this problem — to extract structured information from unstructured tender and bid documents, apply the eligibility rules consistently, and produce explainable evaluation reports that a human officer can trust and sign off on.
----
-## The Problem
-Design a technical platform that, given a tender document and a set of bidder submissions, can do the following:
-### Understand the Tender
-- Extract the eligibility criteria from the tender document — technical specifications, financial thresholds, compliance conditions, and document and certification requirements.
-- Distinguish between mandatory and optional criteria.
-- Capture each criterion in a form that can be matched against a bidder's submission.
-### Understand Each Bidder
-- Parse every bidder submission, regardless of whether the documents are typed PDFs, scanned copies, Word files or photographs.
-- Extract the values and evidence relevant to each criterion from those documents.
-- Handle variation in how bidders present the same information.
-### Evaluate and Explain
-- For each bidder, decide whether they are **Eligible**, **Not Eligible**, or **Need Manual Review** against each criterion and overall.
-- Produce an explanation for every verdict that references the specific criterion, the specific document and the specific value that drove the decision.
-- Surface ambiguous or uncertain cases for human review rather than silently disqualifying them.
-- Produce a consolidated evaluation report that a procurement officer can use as the basis for a decision.
----
-## Non-Negotiables
-- Every verdict must be explainable at the criterion level — which criterion was being checked, which document was used, what value was found, and why the bidder passed, failed or needs review.
-- The system must **never silently disqualify** a bidder. Ambiguous or uncertain cases must be surfaced for human review with the reason.
-- The system must handle scanned documents and photographs, not only digital text.
-- The system must be auditable end-to-end and suitable for use in a formal government procurement decision.
-- Real tender and bid data will not be released for Round 1. Any Round 2 implementation will run on representative mock or redacted documents inside a sandbox.
----
-## What Success Looks Like
-A working solution should eventually make the following behaviours possible:
-1. A procurement officer uploads a tender document and a set of bidder submissions. The system extracts the eligibility criteria automatically and lists them for review.
-2. For each bidder, the system produces a criterion-by-criterion evaluation with references back to the source documents.
-3. Clearly eligible and clearly ineligible bidders are marked as such; genuinely ambiguous cases are flagged for manual review with the reason for the ambiguity.
-4. A consolidated report can be exported and signed off, with a complete audit trail of every automated decision.
----
-## Sample Scenario
-A government department issues a tender for construction services with the following eligibility criteria: a minimum annual turnover of ₹5 crore, at least 3 similar projects completed in the last 5 years, a valid GST registration, and an ISO 9001 certification. Ten bidders submit responses, each with their own combination of typed and scanned supporting documents.
-A good solution would extract these four criteria from the tender, parse each bidder's submission, and produce a report:
-- 6 bidders clearly eligible with evidence for each criterion
-- 3 clearly ineligible with the specific criterion they failed and the document that showed it
-- 1 flagged for manual review because the turnover document is a scanned certificate with figures that could not be read with confidence
----
-## What Your Solution Should Cover
-Round 1 of this hackathon is a **written solution submission**. Your solution document should make clear how you would build this platform. At minimum, it should cover:
-1. Your understanding of the problem and the realities of government procurement, in your own words.
-2. Your approach to extracting eligibility criteria from a tender document, including how you separate technical, financial and compliance conditions, and how you distinguish mandatory from optional criteria.
-3. Your approach to parsing bidder submissions with heterogeneous document types — typed PDFs, scanned documents, tables, photographs — and extracting the values that map to each criterion.
-4. How you match extracted bidder information against the criteria, and how you handle ambiguity, partial information and variation in legal and technical language.
-5. How the system produces explainable, criterion-level verdicts, and how ambiguous cases are surfaced for human review instead of being silently rejected.
-6. How you would guarantee the auditability of every decision, suitable for a formal government procurement context.
-7. A clear architecture overview, the key technology and model choices you would make, and the reasons behind them.
-8. The main risks and trade-offs you see, and how you would handle them.
-9. A rough implementation plan for Round 2, assuming a sandbox with sample tender and bidder documents is provided.
----
-## How We Will Evaluate Proposals
-- Clarity of problem understanding — does the team show they have grasped the realities of government procurement, not just the surface problem?
-- Technical soundness of the proposed approach, including document understanding, criterion matching and explainability.
-- Depth of thinking on edge cases: scanned documents, photographs, ambiguous language, partial information and format inconsistency.
-- Design of the human-in-the-loop path for ambiguous cases, and of the audit trail.
-- Quality of the architecture, the justification of technology and model choices, and the identified risks and trade-offs.

understanding.md DELETED

@@ -1,154 +0,0 @@
-# TenderIQ — Project Understanding
----
-## Where We Are
-The idea phase (Round 1) is **done and shortlisted**. The `idea.md` was the written submission. We are now in the **Prototype Phase (Round 2)**, which requires a working prototype, demo, code repository, pitch deck, and video.
----
-## The Problem (from CRPF's perspective)
-CRPF issues tenders. Companies bid. Someone has to manually read:
-- The tender document (criteria, thresholds, compliance rules)
-- Every bidder's stack of supporting documents (PDFs, scans, photos, Word files)
-...and verify that each bidder meets each criterion. For one tender, this takes a committee days. Two evaluators may reach different conclusions from the same documents. There's no consistent audit trail.
-**The core pain points:**
-1. Manual, slow, expensive
-2. Inconsistent across evaluators
-3. Not auditable / not transparent
-4. Documents arrive in messy formats (scanned, photographed, mixed)
----
-## What TenderIQ Does
-A four-stage AI pipeline:
-```
-Tender Document ──► [Stage 1] Criteria Extraction
-                              │
-                              ▼
-Bidder Documents ──► [Stage 2] Document Processing (OCR + entity extraction)
-                              │
-                              ▼
-                    [Stage 3] Evaluation Engine (rule-based + confidence)
-                              │
-                              ▼
-                    [Stage 4] Explainability + Audit Layer
-                              │
-                    ┌─────────┴──────────┐
-                    ▼                    ▼
-               Auto-decision       Human Review Queue
-           (Eligible / Not Eligible)  (Needs Manual Review)
-```
-### Stage 1 — Tender Understanding
-- LLM + rule-based hybrid extracts criteria from tender doc
-- Classifies each as mandatory or optional
-- Outputs structured, machine-readable criteria list
-### Stage 2 — Bidder Document Processing
-- Handles: typed PDFs, scanned docs, images, Word files
-- OCR for non-digital content
-- Layout-aware parsing (tables, forms, certificates)
-- Entity extraction: turnover figures, cert names, project counts
-- Every extracted value tagged with: source doc, page number, confidence score
-### Stage 3 — Evaluation Engine
-- Criterion-by-criterion comparison per bidder
-- Rule-based validation (threshold checks)
-- Confidence-aware: low confidence → "Needs Manual Review", not auto-reject
-- Three outcomes: Eligible / Not Eligible / Needs Manual Review
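A confidence-aware threshold check along the lines of Stage 3 might look like the sketch below. The function name, the `review_below` parameter and its default of 0.7 are all assumptions for illustration; no cutoff has been chosen in this document.

```python
from typing import Optional


def evaluate_threshold(value: Optional[float], confidence: float,
                       minimum: float, review_below: float = 0.7) -> str:
    """Return eligible / not_eligible / needs_review for a numeric criterion."""
    # A missing value or a low-confidence extraction is never auto-rejected;
    # it is routed to the human review queue instead.
    if value is None or confidence < review_below:
        return "needs_review"
    return "eligible" if value >= minimum else "not_eligible"
```

With a minimum turnover of 5 (crore), a value of 6.2 read at confidence 0.95 is eligible, while the same value read from a poor scan at confidence 0.4 goes to review rather than being decided automatically.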
-### Stage 4 — Explainability + Audit
-- Every decision has: criterion checked, value found, source doc, confidence, reason
-- Full audit log: model version, timestamp, reviewer actions
-- Human reviewers can approve / edit / reject flagged cases
-- Reviewer decisions feed back into system improvement
----
-## Non-Negotiables (from theme)
-These are hard constraints, not nice-to-haves:
-| Constraint | Implication for build |
-|---|---|
-| Every verdict must be explainable at criterion level | No black-box scoring; each criterion decision must be traceable |
-| Never silently disqualify | Low confidence = human review queue, not auto-reject |
-| Must handle scanned docs and photographs | OCR is not optional |
-| End-to-end auditable | Every system action must be logged with immutable records |
----
-## What We Need to Deliver (Prototype Phase)
-| Deliverable | What it means |
-|---|---|
-| Working demo | The pipeline must actually run on mock/sample data |
-| Demo link | Hosted or accessible prototype |
-| Repo URL | Clean, documented code |
-| Source code zip | Packaged for reviewers to run |
-| Run instructions | Step-by-step so reviewers can test it |
-| Presentation | Pitch deck covering the full solution |
-| Video | Demo + pitch walkthrough |
-| Snapshots | Screenshots of the UI/output |
-| Description | Written summary of the project |
----
-## Proposed Tech Stack (from idea)
-| Component | Technology | Why |
-|---|---|---|
-| LLM for criteria extraction | LLM (e.g., Claude, GPT-4, or open-source) | Handles legal language, ambiguity |
-| OCR | Tesseract or PaddleOCR | Open-source, handles scanned docs and images |
-| Document layout understanding | LayoutLM | Understands tables, forms, structured layouts |
-| Backend | Python + FastAPI | Fast to build, good ML ecosystem |
-| Database | PostgreSQL + vector DB | Structured storage + semantic search |
-| Frontend | React | Dashboard for review, reporting |
----
-## Key Design Decisions to Think About
-### 1. Hybrid extraction (LLM + rules)
-- Pure LLM: flexible but unpredictable on numeric thresholds
-- Pure rules: precise but brittle on varied language
-- Hybrid: LLM for interpretation, rules for validation — best of both
-### 2. Confidence threshold design
-- What confidence score triggers "Needs Manual Review"?
-- This is a calibration problem: because scores below the threshold are routed to review, too high a threshold floods reviewers, while too low a threshold risks bad auto-decisions
-### 3. Vector DB role
-- Enables semantic search over extracted bidder data
-- Useful when a criterion mentions "similar projects" and you need to match against descriptions
-### 4. Audit log immutability
-- Government procurement context requires tamper-evident logs
-- Must capture: what AI decided, why, when, which model version, and what the human reviewer did
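One common way to make a log tamper-evident is a hash chain, where each record stores the hash of the record before it. The sketch below shows that technique with an in-memory list; it is not the project's audit module, and the function names and record layout are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, entry: dict) -> str:
    # sort_keys gives a stable serialisation, so the hash is reproducible.
    body = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> dict:
    """Append an entry whose hash covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    record = {"entry": entry, "prev_hash": prev_hash,
              "hash": _digest(prev_hash, entry)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Return False if any record was altered, removed or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```

Editing any earlier entry changes its hash, which breaks the link stored in every later record. The chain shows that tampering happened; it does not prevent it.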
----
-## Gaps / Things Not Yet Defined
-- **Which LLM?** The idea says "LLMs" but doesn't specify. For a prototype, this matters.
-- **Which vector DB?** Pinecone, Weaviate, ChromaDB, pgvector — not chosen yet.
-- **Criteria schema** — what does the structured criterion object look like exactly?
-- **Confidence score methodology** — how is it calculated and what thresholds are used?
-- **UI scope** — how much of the review interface needs to be built for the prototype?
-- **Mock data** — we need sample tender docs and bidder submissions to demo against.
-- **Evaluation report format** — what does the exported report look like?
----
-## Summary
-The idea is solid and already shortlisted. The core insight is: **don't try to fully automate procurement decisions; build a system that makes human reviewers dramatically faster and more consistent, with a complete audit trail.** The prototype needs to demonstrate this pipeline end-to-end on mock data, with a UI that shows criterion-level explanations.
-Next step: define the implementation plan — what to build, in what order, and what scope is realistic for the prototype.