Commit bdb7765 · 1 Parent(s): ad20db7
Committed by JaydeepR and Claude Sonnet 4.6

README rewrite + cleanup

README: complete rewrite with problem statement, feature table, quick start
(with and without API key), demo scenario table, architecture diagram,
full project structure, tech stack table, and future work section.

Cleanup:
- Remove chromadb==0.5.5 from requirements.txt (replaced by pure-Python vectorstore)
- Remove .gitkeep from data/bidders/* and data/tender/ (directories now populated)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

README.md CHANGED
@@ -13,99 +13,202 @@ short_description: Explainable AI for Government Tender Evaluation (CRPF Hackath
  # TenderIQ — Explainable AI for Tender Evaluation

- AI-powered eligibility evaluation of bidders against government tender criteria, built for the **CRPF Hackathon, Theme 3**.

- ## What it does

- 1. **Extract criteria** — DeepSeek LLM reads the tender PDF and extracts each eligibility criterion as structured JSON (category, rule, query hints, source clause).
- 2. **OCR & index bidder documents** — Three-tier OCR pipeline: PyMuPDF (typed PDF) → Tesseract → DeepSeek Vision LLM (for low-confidence scans). All pages indexed into ChromaDB.
- 3. **Evaluate per criterion** — Vector search retrieves relevant evidence; DeepSeek decides eligible / not_eligible / needs_review with combined confidence scoring.
- 4. **Human review & audit** — Low-confidence verdicts are routed to a review queue. Every action is logged with timestamp, model version, actor, and payload.

- ## Quick Start (local)

  ```bash
- # 1. Clone the repo
  git clone <repo-url> && cd TenderIQ
-
- # 2. Install dependencies
  pip install -r requirements.txt
- # On Linux/Mac also: apt install tesseract-ocr poppler-utils

- # 3. Set your API key (optional works without key using pre-computed data)
- cp .env.example .env
- # Edit .env: DEEPSEEK_API_KEY=your_key_here

- # 4. Generate mock data (already committed — only needed if you delete data/)
- python scripts/generate_mock_data.py

- # 5. Run the app
  streamlit run app.py
  ```

- Open http://localhost:8501 in your browser.

- ## Running without an API key (pre-computed mode)

- The app works without a DeepSeek API key. Pre-computed results in `data/precomputed/` are used as fallback automatically. The sidebar shows an amber dot and a banner when in this mode.

- The demo flow:
- 1. Go to **Overview** tab → click **Load Pre-computed Demo** to instantly populate all tabs with realistic results.
- 2. Navigate to **Bidder Evaluation** to see the verdict table with confidence bars and OCR-tier badges.
- 3. **Human Review** tab shows Bidder C's turnover criterion flagged for review (low-confidence scan).
- 4. **Audit Log** tab shows the full activity log with CSV export.

- ## Running with a live API key

- Set `DEEPSEEK_API_KEY` in `.env` (or Streamlit Cloud secrets). The sidebar shows a green dot. Then:
- 1. **Tender Analysis** → click **Extract Criteria (Live LLM)** — extracts 5 criteria from the mock tender.
- 2. **Bidder Evaluation** → click **Run Evaluation** — processes all 3 bidders.

- ## Running the smoke test

- ```bash
- python scripts/smoke_test.py
- ```

- Exits 0 on success (43 checks, ~10 seconds).

- ## Pre-computing results

- If you have an API key and want to regenerate the fallback JSON:
- ```bash
- python scripts/precompute_results.py
  ```

- ## Project structure

  ```
  TenderIQ/
- ├── app.py # Streamlit entry point
  ├── core/
- │ ├── config.py # Constants and paths
- │ ├── schemas.py # Pydantic models
- │ ├── prompts.py # LLM prompt strings
- │ ├── llm_client.py # DeepSeek wrapper
- │ ├── pdf_utils.py # PyMuPDF extraction
- │ ├── ocr_pipeline.py # 3-tier OCR
- │ ├── chunker.py # Text chunking
- │ ├── vectorstore.py # ChromaDB helpers
- │ ├── criteria_extractor.py # Stage 1: tender → criteria
- │ ├── bidder_processor.py # Stage 2: bidder docs → chunks
- │ ├── evaluator.py # Stage 3: verdict generation
- │ ├── audit.py # SQLite audit log
- │ └── fallback.py # Pre-computed fallback
- ├── ui/ # Streamlit tab modules
  ├── data/
- │ ├── tender/ # Mock tender PDF
- │ ├── bidders/ # Mock bidder documents
- │ └── precomputed/ # Fallback JSON files
- ├── scripts/ # generate_mock_data, precompute, smoke_test
- └── specs/ # Per-module specs (spec-driven development)
  ```

  ## Notes

- - **PyMuPDF (AGPL)** — allowed for hackathon use; see LICENSE for details.
- - **Tesseract** — must be installed separately on Windows. Available via `packages.txt` on Streamlit Cloud.
- - **First cloud load** — ChromaDB downloads the all-MiniLM-L6-v2 model (~80 MB) on first run. Pre-warm by visiting the deployed URL once before the demo.
 
  # TenderIQ — Explainable AI for Tender Evaluation

+ > Built for the **CRPF Hackathon, Theme 3 — AI-Based Tender Evaluation and Eligibility Analysis for Government Procurement**

+ TenderIQ automates the eligibility evaluation of bidders against government tender criteria. It extracts structured criteria from any tender PDF, processes bidder documents through a three-tier OCR pipeline, evaluates each (bidder × criterion) pair using a language model with combined confidence scoring, and surfaces ambiguous cases for human review — all with a complete, exportable audit trail.

+ ---
+
+ ## The Problem
+
+ A procurement committee today manually reads hundreds of pages of tender documents and bidder submissions, cross-checks financial statements, certificates, and project records, and decides whether each bidder meets each criterion. For one tender this can take 3–5 days. Two evaluators regularly reach different conclusions on the same documents. There is no consistent audit trail.
+
+ TenderIQ reduces this to minutes, with every decision traceable to a specific document, page, and model version.
+
+ ---
+
+ ## Key Features

+ | Feature | Detail |
+ |---|---|
+ | **Criterion extraction** | DeepSeek LLM reads the tender PDF and returns each criterion as structured JSON — category, mandatory flag, threshold rule, source clause, and retrieval query hints |
+ | **Three-tier OCR** | PyMuPDF for typed PDFs → Tesseract for scans → DeepSeek Vision LLM when Tesseract confidence < 65%. Every page records which tier read it |
+ | **Semantic evidence retrieval** | sentence-transformers all-MiniLM-L6-v2 indexes all bidder document chunks; top-k retrieval feeds the evaluator |
+ | **Safety threshold rule** | A borderline `not_eligible` verdict at medium confidence (0.55–0.80) is automatically downgraded to `needs_review` — never silent disqualification |
+ | **Human review queue** | Flagged verdicts surface with full evidence, source snippet, and OCR tier badge. Officers Approve / Edit & Approve / Reject with one click |
+ | **Interpretability tab** | Plain-English per-criterion breakdown with inline PDF page previews and an LLM-powered Q&A ("Why was this bidder rejected?") |
+ | **Audit log** | Every action — extraction, OCR invocation, evaluation, human review — logged to SQLite with timestamp, model version, actor, and payload. CSV export |
+ | **Pre-computed fallback** | If the API is unavailable, pre-computed JSON is served transparently. Sidebar shows an amber dot. Demo always works |
+
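
The safety threshold rule in the feature table can be illustrated with a few lines of Python. This is a minimal sketch, not the code in `core/evaluator.py`: the `Verdict` dataclass, the constant names, and the function `apply_safety_threshold` are hypothetical, and treating both ends of the 0.55–0.80 band as inclusive is an assumption. It covers only the downgrade described in that row, not any other routing to review.

```python
from dataclasses import dataclass

# Band taken from the README feature table; the names are hypothetical.
REVIEW_BAND_LOW = 0.55
REVIEW_BAND_HIGH = 0.80


@dataclass(frozen=True)
class Verdict:
    status: str        # "eligible", "not_eligible" or "needs_review"
    confidence: float  # combined confidence in [0, 1]


def apply_safety_threshold(verdict: Verdict) -> Verdict:
    """Downgrade a medium-confidence rejection to needs_review."""
    # Assumption: both ends of the band are inclusive.
    in_band = REVIEW_BAND_LOW <= verdict.confidence <= REVIEW_BAND_HIGH
    if verdict.status == "not_eligible" and in_band:
        return Verdict("needs_review", verdict.confidence)
    # Every other verdict passes through unchanged.
    return verdict
```

Keeping the rule as a pure function over the verdict makes it easy to cover in a threshold check such as the one the smoke test describes.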
+ ---
+
+ ## Quick Start
+
+ ### Without an API key (pre-computed demo)

  ```bash
  git clone <repo-url> && cd TenderIQ
  pip install -r requirements.txt
+ streamlit run app.py
+ ```

+ Open http://localhost:8501, go to the **Overview** tab, and click **Load Pre-computed Demo**. All tabs populate instantly with realistic results for three mock bidders.

+ ### With a live API key

+ ```bash
+ cp .env.example .env
+ # Edit .env and set: DEEPSEEK_API_KEY=your_key_here
  streamlit run app.py
  ```

+ The sidebar turns 🟢. Then:
+ 1. **Tender Analysis** → select a tender (mock or real CRPF tender) → **Extract Criteria**
+ 2. Remove any criteria you don't want evaluated using the × button
+ 3. **Bidder Evaluation** → **Run Evaluation**
+ 4. Review flagged verdicts in **Human Review**
+ 5. Export the full activity log from **Audit Log**

+ ### Tesseract (for OCR on scanned documents)

+ ```bash
+ # Linux / Streamlit Cloud — handled automatically via packages.txt
+ apt install tesseract-ocr poppler-utils

+ # Windows
+ # Download installer from https://github.com/UB-Mannheim/tesseract/wiki
+ # Add to PATH, then restart the app
+ ```

+ Tesseract is optional. If unavailable, the OCR pipeline falls through to the DeepSeek Vision LLM tier.
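
The fall-through behaviour can be sketched as a small routing function. This is an illustration under stated assumptions rather than the contents of `core/ocr_pipeline.py`: `read_page`, its parameters, and the tier labels are hypothetical, the OCR backends are passed in as plain callables so the example does not depend on any real library call, and "typed PDF" is assumed to mean that the page has a non-empty text layer.

```python
from typing import Callable, Optional, Tuple

# Threshold from the README feature table, in percent.
TESSERACT_MIN_CONFIDENCE = 65.0


def read_page(
    typed_text: str,
    tesseract: Optional[Callable[[], Tuple[str, float]]],
    vision_llm: Callable[[], str],
) -> Tuple[str, str]:
    """Return (text, tier) for one page, trying the cheapest tier first."""
    # Tier 1: assumption - a non-empty text layer means a typed PDF.
    if typed_text.strip():
        return typed_text, "pymupdf"
    # Tier 2: only if Tesseract is installed (None means unavailable).
    if tesseract is not None:
        text, confidence = tesseract()
        if confidence >= TESSERACT_MIN_CONFIDENCE:
            return text, "tesseract"
    # Tier 3: low-confidence scan, or Tesseract missing.
    return vision_llm(), "vision_llm"
```

Returning the tier name alongside the text is what would let each page record which tier read it.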

+ ---

+ ## Demo Scenarios

+ Three mock bidders are included, each designed to exercise a different path through the pipeline:

+ | Bidder | Company | Outcome | Why |
+ |---|---|---|---|
+ | **Bidder A** | Apex Constructions Pvt. Ltd. | ✅ Eligible | All 4 mandatory criteria met, typed PDFs, high confidence |
+ | **Bidder B** | BuildRight Enterprises | ❌ Not Eligible | C1 fails — average turnover INR 1.5 Cr vs 5 Cr threshold |
+ | **Bidder C** | Shree Constructions & Services | ⚠️ Needs Review | Turnover certificate submitted as a blurry, rotated scan — Tesseract confidence ~55% triggers Vision LLM; combined confidence routes to human review |

+ Two real CRPF tenders from crpf.gov.in are also included in `data/tender/real_tenders/` for live testing.
+
+ ---
+
+ ## Architecture

+ ```
+ Tender PDF ──► DeepSeek LLM ──► Criteria (JSON)
+
+ Bidder Docs ──► 3-tier OCR ──► Chunks ──► Vector Index
+ │ │
+ │ semantic search
+ │ │
+ └──────────────► DeepSeek LLM (evaluate)
+
+ combined confidence
+
+ ┌─────────────┴─────────────┐
+ eligible / needs_review
+ not_eligible ──► Human Review Queue
+ │ │
+ └──────────── Audit Log ───────┘
  ```

+ Full module documentation: [`ARCHITECTURE.md`](ARCHITECTURE.md)
+
+ ---
+
+ ## Project Structure

  ```
  TenderIQ/
+ ├── app.py # Streamlit entry point, 6 tabs, sidebar
  ├── core/
+ │ ├── config.py # Constants, paths, thresholds
+ │ ├── schemas.py # Pydantic: Criterion, Evidence, Verdict, AuditEntry
+ │ ├── prompts.py # LLM prompt strings
+ │ ├── llm_client.py # DeepSeek wrapper — chat_json, chat_vision, LLMUnavailable
+ │ ├── pdf_utils.py # PyMuPDF: extract_pages, render_page_to_image
+ │ ├── ocr_pipeline.py # 3-tier OCR orchestrator with MD5 cache
+ │ ├── chunker.py # Tender and bidder document chunking
+ │ ├── vectorstore.py # In-memory vector store (sentence-transformers + BM25 fallback)
+ │ ├── criteria_extractor.py # Stage 1: tender PDF → List[Criterion]
+ │ ├── bidder_processor.py # Stage 2: bidder docs → indexed chunks
+ │ ├── evaluator.py # Stage 3: per-criterion verdict + confidence
+ │ ├── audit.py # SQLite audit log
+ │ └── fallback.py # Pre-computed JSON fallback
+ ├── ui/
+ │ ├── tab_overview.py # Hero, KPIs, pipeline diagram, demo CTA
+ │ ├── tab_tender.py # Upload tender, extract + discard criteria
+ │ ├── tab_bidders.py # Evaluation table with bidder toggles
+ │ ├── tab_review.py # Human review queue — Approve / Edit / Reject
+ │ ├── tab_audit.py # Audit log table + CSV export
+ │ ├── tab_interpretability.py # Plain-English breakdown + LLM Q&A
+ │ ├── components.py # Shared HTML badge/pill/bar components
+ │ └── styles.py # Global CSS injection
  ├── data/
+ │ ├── tender/ # Mock tender PDF + real CRPF tenders
+ │ ├── bidders/ # Mock bidder documents (A, B, C)
+ │ └── precomputed/ # Fallback JSON (criteria + verdicts)
+ ├── scripts/
+ │ ├── generate_mock_data.py # Generates all mock PDFs and noisy scan
+ │ ├── precompute_results.py # Runs full pipeline, saves fallback JSON
+ │ ├── generate_deck.py # Generates pitch deck PDF
+ │ └── smoke_test.py # 43-check end-to-end test, exits 0
+ ├── specs/ # Per-module spec documents
+ ├── deck/ # Pitch deck PDF
+ └── ARCHITECTURE.md # Full architecture reference
+ ```
+
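
The tree describes `core/vectorstore.py` as an in-memory vector store that replaces ChromaDB. The core of such a store, top-k retrieval by cosine similarity, fits in a few lines of standard-library Python. The sketch below is illustrative only: `cosine` and `top_k` are hypothetical names, the embeddings are assumed to be precomputed lists of floats (for example from all-MiniLM-L6-v2), and the BM25 fallback is not shown.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine(query, vector)) for text, vector in chunks]
    # Highest similarity first; sorted() is stable, so ties keep input order.
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is usually adequate for a handful of bidder documents; a dedicated index only becomes worthwhile at much larger chunk counts.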
+ ---
+
+ ## Running the Smoke Test
+
+ ```bash
+ python scripts/smoke_test.py
  ```

+ 43 checks covering imports, config, schemas, PDF utils, OCR pipeline, fallback, audit, evaluator threshold logic, and precomputed files. Exits 0 on success (~10 seconds, no API key needed).
+
+ ---
+
+ ## Tech Stack
+
+ | Component | Technology |
+ |---|---|
+ | UI & orchestration | Streamlit 1.39 |
+ | LLM (criteria extraction + evaluation) | DeepSeek API via OpenAI SDK |
+ | OCR Tier 1 | PyMuPDF 1.24 |
+ | OCR Tier 2 | Tesseract (pytesseract) |
+ | OCR Tier 3 | DeepSeek Vision LLM |
+ | Semantic retrieval | sentence-transformers 2.7 (all-MiniLM-L6-v2) |
+ | Data validation | Pydantic v2 |
+ | Audit log | SQLite |
+ | Mock data generation | ReportLab + Pillow + NumPy |
+
+ ---
+
+ ## Future Work
+
+ The current build is scoped to the hackathon prototype. Directions for production:
+
+ - **Multi-tender workspace** — evaluate the same pool of bidders against multiple tenders simultaneously, with a unified eligibility matrix
+ - **GeM portal integration** — pull live tenders directly from the Government e-Marketplace API instead of uploading PDFs
+ - **Automated bidder ranking** — weighted scoring across criteria to produce an ordered shortlist, not just pass/fail
+ - **LayoutLM for complex tables** — better structured extraction from financial statements with multi-column layouts and merged cells
+ - **Multi-evaluator workflow** — role-based access (Evaluator / Reviewer / Approver) with parallel review and conflict resolution
+ - **Review queue notifications** — email or SMS alerts when verdicts are flagged, so officers don't need to poll the app
+ - **Bulk bidder upload** — ZIP/folder upload for large tenders with many bidders, with background processing and progress tracking
+ - **Audit export for compliance** — structured PDF report per tender, formatted for submission to procurement oversight bodies
+
+ ---
+
  ## Notes

+ - **PyMuPDF (AGPL)** — used under AGPL; acceptable for hackathon use.
+ - **No auth / multi-user** — single hardcoded `officer` identity in audit entries. Out of scope for this prototype.
+ - **First run** — sentence-transformers downloads all-MiniLM-L6-v2 (~90 MB) on first evaluation. Subsequent runs use the cache.
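
The README says every action is logged to SQLite with timestamp, model version, actor, and payload, and that the actor is a single hardcoded `officer`. A table with those fields might look like the sketch below. The table name, column names, and the helpers `open_audit_log` and `log_action` are assumptions made for illustration; the actual schema in `core/audit.py` may differ.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               timestamp TEXT NOT NULL,
               action TEXT NOT NULL,
               model_version TEXT NOT NULL,
               actor TEXT NOT NULL,
               payload TEXT NOT NULL
           )"""
    )
    return conn


def log_action(conn, action, model_version, payload, actor="officer"):
    """Append one audit entry and return its row id."""
    cursor = conn.execute(
        "INSERT INTO audit_log (timestamp, action, model_version, actor, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),  # UTC timestamp, ISO 8601
            action,
            model_version,
            actor,  # defaults to the single hardcoded identity
            json.dumps(payload),  # payload stored as a JSON string
        ),
    )
    conn.commit()
    return cursor.lastrowid
```

Storing the payload as JSON text keeps the table schema fixed while letting each action type carry its own fields, which also makes a flat CSV export straightforward.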
data/bidders/bidder_a/.gitkeep DELETED
File without changes
data/bidders/bidder_b/.gitkeep DELETED
File without changes
data/bidders/bidder_c/.gitkeep DELETED
File without changes
data/tender/.gitkeep DELETED
File without changes
requirements.txt CHANGED
@@ -4,7 +4,6 @@ pymupdf==1.24.10
  pytesseract==0.3.13
  Pillow==10.4.0
  numpy==1.26.4
- chromadb==0.5.5
  sentence-transformers==2.7.0
  pydantic==2.9.2
  python-dotenv==1.0.1

  pytesseract==0.3.13
  Pillow==10.4.0
  numpy==1.26.4
  sentence-transformers==2.7.0
  pydantic==2.9.2
  python-dotenv==1.0.1