JaydeepR Claude Sonnet 4.6 commited on
Commit
76e0cee
·
1 Parent(s): b2ad034

Step 13: smoke test (43 checks) and README

Browse files

Implements the end-to-end smoke test covering: imports, config, schemas, mock
data, pdf_utils, chunker, OCR pipeline, fallback, audit, evaluator threshold
logic, and precomputed files. All 43 checks pass. README covers local quickstart,
pre-computed mode, live API mode, and project structure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (2) hide show
  1. README.md +98 -0
  2. scripts/smoke_test.py +141 -0
README.md ADDED
@@ -0,0 +1,98 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # TenderIQ — Explainable AI for Tender Evaluation
2
+
3
+ AI-powered eligibility evaluation of bidders against government tender criteria, built for the **CRPF Hackathon, Theme 3**.
4
+
5
+ ## What it does
6
+
7
+ 1. **Extract criteria** — DeepSeek LLM reads the tender PDF and extracts each eligibility criterion as structured JSON (category, rule, query hints, source clause).
8
+ 2. **OCR & index bidder documents** — Three-tier OCR pipeline: PyMuPDF (typed PDF) → Tesseract → DeepSeek Vision LLM (for low-confidence scans). All pages indexed into ChromaDB.
9
+ 3. **Evaluate per criterion** — Vector search retrieves relevant evidence; DeepSeek decides eligible / not_eligible / needs_review with combined confidence scoring.
10
+ 4. **Human review & audit** — Low-confidence verdicts are routed to a review queue. Every action is logged with timestamp, model version, actor, and payload.
11
+
12
+ ## Quick Start (local)
13
+
14
+ ```bash
15
+ # 1. Clone the repo
16
+ git clone <repo-url> && cd TenderIQ
17
+
18
+ # 2. Install dependencies
19
+ pip install -r requirements.txt
20
+ # On Linux/Mac also: apt install tesseract-ocr poppler-utils
21
+
22
+ # 3. Set your API key (optional — works without key using pre-computed data)
23
+ cp .env.example .env
24
+ # Edit .env: DEEPSEEK_API_KEY=your_key_here
25
+
26
+ # 4. Generate mock data (already committed — only needed if you delete data/)
27
+ python scripts/generate_mock_data.py
28
+
29
+ # 5. Run the app
30
+ streamlit run app.py
31
+ ```
32
+
33
+ Open http://localhost:8501 in your browser.
34
+
35
+ ## Running without an API key (pre-computed mode)
36
+
37
+ The app works without a DeepSeek API key. Pre-computed results in `data/precomputed/` are used as fallback automatically. The sidebar shows an amber dot and a banner when in this mode.
38
+
39
+ The demo flow:
40
+ 1. Go to **Overview** tab → click **Load Pre-computed Demo** to instantly populate all tabs with realistic results.
41
+ 2. Navigate to **Bidder Evaluation** to see the verdict table with confidence bars and OCR-tier badges.
42
+ 3. **Human Review** tab shows Bidder C's turnover criterion flagged for review (low-confidence scan).
43
+ 4. **Audit Log** tab shows the full activity log with CSV export.
44
+
45
+ ## Running with a live API key
46
+
47
+ Set `DEEPSEEK_API_KEY` in `.env` (or Streamlit Cloud secrets). The sidebar shows a green dot. Then:
48
+ 1. **Tender Analysis** → click **Extract Criteria (Live LLM)** — extracts 5 criteria from the mock tender.
49
+ 2. **Bidder Evaluation** → click **Run Evaluation** — processes all 3 bidders.
50
+
51
+ ## Running the smoke test
52
+
53
+ ```bash
54
+ python scripts/smoke_test.py
55
+ ```
56
+
57
+ Exits 0 on success (43 checks, ~10 seconds).
58
+
59
+ ## Pre-computing results
60
+
61
+ If you have an API key and want to regenerate the fallback JSON:
62
+ ```bash
63
+ python scripts/precompute_results.py
64
+ ```
65
+
66
+ ## Project structure
67
+
68
+ ```
69
+ TenderIQ/
70
+ ├── app.py # Streamlit entry point
71
+ ├── core/
72
+ │ ├── config.py # Constants and paths
73
+ │ ├── schemas.py # Pydantic models
74
+ │ ├── prompts.py # LLM prompt strings
75
+ │ ├── llm_client.py # DeepSeek wrapper
76
+ │ ├── pdf_utils.py # PyMuPDF extraction
77
+ │ ├── ocr_pipeline.py # 3-tier OCR
78
+ │ ├── chunker.py # Text chunking
79
+ │ ├── vectorstore.py # ChromaDB helpers
80
+ │ ├── criteria_extractor.py # Stage 1: tender → criteria
81
+ │ ├── bidder_processor.py # Stage 2: bidder docs → chunks
82
+ │ ├── evaluator.py # Stage 3: verdict generation
83
+ │ ├── audit.py # SQLite audit log
84
+ │ └── fallback.py # Pre-computed fallback
85
+ ├── ui/ # Streamlit tab modules
86
+ ├── data/
87
+ │ ├── tender/ # Mock tender PDF
88
+ │ ├── bidders/ # Mock bidder documents
89
+ │ └── precomputed/ # Fallback JSON files
90
+ ├── scripts/ # generate_mock_data, precompute, smoke_test
91
+ └── specs/ # Per-module specs (spec-driven development)
92
+ ```
93
+
94
+ ## Notes
95
+
96
+ - **PyMuPDF (AGPL)** — allowed for hackathon use; see LICENSE for details.
97
+ - **Tesseract** — must be installed separately on Windows. Available via `packages.txt` on Streamlit Cloud.
98
+ - **First cloud load** — ChromaDB downloads the all-MiniLM-L6-v2 model (~80 MB) on first run. Pre-warm by visiting the deployed URL once before the demo.
scripts/smoke_test.py CHANGED
@@ -1 +1,142 @@
1
  """Step 13 — programmatic end-to-end check; exits 0 on success."""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  """Step 13 — programmatic end-to-end check; exits 0 on success."""
2
+
3
+ import sys
4
+ from pathlib import Path
5
+
6
+ BASE_DIR = Path(__file__).resolve().parent.parent
7
+ sys.path.insert(0, str(BASE_DIR))
8
+
9
+
10
+ def check(condition: bool, msg: str) -> None:
11
+ if not condition:
12
+ print(f"FAIL: {msg}")
13
+ sys.exit(1)
14
+ print(f" OK: {msg}")
15
+
16
+
17
+ def main() -> None:
18
+ print("TenderIQ Smoke Test")
19
+ print("=" * 50)
20
+
21
+ # 1. Core imports
22
+ print("\n1. Core module imports")
23
+ from core import config, schemas, prompts
24
+ from core.llm_client import LLM, LLMUnavailable
25
+ from core.pdf_utils import extract_pages, is_text_pdf
26
+ from core.ocr_pipeline import extract_document, ExtractedPage
27
+ from core.chunker import chunk_tender, chunk_bidder
28
+ from core.schemas import Criterion, Verdict, Evidence
29
+ from core import audit
30
+ from core.fallback import load_criteria, load_evaluation
31
+ check(True, "All core modules import without error")
32
+
33
+ # 2. Config
34
+ print("\n2. Config")
35
+ check(config.MODEL_VERSION.startswith("deepseek-chat"), "MODEL_VERSION set")
36
+ check(config.CONFIDENCE_HIGH == 0.80, "CONFIDENCE_HIGH = 0.80")
37
+ check(config.CONFIDENCE_REVIEW == 0.55, "CONFIDENCE_REVIEW = 0.55")
38
+
39
+ # 3. Schemas
40
+ print("\n3. Schemas")
41
+ c = Criterion(**{
42
+ "id": "C1", "title": "Turnover", "category": "financial",
43
+ "mandatory": True, "description": "test",
44
+ "rule": {"type": "numeric_threshold", "field": "t", "operator": ">=",
45
+ "value": 50000000, "unit": "INR"},
46
+ "query_hints": ["turnover"], "source_page": 3, "source_clause": "3.2(a)",
47
+ })
48
+ check(c.mandatory is True, "Criterion schema validates")
49
+
50
+ v = Verdict(bidder_id="b", criterion_id="C1", verdict="eligible")
51
+ check(v.verdict_id.startswith("V-"), "Verdict auto-generates verdict_id")
52
+ check(v.review_status == "pending", "Verdict defaults to pending")
53
+
54
+ # 4. Mock data files
55
+ print("\n4. Mock data files")
56
+ from core.config import DATA_DIR
57
+ tender_pdf = DATA_DIR / "tender" / "crpf_construction_tender.pdf"
58
+ check(tender_pdf.exists(), "Tender PDF exists")
59
+ for bidder in ["bidder_a", "bidder_b", "bidder_c"]:
60
+ bidder_dir = DATA_DIR / "bidders" / bidder
61
+ files = list(bidder_dir.glob("*"))
62
+ files = [f for f in files if not f.name.endswith(".gitkeep")]
63
+ check(len(files) >= 4, f"{bidder} has at least 4 documents")
64
+ scan = DATA_DIR / "bidders" / "bidder_c" / "turnover_certificate_scan.png"
65
+ check(scan.exists(), "Bidder C noisy scan exists")
66
+
67
+ # 5. PDF utils
68
+ print("\n5. PDF utils")
69
+ pages = extract_pages(tender_pdf)
70
+ check(len(pages) >= 3, f"Tender PDF has {len(pages)} pages")
71
+ check(is_text_pdf(tender_pdf), "Tender PDF detected as text_pdf")
72
+ img = __import__("core.pdf_utils", fromlist=["render_page_to_image"]).render_page_to_image(tender_pdf, 1)
73
+ check(img.size[0] > 0, f"Page render returns {img.size} image")
74
+
75
+ # 6. Chunker
76
+ print("\n6. Chunker")
77
+ chunks = chunk_tender(pages, "tender_001")
78
+ check(len(chunks) > 0, f"chunk_tender returns {len(chunks)} chunks")
79
+ check("text" in chunks[0] and "chunk_id" in chunks[0], "Chunk has text and chunk_id")
80
+
81
+ # 7. OCR pipeline
82
+ print("\n7. OCR pipeline")
83
+ fin_pdf = DATA_DIR / "bidders" / "bidder_a" / "audited_financials.pdf"
84
+ ep = extract_document(fin_pdf)
85
+ check(len(ep) > 0, f"extract_document returns {len(ep)} pages")
86
+ check(ep[0].source_type == "text_pdf", "Typed PDF uses Tier 1")
87
+ check(ep[0].confidence == 1.0, "Typed PDF confidence = 1.0")
88
+
89
+ ep_scan = extract_document(scan)
90
+ check(len(ep_scan) == 1, "Noisy scan returns 1 page")
91
+ check(ep_scan[0].source_type in ("text_pdf", "tesseract", "vision_llm"),
92
+ f"Scan source_type = {ep_scan[0].source_type}")
93
+
94
+ # 8. Fallback
95
+ print("\n8. Fallback")
96
+ criteria = load_criteria()
97
+ check(len(criteria) == 5, f"load_criteria returns {len(criteria)} criteria")
98
+ check(criteria[0].id == "C1", "First criterion is C1")
99
+ mandatory_count = sum(1 for c in criteria if c.mandatory)
100
+ check(mandatory_count == 4, f"{mandatory_count} mandatory criteria")
101
+ optional_count = sum(1 for c in criteria if not c.mandatory)
102
+ check(optional_count == 1, f"{optional_count} optional criterion (C5)")
103
+
104
+ va = load_evaluation("bidder_a", "C1")
105
+ check(va.verdict == "eligible", f"Bidder A C1 = {va.verdict}")
106
+ vb = load_evaluation("bidder_b", "C1")
107
+ check(vb.verdict == "not_eligible", f"Bidder B C1 = {vb.verdict}")
108
+ vc = load_evaluation("bidder_c", "C1")
109
+ check(vc.verdict == "needs_review", f"Bidder C C1 = {vc.verdict}")
110
+
111
+ # 9. Audit
112
+ print("\n9. Audit")
113
+ rid = audit.log("smoke_test", actor="smoke_test")
114
+ check(isinstance(rid, int) and rid > 0, f"audit.log returns row id {rid}")
115
+ rows = audit.query({"action": "smoke_test"})
116
+ check(len(rows) >= 1, "audit.query filters by action")
117
+
118
+ # 10. Evaluator threshold logic
119
+ print("\n10. Evaluator threshold logic")
120
+ from core.evaluator import _apply_thresholds, _combined_confidence
121
+ check(_apply_thresholds("eligible", 0.9) == "eligible", "eligible@0.9 stays eligible")
122
+ check(_apply_thresholds("not_eligible", 0.9) == "not_eligible", "not_eligible@0.9 stays")
123
+ check(_apply_thresholds("not_eligible", 0.6) == "needs_review", "not_eligible@0.6 -> needs_review")
124
+ check(_apply_thresholds("eligible", 0.4) == "needs_review", "eligible@0.4 -> needs_review")
125
+ check(_combined_confidence(0.9, "text_pdf", None) == 0.9, "text_pdf combined = llm_conf")
126
+ c_vis = _combined_confidence(0.9, "vision_llm", None)
127
+ check(0.8 < c_vis < 0.96, f"vision_llm combined = {c_vis:.3f}")
128
+
129
+ # 11. Precomputed files
130
+ print("\n11. Precomputed JSON files")
131
+ from core.config import PRECOMPUTED_DIR
132
+ check((PRECOMPUTED_DIR / "criteria.json").exists(), "criteria.json exists")
133
+ for bidder in ["bidder_a", "bidder_b", "bidder_c"]:
134
+ check((PRECOMPUTED_DIR / f"eval_{bidder}.json").exists(), f"eval_{bidder}.json exists")
135
+
136
+ print("\n" + "=" * 50)
137
+ print("All checks passed. Smoke test: SUCCESS")
138
+ print("=" * 50)
139
+
140
+
141
+ if __name__ == "__main__":
142
+ main()