JaydeepR and Claude Sonnet 4.6 committed
Commit 9bb7eba · 1 Parent(s): 661eb14

Step 2: mock data — tender PDF, bidder docs, noisy scan PNG


Implements specs/11_mock_data.md. Generates crpf_construction_tender.pdf with
5 criteria (C1-C5), typed PDFs for Bidder A (eligible) and B (ineligible),
and Bidder C docs including a GaussianBlur+noise-degraded turnover_certificate_scan.png
for the OCR demo path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

data/bidders/bidder_a/audited_financials.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:78608f59f8c92e1ab2c4e1d307e9dad5fdb2e024a76ca2571fb226895b0c82cb
3
+ size 2338
data/bidders/bidder_a/company_profile.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:25d2d4416e926d29167173d59279e3a36d3956c2f9d667219125c12cd2c2922c
3
+ size 1872
data/bidders/bidder_a/gst_certificate.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a15dd28fc14811defec690220f6232d0ca7e07541020a2ad8978962d8b0443d
3
+ size 1939
data/bidders/bidder_a/iso_9001.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fb688f99dbc950337a89336a9c97ea5dd7db0027560c15d3504e07344f28747a
3
+ size 2053
data/bidders/bidder_a/project_experience.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ed1b2549c2852a3c5e47737f73692ff0b968bde002ca3271d044a3aee78d201d
3
+ size 2453
data/bidders/bidder_b/audited_financials.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:23950d4e983fad85523f7e7ac6e5f7ad4cf44ac16edb6b2f77efea59c52dc02c
3
+ size 2317
data/bidders/bidder_b/company_profile.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:23682906de7f68e776b650065e555b0ef6abc425bf519a60fdf20f8b872d25b4
3
+ size 1873
data/bidders/bidder_b/gst_certificate.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9cc529d98363ea8dac622a93cbf1dda505f0eb2bbed79cd209738d3412c55ed0
3
+ size 1929
data/bidders/bidder_b/iso_9001.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4b06a8a387351c147224d73ae66c177a99ad2b125264dbd29ff313dfade23634
3
+ size 2057
data/bidders/bidder_b/project_experience.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:594337e60cc6daafd24e7ecabe733250ec61669130b021c65e615e06af91f435
3
+ size 2372
data/bidders/bidder_c/company_profile.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a34e24c98b67e59038ca27657336f2f4521fd577ed42a8f14e17c10e6aad1abf
3
+ size 1862
data/bidders/bidder_c/gst_certificate.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bfd82955c4f07b3da1de492fac23239de2d8adcd6e486093a1e3770b652e7985
3
+ size 1942
data/bidders/bidder_c/iso_9001.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:262d8d616fc328371af34476101eecea9981490948a2ed8d59cd6443c1753b98
3
+ size 2061
data/bidders/bidder_c/project_experience.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:38d5de6e2de322ba97e62ac125b52b0062c553207073c0713a1bcfbd9a481290
3
+ size 2311
data/bidders/bidder_c/turnover_certificate_scan.png ADDED

Git LFS Details

  • SHA256: 273c50b438a2755b5fbda40c34775f1d6db2181089c77643c846ee97e91450ac
  • Pointer size: 132 Bytes
  • Size of remote file: 1.82 MB
data/tender/crpf_construction_tender.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6911a05bbfa8f3a341c61b8cebf28e83e99d8274d98717b28c027b5cd0a09032
3
+ size 5829
scripts/generate_mock_data.py CHANGED
@@ -1 +1,461 @@
1
  """Step 2 — generates mock tender and bidder PDFs + noisy scan PNG."""
2
+
3
+ import io
4
+ import sys
5
+ from pathlib import Path
6
+
7
+ import numpy as np
8
+ from PIL import Image, ImageFilter
9
+ from reportlab.lib import colors
10
+ from reportlab.lib.pagesizes import A4
11
+ from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
12
+ from reportlab.lib.units import cm
13
+ from reportlab.platypus import (
14
+ Paragraph, SimpleDocTemplate, Spacer, Table, TableStyle, PageBreak
15
+ )
16
+
17
+ BASE_DIR = Path(__file__).resolve().parent.parent
18
+ DATA_DIR = BASE_DIR / "data"
19
+
20
+
21
+ def _doc(path: Path) -> SimpleDocTemplate:
22
+ return SimpleDocTemplate(str(path), pagesize=A4,
23
+ leftMargin=2*cm, rightMargin=2*cm,
24
+ topMargin=2*cm, bottomMargin=2*cm)
25
+
26
+
27
+ def _styles():
28
+ styles = getSampleStyleSheet()
29
+ styles.add(ParagraphStyle(name="Center", alignment=1, fontSize=12,
30
+ spaceAfter=6))
31
+ styles.add(ParagraphStyle(name="Bold14", fontName="Helvetica-Bold",
32
+ fontSize=14, spaceAfter=8))
33
+ styles.add(ParagraphStyle(name="Bold12", fontName="Helvetica-Bold",
34
+ fontSize=12, spaceAfter=6))
35
+ styles.add(ParagraphStyle(name="Body10", fontSize=10, spaceAfter=4,
36
+ leading=14))
37
+ styles.add(ParagraphStyle(name="Clause", fontSize=10, leftIndent=20,
38
+ spaceAfter=10, leading=14))
39
+ return styles
40
+
41
+
42
+ def make_tender_pdf(out_path: Path) -> None:
43
+ doc = _doc(out_path)
44
+ s = _styles()
45
+ story = []
46
+
47
+ story.append(Paragraph("GOVERNMENT OF INDIA", s["Center"]))
48
+ story.append(Paragraph("MINISTRY OF HOME AFFAIRS", s["Center"]))
49
+ story.append(Paragraph("CENTRAL RESERVE POLICE FORCE", s["Bold14"]))
50
+ story.append(Paragraph(
51
+ "TENDER DOCUMENT FOR CONSTRUCTION OF RESIDENTIAL QUARTERS",
52
+ s["Center"]
53
+ ))
54
+ story.append(Paragraph("Tender No: CRPF/CE/2025-26/RQ/001", s["Center"]))
55
+ story.append(Spacer(1, 0.5*cm))
56
+
57
+ story.append(Paragraph("1. INTRODUCTION", s["Bold12"]))
58
+ story.append(Paragraph(
59
+ "The Central Reserve Police Force (CRPF), under the Ministry of Home Affairs, "
60
+ "Government of India, invites sealed tenders from eligible contractors for the "
61
+ "construction of Residential Quarters at CRPF Camp, New Delhi. The work involves "
62
+ "civil construction, internal electrification, plumbing, and allied works.",
63
+ s["Body10"]
64
+ ))
65
+ story.append(Spacer(1, 0.3*cm))
66
+
67
+ story.append(Paragraph("2. SCOPE OF WORK", s["Bold12"]))
68
+ story.append(Paragraph(
69
+ "The scope includes: (a) Construction of Type-III Residential Quarters (G+4) — "
70
+ "24 units; (b) Internal roads, drainage, and compound wall; (c) Water supply and "
71
+ "sanitation infrastructure; (d) Landscaping and external works. Estimated project "
72
+ "value: INR 18 Crore. Completion period: 24 months.",
73
+ s["Body10"]
74
+ ))
75
+ story.append(PageBreak())
76
+
77
+ story.append(Paragraph("3. ELIGIBILITY CRITERIA", s["Bold12"]))
78
+ story.append(Paragraph(
79
+ "Only bidders fulfilling ALL mandatory eligibility criteria listed below shall be "
80
+ "considered for technical evaluation. Bids not meeting mandatory criteria shall be "
81
+ "rejected summarily without further evaluation.",
82
+ s["Body10"]
83
+ ))
84
+ story.append(Spacer(1, 0.3*cm))
85
+
86
+ story.append(Paragraph("3.2 Mandatory and Desirable Criteria", s["Bold12"]))
87
+
88
+ criteria_text = [
89
+ ("3.2(a)", "Financial Capability",
90
+ "The bidder shall have a minimum average annual turnover of INR 5 Crore "
91
+ "(Rupees Five Crore only) during the last three financial years (2022-23, "
92
+ "2023-24, 2024-25), as certified by a Chartered Accountant. Documentary "
93
+ "evidence in the form of audited balance sheets, profit & loss account, and "
94
+ "CA certificate shall be submitted. [MANDATORY]"),
95
+ ("3.2(b)", "Technical Experience",
96
+ "The bidder must have successfully completed at least three (3) similar "
97
+ "construction projects of value not less than INR 1 Crore each in the last "
98
+ "five (5) financial years. Completion certificates from clients shall be "
99
+ "submitted along with work orders. [MANDATORY]"),
100
+ ("3.2(c)", "GST Registration",
101
+ "The bidder shall possess a valid Goods and Services Tax (GST) registration "
102
+ "certificate. The GSTIN must be active as on the date of submission. A copy "
103
+ "of the GST registration certificate shall be enclosed with the bid. "
104
+ "[MANDATORY]"),
105
+ ("3.2(d)", "Quality Certification",
106
+ "The bidder shall hold a valid ISO 9001:2015 Quality Management System "
107
+ "certification issued by an accredited certification body, valid as on the "
108
+ "date of bid submission. Copy of the certificate shall be submitted. "
109
+ "[MANDATORY]"),
110
+ ("3.2(e)", "Paramilitary Experience",
111
+ "Preferably, the bidder may have prior experience with construction or "
112
+ "maintenance of paramilitary or defence infrastructure. This is a desirable "
113
+ "criterion and shall not affect mandatory eligibility. Supporting documents "
114
+ "may be submitted for additional credit during evaluation. [DESIRABLE]"),
115
+ ]
116
+
117
+ for clause, title, text in criteria_text:
118
+ story.append(Paragraph(f"<b>{clause} {title}</b>", s["Body10"]))
119
+ story.append(Paragraph(text, s["Clause"]))
120
+
121
+ story.append(PageBreak())
122
+
123
+ story.append(Paragraph("4. SUBMISSION PROCEDURE", s["Bold12"]))
124
+ story.append(Paragraph(
125
+ "Bids shall be submitted in two envelopes: Technical Bid and Financial Bid. "
126
+ "Last date of submission: 30-06-2026. Address for submission: The Inspector "
127
+ "General (Works), CRPF Group Centre, New Delhi – 110077. "
128
+ "EMD of INR 36 Lakh (2% of estimated cost) to be deposited via DD/BG.",
129
+ s["Body10"]
130
+ ))
131
+ story.append(Spacer(1, 0.5*cm))
132
+
133
+ story.append(Paragraph("5. EVALUATION METHODOLOGY", s["Bold12"]))
134
+ story.append(Paragraph(
135
+ "Evaluation shall proceed in two stages: (i) Technical Evaluation — bidders "
136
+ "meeting all mandatory criteria in 3.2 shall be declared technically qualified; "
137
+ "(ii) Financial Evaluation — lowest L1 bid among technically qualified bidders "
138
+ "shall be recommended. Desirable criteria (3.2(e)) may be used for tie-breaking.",
139
+ s["Body10"]
140
+ ))
141
+ story.append(PageBreak())
142
+
143
+ story.append(Paragraph("6. ANNEXURES", s["Bold12"]))
144
+ story.append(Paragraph("Annexure A — Bid Form", s["Body10"]))
145
+ story.append(Paragraph("Annexure B — Declaration of Non-Blacklisting", s["Body10"]))
146
+ story.append(Paragraph("Annexure C — CA Certificate Format (Turnover)", s["Body10"]))
147
+ story.append(Paragraph("Annexure D — Project Completion Certificate Format", s["Body10"]))
148
+
149
+ doc.build(story)
150
+
151
+
152
+ def _simple_pdf(out_path: Path, title: str, paragraphs: list[str],
153
+ table_data: list[list] | None = None) -> None:
154
+ doc = _doc(out_path)
155
+ s = _styles()
156
+ story = []
157
+ story.append(Paragraph(title, s["Bold14"]))
158
+ story.append(Spacer(1, 0.3*cm))
159
+ for para in paragraphs:
160
+ story.append(Paragraph(para, s["Body10"]))
161
+ if table_data:
162
+ story.append(Spacer(1, 0.3*cm))
163
+ tbl = Table(table_data, hAlign="LEFT")
164
+ tbl.setStyle(TableStyle([
165
+ ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
166
+ ("GRID", (0, 0), (-1, -1), 0.5, colors.black),
167
+ ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
168
+ ("FONTSIZE", (0, 0), (-1, -1), 9),
169
+ ("TOPPADDING", (0, 0), (-1, -1), 4),
170
+ ("BOTTOMPADDING", (0, 0), (-1, -1), 4),
171
+ ]))
172
+ story.append(tbl)
173
+ doc.build(story)
174
+
175
+
176
+ def make_company_profile(out_path: Path, name: str, gstin: str, reg_year: int,
177
+ iso: bool = True, extra_lines: list[str] | None = None) -> None:
178
+ paras = [
179
+ f"<b>Company Name:</b> {name}",
180
+ f"<b>GSTIN:</b> {gstin}",
181
+ f"<b>Year of Registration:</b> {reg_year}",
182
+ f"<b>Nature of Business:</b> Civil Construction and Infrastructure Development",
183
+ f"<b>ISO 9001:2015 Certified:</b> {'Yes' if iso else 'No'}",
184
+ "<b>Registered Office:</b> 42, Industrial Area, Phase II, India",
185
+ ]
186
+ if extra_lines:
187
+ paras.extend(extra_lines)
188
+ _simple_pdf(out_path, f"Company Profile — {name}", paras)
189
+
190
+
191
+ def make_financials(out_path: Path, company: str,
192
+ rows: list[tuple[str, str, int]], ca_name: str,
193
+ ca_no: str) -> None:
194
+ table_data = [["Financial Year", "Annual Turnover (INR)", "Words"]]
195
+ for fy, words, amount in rows:
196
+ table_data.append([fy, f"{amount:,}", words])
197
+ avg = round(sum(r[2] for r in rows) / len(rows))
198
+ table_data.append(["Average (3 years)", f"{avg:,}", ""])
199
+
200
+ paras = [
201
+ f"<b>Company:</b> {company}",
202
+ "The following statement of annual turnover has been prepared from the "
203
+ "audited accounts and is certified to be true and correct.",
204
+ ]
205
+ paras.append(f"<b>Certified by:</b> {ca_name}, Chartered Accountant, M. No. {ca_no}")
206
+ paras.append("<b>UDIN:</b> 26123456AAAAA0001")
207
+ paras.append("<b>Place:</b> Mumbai &nbsp;&nbsp; <b>Date:</b> 01-04-2026")
208
+ _simple_pdf(out_path,
209
+ "Audited Financial Statement — Annual Turnover Certificate",
210
+ paras, table_data)
211
+
212
+
213
+ def make_project_experience(out_path: Path, company: str,
214
+ projects: list[dict]) -> None:
215
+ table_data = [["#", "Project Name", "Client", "Value (INR Cr)", "Year", "Status"]]
216
+ for i, p in enumerate(projects, 1):
217
+ table_data.append([
218
+ str(i), p["name"], p["client"],
219
+ str(p["value"]), str(p["year"]), p.get("status", "Completed")
220
+ ])
221
+ paras = [
222
+ f"<b>Company:</b> {company}",
223
+ "The following construction projects have been completed by the organization "
224
+ "in the last five financial years (2020–2025). Completion certificates are "
225
+ "enclosed separately.",
226
+ ]
227
+ _simple_pdf(out_path, "Project Experience Certificate", paras, table_data)
228
+
229
+
230
+ def make_gst_certificate(out_path: Path, gstin: str, company: str,
231
+ valid_through: str) -> None:
232
+ paras = [
233
+ "<b>GOODS AND SERVICES TAX REGISTRATION CERTIFICATE</b>",
234
+ f"<b>Legal Name of Business:</b> {company}",
235
+ f"<b>GSTIN:</b> {gstin}",
236
+ f"<b>Date of Registration:</b> 01-07-2017",
237
+ f"<b>Valid Through:</b> {valid_through}",
238
+ f"<b>Registration Status:</b> ACTIVE",
239
+ f"<b>Type of Registration:</b> Regular",
240
+ f"<b>Issuing Authority:</b> Assistant Commissioner CGST, Mumbai",
241
+ ]
242
+ _simple_pdf(out_path, "GST Registration Certificate", paras)
243
+
244
+
245
+ def make_iso_certificate(out_path: Path, cert_no: str, company: str,
246
+ valid_through: str, issuer: str) -> None:
247
+ paras = [
248
+ "<b>ISO 9001:2015 QUALITY MANAGEMENT SYSTEM CERTIFICATE</b>",
249
+ f"<b>Certificate Number:</b> {cert_no}",
250
+ f"<b>This certifies that:</b> {company}",
251
+ "<b>Scope:</b> Design and Construction of Civil Infrastructure including "
252
+ "Residential, Commercial and Industrial Buildings",
253
+ f"<b>Valid Through:</b> {valid_through}",
254
+ f"<b>Issuing Body:</b> {issuer}",
255
+ "<b>Accreditation:</b> National Accreditation Board for Certification Bodies (NABCB)",
256
+ "<b>This certificate is issued in accordance with ISO 9001:2015 standard.</b>",
257
+ ]
258
+ _simple_pdf(out_path, "ISO 9001:2015 Certificate", paras)
259
+
260
+
261
+ def _render_ca_cert_to_pil(company: str, gstin: str, avg_amount: int,
262
+ avg_words: str) -> Image.Image:
263
+ """Render a CA turnover certificate PDF page to PIL image for degradation."""
264
+ import fitz # PyMuPDF
265
+
268
+ s = _styles()
269
+ story = [
270
+ Paragraph("CHARTERED ACCOUNTANT'S CERTIFICATE", s["Bold14"]),
271
+ Paragraph("(As per Annexure C of Tender No: CRPF/CE/2025-26/RQ/001)", s["Body10"]),
272
+ Spacer(1, 0.5*cm),
273
+ Paragraph(
274
+ f"This is to certify that M/s {company} (GSTIN: {gstin}) is a registered "
275
+ "entity engaged in civil construction activities. Based on the audited "
276
+ "financial statements and books of accounts duly certified under applicable "
277
+ "provisions of the Companies Act, 2013:",
278
+ s["Body10"]
279
+ ),
280
+ Spacer(1, 0.3*cm),
281
+ Paragraph(
282
+ f"The average annual turnover of the firm for the three financial years "
283
+ f"2022-23, 2023-24, and 2024-25 is <b>INR {avg_amount:,} ({avg_words} only)</b>.",
284
+ s["Body10"]
285
+ ),
286
+ Spacer(1, 0.3*cm),
287
+ ]
288
+
289
+ table_data = [
290
+ ["Financial Year", "Annual Turnover (INR)", "In Words"],
291
+ ["2022-23", "4,80,00,000", "Four Crore Eighty Lakh"],
292
+ ["2023-24", "5,40,00,000", "Five Crore Forty Lakh"],
293
+ ["2024-25", "6,00,00,000", "Six Crore"],
294
+ [f"Average (3 years)", f"{avg_amount:,}", avg_words],
295
+ ]
296
+ tbl = Table(table_data)
297
+ tbl.setStyle(TableStyle([
298
+ ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
299
+ ("GRID", (0, 0), (-1, -1), 0.5, colors.black),
300
+ ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
301
+ ("FONTSIZE", (0, 0), (-1, -1), 9),
302
+ ("TOPPADDING", (0, 0), (-1, -1), 4),
303
+ ("BOTTOMPADDING", (0, 0), (-1, -1), 4),
304
+ ]))
305
+ story.append(tbl)
306
+ story.extend([
307
+ Spacer(1, 0.5*cm),
308
+ Paragraph("Certified by:", s["Body10"]),
309
+ Paragraph("<b>CA Vikram Shah</b>", s["Body10"]),
310
+ Paragraph("M. No. 098765", s["Body10"]),
311
+ Paragraph("FRN: 100001W", s["Body10"]),
312
+ Paragraph("Place: Ahmedabad &nbsp;&nbsp; Date: 05-04-2026", s["Body10"]),
313
+ Paragraph("UDIN: 26098765BBBBB0002", s["Body10"]),
314
+ ])
315
+
316
+ buf = io.BytesIO()
317
+ pdf_doc = SimpleDocTemplate(buf, pagesize=A4,
318
+ leftMargin=2*cm, rightMargin=2*cm,
319
+ topMargin=2*cm, bottomMargin=2*cm)
320
+ pdf_doc.build(story)
321
+ buf.seek(0)
322
+
323
+ fitz_doc = fitz.open(stream=buf.read(), filetype="pdf")
324
+ page = fitz_doc[0]
325
+ mat = fitz.Matrix(150/72, 150/72)
326
+ pix = page.get_pixmap(matrix=mat, colorspace=fitz.csRGB)
327
+ img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
328
+ fitz_doc.close()
329
+ return img
330
+
331
+
332
+ def make_noisy_scan(out_path: Path) -> None:
333
+ img = _render_ca_cert_to_pil(
334
+ company="Shree Constructions & Services",
335
+ gstin="24AABCC9012H1Z1",
336
+ avg_amount=5_40_00_000,
337
+ avg_words="Five Crore Forty Lakh",
338
+ )
339
+
340
+ # Apply degradation
341
+ img = img.filter(ImageFilter.GaussianBlur(radius=1.5))
342
+
343
+ arr = np.array(img, dtype=np.uint8)
344
+ rng = np.random.default_rng(seed=42)
345
+ noise_mask = rng.random(arr.shape[:2])
346
+ arr[noise_mask < 0.025] = 0
347
+ arr[noise_mask > 0.975] = 255
348
+ img = Image.fromarray(arr)
349
+
350
+ img = img.rotate(-2, expand=True, fillcolor=(255, 255, 255))
351
+
352
+ jpeg_buf = io.BytesIO()
353
+ img.convert("RGB").save(jpeg_buf, format="JPEG", quality=40)
354
+ jpeg_buf.seek(0)
355
+ img = Image.open(jpeg_buf).copy()
356
+
357
+ img.save(str(out_path), format="PNG")
358
+
359
+
360
+ def main() -> None:
361
+ # Ensure output dirs exist
362
+ for d in [
363
+ DATA_DIR / "tender",
364
+ DATA_DIR / "bidders" / "bidder_a",
365
+ DATA_DIR / "bidders" / "bidder_b",
366
+ DATA_DIR / "bidders" / "bidder_c",
367
+ DATA_DIR / "precomputed",
368
+ ]:
369
+ d.mkdir(parents=True, exist_ok=True)
370
+
371
+ # Tender
372
+ make_tender_pdf(DATA_DIR / "tender" / "crpf_construction_tender.pdf")
373
+
374
+ # ── Bidder A ─────────────────────────────────────────────────────────────
375
+ a = DATA_DIR / "bidders" / "bidder_a"
376
+ make_company_profile(a / "company_profile.pdf",
377
+ "Apex Constructions Pvt. Ltd.", "27AABCA1234F1Z5", 2010)
378
+ make_financials(a / "audited_financials.pdf",
379
+ "Apex Constructions Pvt. Ltd.",
380
+ [
381
+ ("2022-23", "Five Crore Eighty Lakh", 5_80_00_000),
382
+ ("2023-24", "Six Crore Twenty Lakh", 6_20_00_000),
383
+ ("2024-25", "Seven Crore Ten Lakh", 7_10_00_000),
384
+ ],
385
+ ca_name="CA Ramesh Kumar", ca_no="123456")
386
+ make_project_experience(a / "project_experience.pdf",
387
+ "Apex Constructions Pvt. Ltd.",
388
+ [
389
+ {"name": "Staff Quarters Block A", "client": "PWD Delhi",
390
+ "value": 2.5, "year": 2021},
391
+ {"name": "Office Complex Phase 1", "client": "CPWD Mumbai",
392
+ "value": 3.2, "year": 2022},
393
+ {"name": "Residential Complex", "client": "NBCC Ltd",
394
+ "value": 4.1, "year": 2023},
395
+ {"name": "Barracks Construction", "client": "CRPF Camp Pune",
396
+ "value": 3.5, "year": 2024},
397
+ {"name": "Commercial Warehouse", "client": "DDA",
398
+ "value": 1.8, "year": 2025},
399
+ ])
400
+ make_gst_certificate(a / "gst_certificate.pdf", "27AABCA1234F1Z5",
401
+ "Apex Constructions Pvt. Ltd.", "31-03-2027")
402
+ make_iso_certificate(a / "iso_9001.pdf", "ISO-2021-9001-APEX",
403
+ "Apex Constructions Pvt. Ltd.", "15-06-2027",
404
+ "Bureau Veritas Certification India Pvt. Ltd.")
405
+
406
+ # ── Bidder B ─────────────────────────────────────────────────────────────
407
+ b = DATA_DIR / "bidders" / "bidder_b"
408
+ make_company_profile(b / "company_profile.pdf",
409
+ "BuildRight Enterprises", "29AABCB5678G1Z3", 2015)
410
+ make_financials(b / "audited_financials.pdf",
411
+ "BuildRight Enterprises",
412
+ [
413
+ ("2022-23", "One Crore Twenty Lakh", 1_20_00_000),
414
+ ("2023-24", "One Crore Fifty Lakh", 1_50_00_000),
415
+ ("2024-25", "One Crore Eighty Lakh", 1_80_00_000),
416
+ ],
417
+ ca_name="CA Suresh Patel", ca_no="654321")
418
+ make_project_experience(b / "project_experience.pdf",
419
+ "BuildRight Enterprises",
420
+ [
421
+ {"name": "Residential Quarters", "client": "Municipal Corp",
422
+ "value": 1.1, "year": 2022},
423
+ {"name": "School Building Renovation", "client": "KVS",
424
+ "value": 1.3, "year": 2023},
425
+ {"name": "Community Hall", "client": "NDMC",
426
+ "value": 1.2, "year": 2024},
427
+ {"name": "Warehouse Shed", "client": "FCI",
428
+ "value": 1.0, "year": 2025},
429
+ ])
430
+ make_gst_certificate(b / "gst_certificate.pdf", "29AABCB5678G1Z3",
431
+ "BuildRight Enterprises", "31-03-2027")
432
+ make_iso_certificate(b / "iso_9001.pdf", "ISO-2022-9001-BR",
433
+ "BuildRight Enterprises", "20-08-2027",
434
+ "TUV SUD South Asia Pvt. Ltd.")
435
+
436
+ # ── Bidder C ─────────────────────────────────────────────────────────────
437
+ c = DATA_DIR / "bidders" / "bidder_c"
438
+ make_company_profile(c / "company_profile.pdf",
439
+ "Shree Constructions & Services", "24AABCC9012H1Z1", 2012)
440
+ make_project_experience(c / "project_experience.pdf",
441
+ "Shree Constructions & Services",
442
+ [
443
+ {"name": "Housing Complex Phase 1", "client": "GIDC",
444
+ "value": 1.2, "year": 2023},
445
+ {"name": "Commercial Plaza", "client": "Ahmedabad MC",
446
+ "value": 2.1, "year": 2024},
447
+ {"name": "Road & Drainage Works", "client": "NHAI",
448
+ "value": 1.5, "year": 2025},
449
+ ])
450
+ make_gst_certificate(c / "gst_certificate.pdf", "24AABCC9012H1Z1",
451
+ "Shree Constructions & Services", "31-03-2027")
452
+ make_iso_certificate(c / "iso_9001.pdf", "ISO-2023-9001-SCS",
453
+ "Shree Constructions & Services", "10-09-2027",
454
+ "DNV Business Assurance India Pvt. Ltd.")
455
+ make_noisy_scan(c / "turnover_certificate_scan.png")
456
+
457
+ print("Mock data generated successfully.")
458
+
459
+
460
+ if __name__ == "__main__":
461
+ main()
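The generator formats amounts with `f"{amount:,}"`, which groups digits in thousands (for example 58,000,000), while the spec and the hard-coded rows of the scan table use Indian lakh/crore grouping (5,80,00,000). The following is a minimal sketch of a formatter that could bring the two in line. `format_inr` is a hypothetical helper name and is not part of this commit.

```python
def format_inr(amount: int) -> str:
    """Group a non-negative integer Indian-style: last three digits, then pairs."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    digits = str(amount)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    # Split the leading digits into pairs from the right (lakh, crore, ...).
    pairs = []
    while head:
        pairs.append(head[-2:])
        head = head[:-2]
    return ",".join(reversed(pairs)) + "," + tail


print(format_inr(5_80_00_000))  # 5,80,00,000
print(format_inr(1_50_00_000))  # 1,50,00,000
```

If adopted, the helper would replace the `{amount:,}` and `{avg_amount:,}` format specs so that typed PDFs and the scan show the same grouping.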
specs/11_mock_data.md ADDED
@@ -0,0 +1,211 @@
1
+ # Spec 11 — Mock Data Generation
2
+
3
+ **Step:** 2 of 15
4
+ **Time budget:** ~25 min
5
+ **Checkpoint:** `data/` directory populated; `turnover_certificate_scan.png` is a visibly noisy scan that Tesseract reads with low confidence (~50–65%).
6
+
7
+ ---
8
+
9
+ ## Goal
10
+
11
+ `scripts/generate_mock_data.py` is a single deterministic script that produces:
12
+ 1. One tender PDF (`data/tender/crpf_construction_tender.pdf`)
13
+ 2. Five PDFs for Bidder A (clearly eligible)
14
+ 3. Five PDFs for Bidder B (clearly ineligible — turnover too low)
15
+ 4. Four PDFs + one noisy scan PNG for Bidder C (needs review)
16
+
17
+ All files are entirely synthetic and self-contained — no external assets required. The script must run in under 30 seconds.
18
+
19
+ ---
20
+
21
+ ## Dependencies
22
+
23
+ - `reportlab` — PDF generation
24
+ - `Pillow` — image manipulation
25
+ - `numpy` — salt-and-pepper noise
+ - `PyMuPDF` (imported as `fitz`): rasterizes the in-memory PDF page for the noisy scan
26
+
27
+ ---
28
+
29
+ ## Output Files
30
+
31
+ ```
32
+ data/
33
+ tender/
34
+ crpf_construction_tender.pdf
35
+ bidders/
36
+ bidder_a/
37
+ company_profile.pdf
38
+ audited_financials.pdf
39
+ project_experience.pdf
40
+ gst_certificate.pdf
41
+ iso_9001.pdf
42
+ bidder_b/
43
+ company_profile.pdf
44
+ audited_financials.pdf
45
+ project_experience.pdf
46
+ gst_certificate.pdf
47
+ iso_9001.pdf
48
+ bidder_c/
49
+ company_profile.pdf
50
+ project_experience.pdf
51
+ gst_certificate.pdf
52
+ iso_9001.pdf
53
+ turnover_certificate_scan.png
54
+ ```
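The tree above can be turned into a quick existence check. This is a sketch only: `expected_files` and `missing_files` are hypothetical names, and the paths are copied from the tree rather than from any manifest in the repository.

```python
from pathlib import Path

# File names shared by all three bidders, as listed in the tree above.
BIDDER_PDFS = [
    "company_profile.pdf",
    "project_experience.pdf",
    "gst_certificate.pdf",
    "iso_9001.pdf",
]


def expected_files() -> list[Path]:
    """Return the 16 output paths, relative to the data directory."""
    files = [Path("tender") / "crpf_construction_tender.pdf"]
    for bidder in ("bidder_a", "bidder_b"):
        base = Path("bidders") / bidder
        files += [base / name for name in BIDDER_PDFS + ["audited_financials.pdf"]]
    # Bidder C has a scanned certificate instead of typed financials.
    base_c = Path("bidders") / "bidder_c"
    files += [base_c / name for name in BIDDER_PDFS + ["turnover_certificate_scan.png"]]
    return files


def missing_files(data_dir: Path) -> list[Path]:
    """Return the expected paths that do not exist under data_dir."""
    return [p for p in expected_files() if not (data_dir / p).is_file()]
```

Calling `missing_files(Path("data"))` after a run should return an empty list if every file in the tree was written.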
55
+
56
+ ---
57
+
58
+ ## Tender PDF — `crpf_construction_tender.pdf`
59
+
60
+ `reportlab` SimpleDocTemplate, 5–6 pages with formal government tender language.
61
+
62
+ ### Sections
63
+
64
+ 1. **Introduction** — "Central Reserve Police Force, Ministry of Home Affairs, Government of India. Tender for Construction of Residential Quarters."
65
+ 2. **Scope of Work** — brief description of construction project.
66
+ 3. **Eligibility Criteria** — Section 3.2, contains five criteria (see table below).
67
+ 4. **Submission Procedure** — dates, contact details.
68
+ 5. **Evaluation Methodology** — how bids will be scored.
69
+ 6. **Annexures** — supporting forms.
70
+
71
+ ### Five Criteria (exact text in Section 3.2)
72
+
73
+ | ID | Clause | Verbatim Text | Mandatory | Category |
74
+ |---|---|---|---|---|
75
+ | C1 | 3.2(a) | "The bidder shall have a minimum average annual turnover of INR 5 Crore (Rupees Five Crore only) during the last three financial years (2022-23, 2023-24, 2024-25), as certified by a Chartered Accountant." | Yes | financial |
76
+ | C2 | 3.2(b) | "The bidder must have successfully completed at least three (3) similar construction projects of value not less than INR 1 Crore each in the last five (5) financial years. Completion certificates from clients shall be submitted." | Yes | technical |
77
+ | C3 | 3.2(c) | "The bidder shall possess a valid Goods and Services Tax (GST) registration certificate. The GSTIN must be active as on the date of submission." | Yes | compliance |
78
+ | C4 | 3.2(d) | "The bidder shall hold a valid ISO 9001:2015 Quality Management System certification issued by an accredited certification body, valid as on the date of bid submission." | Yes | compliance |
79
+ | C5 | 3.2(e) | "Preferably, the bidder may have prior experience with construction or maintenance of paramilitary or defence infrastructure. This is a desirable criterion and shall not affect mandatory eligibility." | No | technical |
80
+
81
+ C5 uses "preferably" and "desirable" → tests the mandatory-vs-optional classifier.
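To illustrate what the C5 wording exercises, here is a deliberately naive keyword heuristic. It is not the project's classifier: the cue lists and the name `classify_criterion` are assumptions made for this sketch, and optional cues are checked first because C5 also contains "shall".

```python
import re

# Hypothetical cue words; a real classifier would likely need more than this.
OPTIONAL_CUES = re.compile(r"\b(preferably|desirable|optional)\b", re.IGNORECASE)
MANDATORY_CUES = re.compile(r"\b(shall|must)\b", re.IGNORECASE)


def classify_criterion(text: str) -> str:
    """Label a clause as 'desirable', 'mandatory', or 'unknown' from cue words."""
    # Optional cues win: C5 says "shall not affect mandatory eligibility".
    if OPTIONAL_CUES.search(text):
        return "desirable"
    if MANDATORY_CUES.search(text):
        return "mandatory"
    return "unknown"


c1 = "The bidder shall have a minimum average annual turnover of INR 5 Crore."
c5 = ("Preferably, the bidder may have prior experience with paramilitary "
      "infrastructure. This is a desirable criterion and shall not affect "
      "mandatory eligibility.")
print(classify_criterion(c1))  # mandatory
print(classify_criterion(c5))  # desirable
```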
82
+
83
+ ---
84
+
85
+ ## Bidder A — Clearly Eligible
86
+
87
+ ### `company_profile.pdf`
88
+ - Company: "Apex Constructions Pvt. Ltd."
89
+ - GSTIN: 27AABCA1234F1Z5
90
+ - Registered: 2010
91
+ - ISO 9001:2015 certified: Yes
92
+
93
+ ### `audited_financials.pdf`
94
+ - FY 2022-23: Annual Turnover INR 5,80,00,000 (Rupees Five Crore Eighty Lakh)
95
+ - FY 2023-24: Annual Turnover INR 6,20,00,000 (Rupees Six Crore Twenty Lakh)
96
+ - FY 2024-25: Annual Turnover INR 7,10,00,000 (Rupees Seven Crore Ten Lakh)
97
+ - Average: INR 6,36,66,667 — exceeds INR 5 Crore threshold
98
+ - Certified by: CA Ramesh Kumar, M. No. 123456
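The average quoted above can be checked with a few lines of arithmetic. The sketch below uses only the three turnover figures from this list. Note that floor division yields 6,36,66,666, while rounding to the nearest rupee yields the 6,36,66,667 stated here.

```python
# Turnover figures for Bidder A, copied from the list above (INR).
turnovers = {
    "2022-23": 5_80_00_000,
    "2023-24": 6_20_00_000,
    "2024-25": 7_10_00_000,
}
THRESHOLD = 5_00_00_000  # INR 5 Crore, from clause 3.2(a)

total = sum(turnovers.values())
average_floor = total // len(turnovers)           # truncates the fractional rupee
average_rounded = round(total / len(turnovers))   # rounds to the nearest rupee

print(average_floor, average_rounded, average_rounded >= THRESHOLD)
# 63666666 63666667 True
```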
99
+
100
+ ### `project_experience.pdf`
101
+ - 5 projects listed (2020–2025), each ≥ INR 1 Crore
102
+ - Includes one CRPF project (2024): "Construction of barracks, CRPF Camp, Pune, INR 3.5 Crore"
103
+
104
+ ### `gst_certificate.pdf`
105
+ - GSTIN: 27AABCA1234F1Z5
106
+ - Valid through: 31-03-2027
107
+ - Status: Active
108
+
109
+ ### `iso_9001.pdf`
110
+ - Certificate No: ISO-2021-9001-APEX
111
+ - Valid through: 15-06-2027
112
+ - Issued by: Bureau Veritas
113
+
114
+ ---
115
+
116
+ ## Bidder B — Clearly Ineligible (turnover too low)
117
+
118
+ Same structure as Bidder A, but financials are below threshold.
119
+
120
+ ### `company_profile.pdf`
121
+ - Company: "BuildRight Enterprises"
122
+ - GSTIN: 29AABCB5678G1Z3
123
+
124
+ ### `audited_financials.pdf`
125
+ - FY 2022-23: Annual Turnover INR 1,20,00,000 (Rupees One Crore Twenty Lakh)
126
+ - FY 2023-24: Annual Turnover INR 1,50,00,000 (Rupees One Crore Fifty Lakh)
127
+ - FY 2024-25: Annual Turnover INR 1,80,00,000 (Rupees One Crore Eighty Lakh)
128
+ - Average: INR 1,50,00,000 — **below** INR 5 Crore threshold
129
+ - Certified by: CA Suresh Patel, M. No. 654321
130
+
131
+ ### `project_experience.pdf`
132
+ - 4 projects listed (2021–2025), each ≥ INR 1 Crore — passes C2
133
+
134
+ ### `gst_certificate.pdf`
135
+ - GSTIN: 29AABCB5678G1Z3, valid through 2027, Active
136
+
137
+ ### `iso_9001.pdf`
138
+ - Certificate No: ISO-2022-9001-BR
139
+ - Valid through: 20-08-2027
140
+
141
+ ---
142
+
143
+ ## Bidder C — Needs Review (scanned turnover certificate)
144
+
145
+ No typed `audited_financials.pdf`. Instead: a deliberately noisy scan PNG.
146
+
147
+ ### `company_profile.pdf`
148
+ - Company: "Shree Constructions & Services"
149
+ - GSTIN: 24AABCC9012H1Z1
150
+
151
+ ### `project_experience.pdf`
152
+ - Exactly 3 projects (borderline meets count threshold for C2)
153
+ - Values: INR 1.2 Cr, INR 1.5 Cr, INR 2.1 Cr
154
+
155
+ ### `gst_certificate.pdf`
156
+ - GSTIN: 24AABCC9012H1Z1, valid through 2027, Active
157
+
158
+ ### `iso_9001.pdf`
159
+ - Certificate No: ISO-2023-9001-SCS
160
+ - Valid through: 10-09-2027
161
+
162
+ ### `turnover_certificate_scan.png` — noisy scan generation
163
+
164
+ This is the OCR demo centerpiece. Steps:
165
+
166
+ 1. Render a `reportlab` page to an in-memory PDF with a CA's turnover certificate:
167
+ - "This is to certify that M/s Shree Constructions & Services ... average annual turnover of INR 5,40,00,000 (Rupees Five Crore Forty Lakh only) for the financial years 2022-23, 2023-24, and 2024-25."
168
+ - Include year-wise breakdown table.
169
+ 2. Convert that PDF page to a PIL Image at 150 DPI using `fitz` (PyMuPDF).
170
+ 3. Apply degradation:
171
+ - `ImageFilter.GaussianBlur(radius=1.5)`
172
+ - Salt-and-pepper noise via numpy: randomly set ~5% of pixels to 0 or 255
173
+ - `image.rotate(-2, expand=True, fillcolor=(255,255,255))`
174
+ - Re-save with JPEG compression at quality=40 then reload as PNG
175
+ 4. Save as `data/bidders/bidder_c/turnover_certificate_scan.png`
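To make the "~5% of pixels" figure in step 3 concrete, the sketch below applies the same kind of mask to a flat grey array and measures how many pixels changed. The function name `add_salt_and_pepper`, its parameters, and the grey test image are assumptions for illustration; only the thresholding idea is taken from the steps above.

```python
import numpy as np


def add_salt_and_pepper(arr: np.ndarray, fraction: float = 0.05,
                        seed: int = 42) -> np.ndarray:
    """Return a copy of an H x W x 3 uint8 image with ~fraction of pixels set to 0 or 255."""
    rng = np.random.default_rng(seed)
    mask = rng.random(arr.shape[:2])        # one uniform draw per pixel
    out = arr.copy()
    out[mask < fraction / 2] = 0            # pepper: lower tail of the draws
    out[mask > 1 - fraction / 2] = 255      # salt: upper tail of the draws
    return out


# Flat grey stand-in for a rendered page, so both salt and pepper are visible.
page = np.full((200, 200, 3), 128, dtype=np.uint8)
noisy = add_salt_and_pepper(page)
changed = np.any(noisy != page, axis=-1).mean()
print(f"fraction of pixels changed: {changed:.3f}")
```

On a real page render, which is mostly white, the salt half of the noise is largely invisible, so the visible degradation comes mainly from the pepper pixels.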
176
+
177
+ **Expected outcome:** Tesseract reads this at mean confidence ~50–65% → triggers Tier-3 vision LLM. The turnover figure (INR 5,40,00,000) is present but partially degraded, making it a realistic "needs human review" case given combined-confidence rules.
178
+
179
+ ---
180
+
181
+ ## Script Design
182
+
183
+ ```python
184
+ # scripts/generate_mock_data.py
185
+
186
+ def make_tender_pdf(out_path: Path) -> None: ...
187
+ def make_company_profile(out_path: Path, name: str, gstin: str, reg_year: int, iso: bool = True, extra_lines: list[str] | None = None) -> None: ...
188
+ def make_financials(out_path: Path, company: str, rows: list[tuple[str, str, int]], ca_name: str, ca_no: str) -> None: ...
189
+ def make_project_experience(out_path: Path, company: str, projects: list[dict]) -> None: ...
190
+ def make_gst_certificate(out_path: Path, gstin: str, company: str, valid_through: str) -> None: ...
191
+ def make_iso_certificate(out_path: Path, cert_no: str, company: str, valid_through: str, issuer: str) -> None: ...
192
+ def make_noisy_scan(out_path: Path) -> None: ...
193
+
194
+ if __name__ == "__main__":
195
+ # Ensure output dirs exist
196
+ # Generate all files
197
+ print("Mock data generated successfully.")
198
+ ```
199
+
200
+ Each helper creates one PDF/PNG. The script is idempotent (re-running overwrites files). No command-line arguments needed.
201
+
202
+ ---
203
+
204
+ ## Acceptance Criteria
205
+
206
+ 1. Running `python scripts/generate_mock_data.py` exits 0 and prints "Mock data generated successfully."
207
+ 2. All 16 files listed above exist after the run.
208
+ 3. Each PDF opens in a viewer without errors and contains the text described.
209
+ 4. `turnover_certificate_scan.png` is visibly degraded (blurry, rotated, noisy).
210
+ 5. Running `pytesseract.image_to_data(Image.open("data/bidders/bidder_c/turnover_certificate_scan.png"), output_type=pytesseract.Output.DATAFRAME)` returns a dataframe where the filtered mean confidence is between 30 and 70 (i.e., low enough to trigger Tier 3).
211
+ 6. Script completes in under 30 seconds on any modern machine.
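Criterion 5 depends on how the mean confidence is filtered, which the spec does not define. The sketch below assumes that "filtered" means dropping rows with a confidence of -1 (Tesseract's marker for non-word rows) and rows with empty text. It works on the dictionary form of the OCR output so that it can be exercised without Tesseract installed. `mean_word_confidence` is a hypothetical helper name.

```python
def mean_word_confidence(data: dict) -> float:
    """Mean confidence over rows that carry recognised text; 0.0 if there are none."""
    confs = []
    for conf, text in zip(data["conf"], data["text"]):
        value = float(conf)  # some pytesseract versions return strings here
        # Assumption: skip non-word rows (conf == -1) and whitespace-only text.
        if value >= 0 and str(text).strip():
            confs.append(value)
    return sum(confs) / len(confs) if confs else 0.0


# Hand-made sample in the shape of pytesseract's dictionary output.
sample = {"conf": ["-1", 60, "40.5", 90], "text": ["", "INR", "5,40,00,000", "  "]}
print(mean_word_confidence(sample))  # 50.25
```

With Tesseract available, the input would come from `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)`, and the result could then be compared against the 30 to 70 band.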