Step 2: mock data — tender PDF, bidder docs, noisy scan PNG
Implements specs/11_mock_data.md. Generates crpf_construction_tender.pdf with
5 criteria (C1-C5), typed PDFs for Bidder A (eligible) and B (ineligible),
and Bidder C docs including a GaussianBlur+noise-degraded turnover_certificate_scan.png
for the OCR demo path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- data/bidders/bidder_a/audited_financials.pdf +3 -0
- data/bidders/bidder_a/company_profile.pdf +3 -0
- data/bidders/bidder_a/gst_certificate.pdf +3 -0
- data/bidders/bidder_a/iso_9001.pdf +3 -0
- data/bidders/bidder_a/project_experience.pdf +3 -0
- data/bidders/bidder_b/audited_financials.pdf +3 -0
- data/bidders/bidder_b/company_profile.pdf +3 -0
- data/bidders/bidder_b/gst_certificate.pdf +3 -0
- data/bidders/bidder_b/iso_9001.pdf +3 -0
- data/bidders/bidder_b/project_experience.pdf +3 -0
- data/bidders/bidder_c/company_profile.pdf +3 -0
- data/bidders/bidder_c/gst_certificate.pdf +3 -0
- data/bidders/bidder_c/iso_9001.pdf +3 -0
- data/bidders/bidder_c/project_experience.pdf +3 -0
- data/bidders/bidder_c/turnover_certificate_scan.png +3 -0
- data/tender/crpf_construction_tender.pdf +3 -0
- scripts/generate_mock_data.py +460 -0
- specs/11_mock_data.md +211 -0
data/bidders/bidder_a/audited_financials.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:78608f59f8c92e1ab2c4e1d307e9dad5fdb2e024a76ca2571fb226895b0c82cb
|
| 3 |
+
size 2338
|
data/bidders/bidder_a/company_profile.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:25d2d4416e926d29167173d59279e3a36d3956c2f9d667219125c12cd2c2922c
|
| 3 |
+
size 1872
|
data/bidders/bidder_a/gst_certificate.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6a15dd28fc14811defec690220f6232d0ca7e07541020a2ad8978962d8b0443d
|
| 3 |
+
size 1939
|
data/bidders/bidder_a/iso_9001.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:fb688f99dbc950337a89336a9c97ea5dd7db0027560c15d3504e07344f28747a
|
| 3 |
+
size 2053
|
data/bidders/bidder_a/project_experience.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:ed1b2549c2852a3c5e47737f73692ff0b968bde002ca3271d044a3aee78d201d
|
| 3 |
+
size 2453
|
data/bidders/bidder_b/audited_financials.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:23950d4e983fad85523f7e7ac6e5f7ad4cf44ac16edb6b2f77efea59c52dc02c
|
| 3 |
+
size 2317
|
data/bidders/bidder_b/company_profile.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:23682906de7f68e776b650065e555b0ef6abc425bf519a60fdf20f8b872d25b4
|
| 3 |
+
size 1873
|
data/bidders/bidder_b/gst_certificate.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9cc529d98363ea8dac622a93cbf1dda505f0eb2bbed79cd209738d3412c55ed0
|
| 3 |
+
size 1929
|
data/bidders/bidder_b/iso_9001.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:4b06a8a387351c147224d73ae66c177a99ad2b125264dbd29ff313dfade23634
|
| 3 |
+
size 2057
|
data/bidders/bidder_b/project_experience.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:594337e60cc6daafd24e7ecabe733250ec61669130b021c65e615e06af91f435
|
| 3 |
+
size 2372
|
data/bidders/bidder_c/company_profile.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a34e24c98b67e59038ca27657336f2f4521fd577ed42a8f14e17c10e6aad1abf
|
| 3 |
+
size 1862
|
data/bidders/bidder_c/gst_certificate.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:bfd82955c4f07b3da1de492fac23239de2d8adcd6e486093a1e3770b652e7985
|
| 3 |
+
size 1942
|
data/bidders/bidder_c/iso_9001.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:262d8d616fc328371af34476101eecea9981490948a2ed8d59cd6443c1753b98
|
| 3 |
+
size 2061
|
data/bidders/bidder_c/project_experience.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:38d5de6e2de322ba97e62ac125b52b0062c553207073c0713a1bcfbd9a481290
|
| 3 |
+
size 2311
|
data/bidders/bidder_c/turnover_certificate_scan.png
ADDED
|
Git LFS Details
|
data/tender/crpf_construction_tender.pdf
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6911a05bbfa8f3a341c61b8cebf28e83e99d8274d98717b28c027b5cd0a09032
|
| 3 |
+
size 5829
|
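Each binary above is committed as a Git LFS pointer: three `key value` lines (`version`, `oid`, `size`) rather than the file itself. A minimal sketch of reading one, assuming the pointer text is already in a string. `parse_lfs_pointer` is a hypothetical helper for illustration and is not part of this commit.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer for data/tender/crpf_construction_tender.pdf, copied from the diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6911a05bbfa8f3a341c61b8cebf28e83e99d8274d98717b28c027b5cd0a09032
size 5829
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])
```

The `size` field gives a quick sanity check on the generated files: the tender PDF is about 5.8 KB, while the bidder PDFs are each around 2 KB.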
scripts/generate_mock_data.py
CHANGED
|
@@ -1 +1,461 @@
|
|
| 1 |
"""Step 2 — generates mock tender and bidder PDFs + noisy scan PNG."""
|
| 2 |
+
|
| 3 |
+
import io
|
| 4 |
+
import sys
|
| 5 |
+
from pathlib import Path
|
| 6 |
+
|
| 7 |
+
import numpy as np
|
| 8 |
+
from PIL import Image, ImageFilter
|
| 9 |
+
from reportlab.lib import colors
|
| 10 |
+
from reportlab.lib.pagesizes import A4
|
| 11 |
+
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
|
| 12 |
+
from reportlab.lib.units import cm
|
| 13 |
+
from reportlab.platypus import (
|
| 14 |
+
Paragraph, SimpleDocTemplate, Spacer, Table, TableStyle, PageBreak
|
| 15 |
+
)
|
| 16 |
+
|
| 17 |
+
BASE_DIR = Path(__file__).resolve().parent.parent
|
| 18 |
+
DATA_DIR = BASE_DIR / "data"
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
def _doc(path: Path) -> SimpleDocTemplate:
|
| 22 |
+
return SimpleDocTemplate(str(path), pagesize=A4,
|
| 23 |
+
leftMargin=2*cm, rightMargin=2*cm,
|
| 24 |
+
topMargin=2*cm, bottomMargin=2*cm)
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
def _styles():
|
| 28 |
+
styles = getSampleStyleSheet()
|
| 29 |
+
styles.add(ParagraphStyle(name="Center", alignment=1, fontSize=12,
|
| 30 |
+
spaceAfter=6))
|
| 31 |
+
styles.add(ParagraphStyle(name="Bold14", fontName="Helvetica-Bold",
|
| 32 |
+
fontSize=14, spaceAfter=8))
|
| 33 |
+
styles.add(ParagraphStyle(name="Bold12", fontName="Helvetica-Bold",
|
| 34 |
+
fontSize=12, spaceAfter=6))
|
| 35 |
+
styles.add(ParagraphStyle(name="Body10", fontSize=10, spaceAfter=4,
|
| 36 |
+
leading=14))
|
| 37 |
+
styles.add(ParagraphStyle(name="Clause", fontSize=10, leftIndent=20,
|
| 38 |
+
spaceAfter=10, leading=14))
|
| 39 |
+
return styles
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
def make_tender_pdf(out_path: Path) -> None:
|
| 43 |
+
doc = _doc(out_path)
|
| 44 |
+
s = _styles()
|
| 45 |
+
story = []
|
| 46 |
+
|
| 47 |
+
story.append(Paragraph("GOVERNMENT OF INDIA", s["Center"]))
|
| 48 |
+
story.append(Paragraph("MINISTRY OF HOME AFFAIRS", s["Center"]))
|
| 49 |
+
story.append(Paragraph("CENTRAL RESERVE POLICE FORCE", s["Bold14"]))
|
| 50 |
+
story.append(Paragraph(
|
| 51 |
+
"TENDER DOCUMENT FOR CONSTRUCTION OF RESIDENTIAL QUARTERS",
|
| 52 |
+
s["Center"]
|
| 53 |
+
))
|
| 54 |
+
story.append(Paragraph("Tender No: CRPF/CE/2025-26/RQ/001", s["Center"]))
|
| 55 |
+
story.append(Spacer(1, 0.5*cm))
|
| 56 |
+
|
| 57 |
+
story.append(Paragraph("1. INTRODUCTION", s["Bold12"]))
|
| 58 |
+
story.append(Paragraph(
|
| 59 |
+
"The Central Reserve Police Force (CRPF), under the Ministry of Home Affairs, "
|
| 60 |
+
"Government of India, invites sealed tenders from eligible contractors for the "
|
| 61 |
+
"construction of Residential Quarters at CRPF Camp, New Delhi. The work involves "
|
| 62 |
+
"civil construction, internal electrification, plumbing, and allied works.",
|
| 63 |
+
s["Body10"]
|
| 64 |
+
))
|
| 65 |
+
story.append(Spacer(1, 0.3*cm))
|
| 66 |
+
|
| 67 |
+
story.append(Paragraph("2. SCOPE OF WORK", s["Bold12"]))
|
| 68 |
+
story.append(Paragraph(
|
| 69 |
+
"The scope includes: (a) Construction of Type-III Residential Quarters (G+4) — "
|
| 70 |
+
"24 units; (b) Internal roads, drainage, and compound wall; (c) Water supply and "
|
| 71 |
+
"sanitation infrastructure; (d) Landscaping and external works. Estimated project "
|
| 72 |
+
"value: INR 18 Crore. Completion period: 24 months.",
|
| 73 |
+
s["Body10"]
|
| 74 |
+
))
|
| 75 |
+
story.append(PageBreak())
|
| 76 |
+
|
| 77 |
+
story.append(Paragraph("3. ELIGIBILITY CRITERIA", s["Bold12"]))
|
| 78 |
+
story.append(Paragraph(
|
| 79 |
+
"Only bidders fulfilling ALL mandatory eligibility criteria listed below shall be "
|
| 80 |
+
"considered for technical evaluation. Bids not meeting mandatory criteria shall be "
|
| 81 |
+
"rejected summarily without further evaluation.",
|
| 82 |
+
s["Body10"]
|
| 83 |
+
))
|
| 84 |
+
story.append(Spacer(1, 0.3*cm))
|
| 85 |
+
|
| 86 |
+
story.append(Paragraph("3.2 Mandatory and Desirable Criteria", s["Bold12"]))
|
| 87 |
+
|
| 88 |
+
criteria_text = [
|
| 89 |
+
("3.2(a)", "Financial Capability",
|
| 90 |
+
"The bidder shall have a minimum average annual turnover of INR 5 Crore "
|
| 91 |
+
"(Rupees Five Crore only) during the last three financial years (2022-23, "
|
| 92 |
+
"2023-24, 2024-25), as certified by a Chartered Accountant. Documentary "
|
| 93 |
+
"evidence in the form of audited balance sheets, profit & loss account, and "
|
| 94 |
+
"CA certificate shall be submitted. [MANDATORY]"),
|
| 95 |
+
("3.2(b)", "Technical Experience",
|
| 96 |
+
"The bidder must have successfully completed at least three (3) similar "
|
| 97 |
+
"construction projects of value not less than INR 1 Crore each in the last "
|
| 98 |
+
"five (5) financial years. Completion certificates from clients shall be "
|
| 99 |
+
"submitted along with work orders. [MANDATORY]"),
|
| 100 |
+
("3.2(c)", "GST Registration",
|
| 101 |
+
"The bidder shall possess a valid Goods and Services Tax (GST) registration "
|
| 102 |
+
"certificate. The GSTIN must be active as on the date of submission. A copy "
|
| 103 |
+
"of the GST registration certificate shall be enclosed with the bid. "
|
| 104 |
+
"[MANDATORY]"),
|
| 105 |
+
("3.2(d)", "Quality Certification",
|
| 106 |
+
"The bidder shall hold a valid ISO 9001:2015 Quality Management System "
|
| 107 |
+
"certification issued by an accredited certification body, valid as on the "
|
| 108 |
+
"date of bid submission. Copy of the certificate shall be submitted. "
|
| 109 |
+
"[MANDATORY]"),
|
| 110 |
+
("3.2(e)", "Paramilitary Experience",
|
| 111 |
+
"Preferably, the bidder may have prior experience with construction or "
|
| 112 |
+
"maintenance of paramilitary or defence infrastructure. This is a desirable "
|
| 113 |
+
"criterion and shall not affect mandatory eligibility. Supporting documents "
|
| 114 |
+
"may be submitted for additional credit during evaluation. [DESIRABLE]"),
|
| 115 |
+
]
|
| 116 |
+
|
| 117 |
+
for clause, title, text in criteria_text:
|
| 118 |
+
story.append(Paragraph(f"<b>{clause} {title}</b>", s["Body10"]))
|
| 119 |
+
story.append(Paragraph(text, s["Clause"]))
|
| 120 |
+
|
| 121 |
+
story.append(PageBreak())
|
| 122 |
+
|
| 123 |
+
story.append(Paragraph("4. SUBMISSION PROCEDURE", s["Bold12"]))
|
| 124 |
+
story.append(Paragraph(
|
| 125 |
+
"Bids shall be submitted in two envelopes: Technical Bid and Financial Bid. "
|
| 126 |
+
"Last date of submission: 30-06-2026. Address for submission: The Inspector "
|
| 127 |
+
"General (Works), CRPF Group Centre, New Delhi – 110077. "
|
| 128 |
+
"EMD of INR 36 Lakh (2% of estimated cost) to be deposited via DD/BG.",
|
| 129 |
+
s["Body10"]
|
| 130 |
+
))
|
| 131 |
+
story.append(Spacer(1, 0.5*cm))
|
| 132 |
+
|
| 133 |
+
story.append(Paragraph("5. EVALUATION METHODOLOGY", s["Bold12"]))
|
| 134 |
+
story.append(Paragraph(
|
| 135 |
+
"Evaluation shall proceed in two stages: (i) Technical Evaluation — bidders "
|
| 136 |
+
"meeting all mandatory criteria in 3.2 shall be declared technically qualified; "
|
| 137 |
+
"(ii) Financial Evaluation — lowest L1 bid among technically qualified bidders "
|
| 138 |
+
"shall be recommended. Desirable criteria (3.2(e)) may be used for tie-breaking.",
|
| 139 |
+
s["Body10"]
|
| 140 |
+
))
|
| 141 |
+
story.append(PageBreak())
|
| 142 |
+
|
| 143 |
+
story.append(Paragraph("6. ANNEXURES", s["Bold12"]))
|
| 144 |
+
story.append(Paragraph("Annexure A — Bid Form", s["Body10"]))
|
| 145 |
+
story.append(Paragraph("Annexure B — Declaration of Non-Blacklisting", s["Body10"]))
|
| 146 |
+
story.append(Paragraph("Annexure C — CA Certificate Format (Turnover)", s["Body10"]))
|
| 147 |
+
story.append(Paragraph("Annexure D — Project Completion Certificate Format", s["Body10"]))
|
| 148 |
+
|
| 149 |
+
doc.build(story)
|
| 150 |
+
|
| 151 |
+
|
| 152 |
+
def _simple_pdf(out_path: Path, title: str, paragraphs: list[str],
|
| 153 |
+
table_data: list[list] | None = None) -> None:
|
| 154 |
+
doc = _doc(out_path)
|
| 155 |
+
s = _styles()
|
| 156 |
+
story = []
|
| 157 |
+
story.append(Paragraph(title, s["Bold14"]))
|
| 158 |
+
story.append(Spacer(1, 0.3*cm))
|
| 159 |
+
for para in paragraphs:
|
| 160 |
+
story.append(Paragraph(para, s["Body10"]))
|
| 161 |
+
if table_data:
|
| 162 |
+
story.append(Spacer(1, 0.3*cm))
|
| 163 |
+
tbl = Table(table_data, hAlign="LEFT")
|
| 164 |
+
tbl.setStyle(TableStyle([
|
| 165 |
+
("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
|
| 166 |
+
("GRID", (0, 0), (-1, -1), 0.5, colors.black),
|
| 167 |
+
("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
|
| 168 |
+
("FONTSIZE", (0, 0), (-1, -1), 9),
|
| 169 |
+
("TOPPADDING", (0, 0), (-1, -1), 4),
|
| 170 |
+
("BOTTOMPADDING", (0, 0), (-1, -1), 4),
|
| 171 |
+
]))
|
| 172 |
+
story.append(tbl)
|
| 173 |
+
doc.build(story)
|
| 174 |
+
|
| 175 |
+
|
| 176 |
+
def make_company_profile(out_path: Path, name: str, gstin: str, reg_year: int,
|
| 177 |
+
iso: bool = True, extra_lines: list[str] | None = None) -> None:
|
| 178 |
+
paras = [
|
| 179 |
+
f"<b>Company Name:</b> {name}",
|
| 180 |
+
f"<b>GSTIN:</b> {gstin}",
|
| 181 |
+
f"<b>Year of Registration:</b> {reg_year}",
|
| 182 |
+
f"<b>Nature of Business:</b> Civil Construction and Infrastructure Development",
|
| 183 |
+
f"<b>ISO 9001:2015 Certified:</b> {'Yes' if iso else 'No'}",
|
| 184 |
+
"<b>Registered Office:</b> 42, Industrial Area, Phase II, India",
|
| 185 |
+
]
|
| 186 |
+
if extra_lines:
|
| 187 |
+
paras.extend(extra_lines)
|
| 188 |
+
_simple_pdf(out_path, f"Company Profile — {name}", paras)
|
| 189 |
+
|
| 190 |
+
|
| 191 |
+
def make_financials(out_path: Path, company: str,
|
| 192 |
+
rows: list[tuple[str, str, int]], ca_name: str,
|
| 193 |
+
ca_no: str) -> None:
|
| 194 |
+
table_data = [["Financial Year", "Annual Turnover (INR)", "Words"]]
|
| 195 |
+
for fy, words, amount in rows:
|
| 196 |
+
table_data.append([fy, f"{amount:,}", words])
|
| 197 |
+
avg = sum(r[2] for r in rows) // len(rows)
|
| 198 |
+
table_data.append(["Average (3 years)", f"{avg:,}", ""])
|
| 199 |
+
|
| 200 |
+
paras = [
|
| 201 |
+
f"<b>Company:</b> {company}",
|
| 202 |
+
"The following statement of annual turnover has been prepared from the "
|
| 203 |
+
"audited accounts and is certified to be true and correct.",
|
| 204 |
+
]
|
| 205 |
+
paras.append(f"<b>Certified by:</b> {ca_name}, Chartered Accountant, M. No. {ca_no}")
|
| 206 |
+
paras.append("<b>UDIN:</b> 26123456AAAAA0001")
|
| 207 |
+
paras.append("<b>Place:</b> Mumbai <b>Date:</b> 01-04-2026")
|
| 208 |
+
_simple_pdf(out_path,
|
| 209 |
+
"Audited Financial Statement — Annual Turnover Certificate",
|
| 210 |
+
paras, table_data)
|
| 211 |
+
|
| 212 |
+
|
| 213 |
+
def make_project_experience(out_path: Path, company: str,
|
| 214 |
+
projects: list[dict]) -> None:
|
| 215 |
+
table_data = [["#", "Project Name", "Client", "Value (INR Cr)", "Year", "Status"]]
|
| 216 |
+
for i, p in enumerate(projects, 1):
|
| 217 |
+
table_data.append([
|
| 218 |
+
str(i), p["name"], p["client"],
|
| 219 |
+
str(p["value"]), str(p["year"]), p.get("status", "Completed")
|
| 220 |
+
])
|
| 221 |
+
paras = [
|
| 222 |
+
f"<b>Company:</b> {company}",
|
| 223 |
+
"The following construction projects have been completed by the organization "
|
| 224 |
+
"in the last five financial years (2020–2025). Completion certificates are "
|
| 225 |
+
"enclosed separately.",
|
| 226 |
+
]
|
| 227 |
+
_simple_pdf(out_path, "Project Experience Certificate", paras, table_data)
|
| 228 |
+
|
| 229 |
+
|
| 230 |
+
def make_gst_certificate(out_path: Path, gstin: str, company: str,
|
| 231 |
+
valid_through: str) -> None:
|
| 232 |
+
paras = [
|
| 233 |
+
"<b>GOODS AND SERVICES TAX REGISTRATION CERTIFICATE</b>",
|
| 234 |
+
f"<b>Legal Name of Business:</b> {company}",
|
| 235 |
+
f"<b>GSTIN:</b> {gstin}",
|
| 236 |
+
f"<b>Date of Registration:</b> 01-07-2017",
|
| 237 |
+
f"<b>Valid Through:</b> {valid_through}",
|
| 238 |
+
f"<b>Registration Status:</b> ACTIVE",
|
| 239 |
+
f"<b>Type of Registration:</b> Regular",
|
| 240 |
+
f"<b>Issuing Authority:</b> Assistant Commissioner CGST, Mumbai",
|
| 241 |
+
]
|
| 242 |
+
_simple_pdf(out_path, "GST Registration Certificate", paras)
|
| 243 |
+
|
| 244 |
+
|
| 245 |
+
def make_iso_certificate(out_path: Path, cert_no: str, company: str,
|
| 246 |
+
valid_through: str, issuer: str) -> None:
|
| 247 |
+
paras = [
|
| 248 |
+
"<b>ISO 9001:2015 QUALITY MANAGEMENT SYSTEM CERTIFICATE</b>",
|
| 249 |
+
f"<b>Certificate Number:</b> {cert_no}",
|
| 250 |
+
f"<b>This certifies that:</b> {company}",
|
| 251 |
+
"<b>Scope:</b> Design and Construction of Civil Infrastructure including "
|
| 252 |
+
"Residential, Commercial and Industrial Buildings",
|
| 253 |
+
f"<b>Valid Through:</b> {valid_through}",
|
| 254 |
+
f"<b>Issuing Body:</b> {issuer}",
|
| 255 |
+
"<b>Accreditation:</b> National Accreditation Board for Certification Bodies (NABCB)",
|
| 256 |
+
"<b>This certificate is issued in accordance with ISO 9001:2015 standard.</b>",
|
| 257 |
+
]
|
| 258 |
+
_simple_pdf(out_path, "ISO 9001:2015 Certificate", paras)
|
| 259 |
+
|
| 260 |
+
|
| 261 |
+
def _render_ca_cert_to_pil(company: str, gstin: str, avg_amount: int,
|
| 262 |
+
avg_words: str) -> Image.Image:
|
| 263 |
+
"""Render a CA turnover certificate PDF page to PIL image for degradation."""
|
| 264 |
+
import fitz # PyMuPDF
|
| 265 |
+
|
| 266 |
+
buf = io.BytesIO()
|
| 267 |
+
doc = _doc(Path("/tmp/dummy.pdf"))  # NOTE: never built or used; pdf_doc below renders into the in-memory buffer
|
| 268 |
+
s = _styles()
|
| 269 |
+
story = [
|
| 270 |
+
Paragraph("CHARTERED ACCOUNTANT'S CERTIFICATE", s["Bold14"]),
|
| 271 |
+
Paragraph("(As per Annexure C of Tender No: CRPF/CE/2025-26/RQ/001)", s["Body10"]),
|
| 272 |
+
Spacer(1, 0.5*cm),
|
| 273 |
+
Paragraph(
|
| 274 |
+
f"This is to certify that M/s {company} (GSTIN: {gstin}) is a registered "
|
| 275 |
+
"entity engaged in civil construction activities. Based on the audited "
|
| 276 |
+
"financial statements and books of accounts duly certified under applicable "
|
| 277 |
+
"provisions of the Companies Act, 2013:",
|
| 278 |
+
s["Body10"]
|
| 279 |
+
),
|
| 280 |
+
Spacer(1, 0.3*cm),
|
| 281 |
+
Paragraph(
|
| 282 |
+
f"The average annual turnover of the firm for the three financial years "
|
| 283 |
+
f"2022-23, 2023-24, and 2024-25 is <b>INR {avg_amount:,} ({avg_words} only)</b>.",
|
| 284 |
+
s["Body10"]
|
| 285 |
+
),
|
| 286 |
+
Spacer(1, 0.3*cm),
|
| 287 |
+
]
|
| 288 |
+
|
| 289 |
+
table_data = [
|
| 290 |
+
["Financial Year", "Annual Turnover (INR)", "In Words"],
|
| 291 |
+
["2022-23", "4,80,00,000", "Four Crore Eighty Lakh"],
|
| 292 |
+
["2023-24", "5,40,00,000", "Five Crore Forty Lakh"],
|
| 293 |
+
["2024-25", "6,00,00,000", "Six Crore"],
|
| 294 |
+
[f"Average (3 years)", f"{avg_amount:,}", avg_words],
|
| 295 |
+
]
|
| 296 |
+
tbl = Table(table_data)
|
| 297 |
+
tbl.setStyle(TableStyle([
|
| 298 |
+
("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
|
| 299 |
+
("GRID", (0, 0), (-1, -1), 0.5, colors.black),
|
| 300 |
+
("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
|
| 301 |
+
("FONTSIZE", (0, 0), (-1, -1), 9),
|
| 302 |
+
("TOPPADDING", (0, 0), (-1, -1), 4),
|
| 303 |
+
("BOTTOMPADDING", (0, 0), (-1, -1), 4),
|
| 304 |
+
]))
|
| 305 |
+
story.append(tbl)
|
| 306 |
+
story.extend([
|
| 307 |
+
Spacer(1, 0.5*cm),
|
| 308 |
+
Paragraph("Certified by:", s["Body10"]),
|
| 309 |
+
Paragraph("<b>CA Vikram Shah</b>", s["Body10"]),
|
| 310 |
+
Paragraph("M. No. 098765", s["Body10"]),
|
| 311 |
+
Paragraph("FRN: 100001W", s["Body10"]),
|
| 312 |
+
Paragraph("Place: Ahmedabad Date: 05-04-2026", s["Body10"]),
|
| 313 |
+
Paragraph("UDIN: 26098765BBBBB0002", s["Body10"]),
|
| 314 |
+
])
|
| 315 |
+
|
| 316 |
+
buf = io.BytesIO()
|
| 317 |
+
pdf_doc = SimpleDocTemplate(buf, pagesize=A4,
|
| 318 |
+
leftMargin=2*cm, rightMargin=2*cm,
|
| 319 |
+
topMargin=2*cm, bottomMargin=2*cm)
|
| 320 |
+
pdf_doc.build(story)
|
| 321 |
+
buf.seek(0)
|
| 322 |
+
|
| 323 |
+
fitz_doc = fitz.open(stream=buf.read(), filetype="pdf")
|
| 324 |
+
page = fitz_doc[0]
|
| 325 |
+
mat = fitz.Matrix(150/72, 150/72)
|
| 326 |
+
pix = page.get_pixmap(matrix=mat, colorspace=fitz.csRGB)
|
| 327 |
+
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
|
| 328 |
+
fitz_doc.close()
|
| 329 |
+
return img
|
| 330 |
+
|
| 331 |
+
|
| 332 |
+
def make_noisy_scan(out_path: Path) -> None:
|
| 333 |
+
img = _render_ca_cert_to_pil(
|
| 334 |
+
company="Shree Constructions & Services",
|
| 335 |
+
gstin="24AABCC9012H1Z1",
|
| 336 |
+
avg_amount=5_40_00_000,
|
| 337 |
+
avg_words="Five Crore Forty Lakh",
|
| 338 |
+
)
|
| 339 |
+
|
| 340 |
+
# Apply degradation
|
| 341 |
+
img = img.filter(ImageFilter.GaussianBlur(radius=1.5))
|
| 342 |
+
|
| 343 |
+
arr = np.array(img, dtype=np.uint8)
|
| 344 |
+
rng = np.random.default_rng(seed=42)
|
| 345 |
+
noise_mask = rng.random(arr.shape[:2])
|
| 346 |
+
arr[noise_mask < 0.025] = 0
|
| 347 |
+
arr[noise_mask > 0.975] = 255
|
| 348 |
+
img = Image.fromarray(arr)
|
| 349 |
+
|
| 350 |
+
img = img.rotate(-2, expand=True, fillcolor=(255, 255, 255))
|
| 351 |
+
|
| 352 |
+
jpeg_buf = io.BytesIO()
|
| 353 |
+
img.convert("RGB").save(jpeg_buf, format="JPEG", quality=40)
|
| 354 |
+
jpeg_buf.seek(0)
|
| 355 |
+
img = Image.open(jpeg_buf).copy()
|
| 356 |
+
|
| 357 |
+
img.save(str(out_path), format="PNG")
|
| 358 |
+
|
| 359 |
+
|
| 360 |
+
def main() -> None:
|
| 361 |
+
# Ensure output dirs exist
|
| 362 |
+
for d in [
|
| 363 |
+
DATA_DIR / "tender",
|
| 364 |
+
DATA_DIR / "bidders" / "bidder_a",
|
| 365 |
+
DATA_DIR / "bidders" / "bidder_b",
|
| 366 |
+
DATA_DIR / "bidders" / "bidder_c",
|
| 367 |
+
DATA_DIR / "precomputed",
|
| 368 |
+
]:
|
| 369 |
+
d.mkdir(parents=True, exist_ok=True)
|
| 370 |
+
|
| 371 |
+
# Tender
|
| 372 |
+
make_tender_pdf(DATA_DIR / "tender" / "crpf_construction_tender.pdf")
|
| 373 |
+
|
| 374 |
+
# ── Bidder A ─────────────────────────────────────────────────────────────
|
| 375 |
+
a = DATA_DIR / "bidders" / "bidder_a"
|
| 376 |
+
make_company_profile(a / "company_profile.pdf",
|
| 377 |
+
"Apex Constructions Pvt. Ltd.", "27AABCA1234F1Z5", 2010)
|
| 378 |
+
make_financials(a / "audited_financials.pdf",
|
| 379 |
+
"Apex Constructions Pvt. Ltd.",
|
| 380 |
+
[
|
| 381 |
+
("2022-23", "Five Crore Eighty Lakh", 5_80_00_000),
|
| 382 |
+
("2023-24", "Six Crore Twenty Lakh", 6_20_00_000),
|
| 383 |
+
("2024-25", "Seven Crore Ten Lakh", 7_10_00_000),
|
| 384 |
+
],
|
| 385 |
+
ca_name="CA Ramesh Kumar", ca_no="123456")
|
| 386 |
+
make_project_experience(a / "project_experience.pdf",
|
| 387 |
+
"Apex Constructions Pvt. Ltd.",
|
| 388 |
+
[
|
| 389 |
+
{"name": "Staff Quarters Block A", "client": "PWD Delhi",
|
| 390 |
+
"value": 2.5, "year": 2021},
|
| 391 |
+
{"name": "Office Complex Phase 1", "client": "CPWD Mumbai",
|
| 392 |
+
"value": 3.2, "year": 2022},
|
| 393 |
+
{"name": "Residential Complex", "client": "NBCC Ltd",
|
| 394 |
+
"value": 4.1, "year": 2023},
|
| 395 |
+
{"name": "Barracks Construction", "client": "CRPF Camp Pune",
|
| 396 |
+
"value": 3.5, "year": 2024},
|
| 397 |
+
{"name": "Commercial Warehouse", "client": "DDA",
|
| 398 |
+
"value": 1.8, "year": 2025},
|
| 399 |
+
])
|
| 400 |
+
make_gst_certificate(a / "gst_certificate.pdf", "27AABCA1234F1Z5",
|
| 401 |
+
"Apex Constructions Pvt. Ltd.", "31-03-2027")
|
| 402 |
+
make_iso_certificate(a / "iso_9001.pdf", "ISO-2021-9001-APEX",
|
| 403 |
+
"Apex Constructions Pvt. Ltd.", "15-06-2027",
|
| 404 |
+
"Bureau Veritas Certification India Pvt. Ltd.")
|
| 405 |
+
|
| 406 |
+
# ── Bidder B ─────────────────────────────────────────────────────────────
|
| 407 |
+
b = DATA_DIR / "bidders" / "bidder_b"
|
| 408 |
+
make_company_profile(b / "company_profile.pdf",
|
| 409 |
+
"BuildRight Enterprises", "29AABCB5678G1Z3", 2015)
|
| 410 |
+
make_financials(b / "audited_financials.pdf",
|
| 411 |
+
"BuildRight Enterprises",
|
| 412 |
+
[
|
| 413 |
+
("2022-23", "One Crore Twenty Lakh", 1_20_00_000),
|
| 414 |
+
("2023-24", "One Crore Fifty Lakh", 1_50_00_000),
|
| 415 |
+
("2024-25", "One Crore Eighty Lakh", 1_80_00_000),
|
| 416 |
+
],
|
| 417 |
+
ca_name="CA Suresh Patel", ca_no="654321")
|
| 418 |
+
make_project_experience(b / "project_experience.pdf",
|
| 419 |
+
"BuildRight Enterprises",
|
| 420 |
+
[
|
| 421 |
+
{"name": "Residential Quarters", "client": "Municipal Corp",
|
| 422 |
+
"value": 1.1, "year": 2022},
|
| 423 |
+
{"name": "School Building Renovation", "client": "KVS",
|
| 424 |
+
"value": 1.3, "year": 2023},
|
| 425 |
+
{"name": "Community Hall", "client": "NDMC",
|
| 426 |
+
"value": 1.2, "year": 2024},
|
| 427 |
+
{"name": "Warehouse Shed", "client": "FCI",
|
| 428 |
+
"value": 1.0, "year": 2025},
|
| 429 |
+
])
|
| 430 |
+
make_gst_certificate(b / "gst_certificate.pdf", "29AABCB5678G1Z3",
|
| 431 |
+
"BuildRight Enterprises", "31-03-2027")
|
| 432 |
+
make_iso_certificate(b / "iso_9001.pdf", "ISO-2022-9001-BR",
|
| 433 |
+
"BuildRight Enterprises", "20-08-2027",
|
| 434 |
+
"TUV SUD South Asia Pvt. Ltd.")
|
| 435 |
+
|
| 436 |
+
# ── Bidder C ─────────────────────────────────────────────────────────────
|
| 437 |
+
c = DATA_DIR / "bidders" / "bidder_c"
|
| 438 |
+
make_company_profile(c / "company_profile.pdf",
|
| 439 |
+
"Shree Constructions & Services", "24AABCC9012H1Z1", 2012)
|
| 440 |
+
make_project_experience(c / "project_experience.pdf",
|
| 441 |
+
"Shree Constructions & Services",
|
| 442 |
+
[
|
| 443 |
+
{"name": "Housing Complex Phase 1", "client": "GIDC",
|
| 444 |
+
"value": 1.2, "year": 2023},
|
| 445 |
+
{"name": "Commercial Plaza", "client": "Ahmedabad MC",
|
| 446 |
+
"value": 2.1, "year": 2024},
|
| 447 |
+
{"name": "Road & Drainage Works", "client": "NHAI",
|
| 448 |
+
"value": 1.5, "year": 2025},
|
| 449 |
+
])
|
| 450 |
+
make_gst_certificate(c / "gst_certificate.pdf", "24AABCC9012H1Z1",
|
| 451 |
+
"Shree Constructions & Services", "31-03-2027")
|
| 452 |
+
make_iso_certificate(c / "iso_9001.pdf", "ISO-2023-9001-SCS",
|
| 453 |
+
"Shree Constructions & Services", "10-09-2027",
|
| 454 |
+
"DNV Business Assurance India Pvt. Ltd.")
|
| 455 |
+
make_noisy_scan(c / "turnover_certificate_scan.png")
|
| 456 |
+
|
| 457 |
+
print("Mock data generated successfully.")
|
| 458 |
+
|
| 459 |
+
|
| 460 |
+
if __name__ == "__main__":
|
| 461 |
+
main()
|
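The three bidders are differentiated mainly by tender clause 3.2(a): an average annual turnover of at least INR 5 Crore over three financial years. The arithmetic can be sketched as follows, using the figures hard-coded in the script above and the same integer floor division as `make_financials`. The names `average_turnover` and `meets_turnover` are hypothetical and do not appear in the commit.

```python
CRORE = 1_00_00_000               # 1 Crore = 10,000,000 INR
MIN_AVG_TURNOVER = 5 * CRORE      # threshold from tender clause 3.2(a)

# Annual turnover for 2022-23, 2023-24, 2024-25, copied from the script.
TURNOVER = {
    "bidder_a": [5_80_00_000, 6_20_00_000, 7_10_00_000],
    "bidder_b": [1_20_00_000, 1_50_00_000, 1_80_00_000],
    "bidder_c": [4_80_00_000, 5_40_00_000, 6_00_00_000],  # from the scanned CA certificate
}


def average_turnover(amounts: list[int]) -> int:
    """Floor-divided mean, matching `sum(...) // len(rows)` in make_financials."""
    return sum(amounts) // len(amounts)


def meets_turnover(amounts: list[int]) -> bool:
    """True when the average reaches the INR 5 Crore threshold."""
    return average_turnover(amounts) >= MIN_AVG_TURNOVER


for bidder, amounts in TURNOVER.items():
    print(bidder, average_turnover(amounts), meets_turnover(amounts))
```

Bidder C passes on paper, but its only turnover evidence is the degraded scan, so the figure has to survive OCR first.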
specs/11_mock_data.md
ADDED
|
@@ -0,0 +1,211 @@
| 1 |
+
# Spec 11 — Mock Data Generation
|
| 2 |
+
|
| 3 |
+
**Step:** 2 of 15
|
| 4 |
+
**Time budget:** ~25 min
|
| 5 |
+
**Checkpoint:** `data/` directory populated; `turnover_certificate_scan.png` is a visibly noisy scan that Tesseract reads with low confidence (~50–65%).
|
| 6 |
+
|
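The checkpoint asks for a scan that Tesseract reads at roughly 50 to 65% confidence. One way to measure that is to average the per-word confidences. The sketch below assumes a dict shaped like the result of `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, where `conf` is `-1` for boxes that contain no text. `mean_word_confidence` is a hypothetical helper, and the sample values are made up for illustration, not measured from the generated PNG.

```python
def mean_word_confidence(data: dict) -> float:
    """Average the confidence of recognised words, skipping non-text boxes."""
    confs = []
    for conf, text in zip(data["conf"], data["text"]):
        value = float(conf)  # some pytesseract versions return strings
        if value >= 0 and text.strip():
            confs.append(value)
    return sum(confs) / len(confs) if confs else 0.0


# Illustrative values only, not real OCR output.
sample = {
    "conf": ["-1", "62", "48.5", "71", "-1"],
    "text": ["", "Average", "turnover", "INR", ""],
}
print(round(mean_word_confidence(sample), 1))
```

If the measured mean falls outside the target band, the blur radius, noise fraction, or JPEG quality in `make_noisy_scan` are the obvious parameters to adjust.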
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## Goal
|
| 10 |
+
|
| 11 |
+
`scripts/generate_mock_data.py` is a single deterministic script that produces:
|
| 12 |
+
1. One tender PDF (`data/tender/crpf_construction_tender.pdf`)
|
| 13 |
+
2. Five PDFs for Bidder A (clearly eligible)
|
| 14 |
+
3. Five PDFs for Bidder B (clearly ineligible — turnover too low)
|
| 15 |
+
4. Four PDFs + one noisy scan PNG for Bidder C (needs review)
|
| 16 |
+
|
| 17 |
+
All files are entirely synthetic and self-contained — no external assets required. The script must run in under 30 seconds.
|
| 18 |
+
|
| 19 |
+
---
|
| 20 |
+
|
| 21 |
+
## Dependencies
|
| 22 |
+
|
| 23 |
+
- `reportlab` — PDF generation
|
| 24 |
+
- `Pillow` — image manipulation
|
| 25 |
+
- `numpy` — salt-and-pepper noise
|
| 26 |
+
|
| 27 |
+
---
|
| 28 |
+
|
| 29 |
+
## Output Files
|
| 30 |
+
|
| 31 |
+
```
|
| 32 |
+
data/
|
| 33 |
+
  tender/
|
| 34 |
+
    crpf_construction_tender.pdf
|
| 35 |
+
  bidders/
|
| 36 |
+
    bidder_a/
|
| 37 |
+
      company_profile.pdf
|
| 38 |
+
      audited_financials.pdf
|
| 39 |
+
      project_experience.pdf
|
| 40 |
+
      gst_certificate.pdf
|
| 41 |
+
      iso_9001.pdf
|
| 42 |
+
    bidder_b/
|
| 43 |
+
      company_profile.pdf
|
| 44 |
+
      audited_financials.pdf
|
| 45 |
+
      project_experience.pdf
|
| 46 |
+
      gst_certificate.pdf
|
| 47 |
+
      iso_9001.pdf
|
| 48 |
+
    bidder_c/
|
| 49 |
+
      company_profile.pdf
|
| 50 |
+
      project_experience.pdf
|
| 51 |
+
      gst_certificate.pdf
|
| 52 |
+
      iso_9001.pdf
|
| 53 |
+
      turnover_certificate_scan.png
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
+
---
|
| 57 |
+
|
| 58 |
+
## Tender PDF — `crpf_construction_tender.pdf`
|
| 59 |
+
|
| 60 |
+
Built with `reportlab`'s `SimpleDocTemplate`: 5–6 pages of formal government tender language.
|
| 61 |
+
|
| 62 |
+
### Sections
|
| 63 |
+
|
| 64 |
+
1. **Introduction** — "Central Reserve Police Force, Ministry of Home Affairs, Government of India. Tender for Construction of Residential Quarters."
|
| 65 |
+
2. **Scope of Work** — brief description of the construction project.
|
| 66 |
+
3. **Eligibility Criteria** — Section 3.2, which contains the five criteria (see table below).
|
| 67 |
+
4. **Submission Procedure** — dates, contact details.
|
| 68 |
+
5. **Evaluation Methodology** — how bids will be scored.
|
| 69 |
+
6. **Annexures** — supporting forms.
|
| 70 |
+
|
| 71 |
+
### Five Criteria (exact text in Section 3.2)
|
| 72 |
+
|
| 73 |
+
| ID | Clause | Verbatim Text | Mandatory | Category |
|
| 74 |
+
|---|---|---|---|---|
|
| 75 |
+
| C1 | 3.2(a) | "The bidder shall have a minimum average annual turnover of INR 5 Crore (Rupees Five Crore only) during the last three financial years (2022-23, 2023-24, 2024-25), as certified by a Chartered Accountant." | Yes | financial |
|
| 76 |
+
| C2 | 3.2(b) | "The bidder must have successfully completed at least three (3) similar construction projects of value not less than INR 1 Crore each in the last five (5) financial years. Completion certificates from clients shall be submitted." | Yes | technical |
|
| 77 |
+
| C3 | 3.2(c) | "The bidder shall possess a valid Goods and Services Tax (GST) registration certificate. The GSTIN must be active as on the date of submission." | Yes | compliance |
|
| 78 |
+
| C4 | 3.2(d) | "The bidder shall hold a valid ISO 9001:2015 Quality Management System certification issued by an accredited certification body, valid as on the date of bid submission." | Yes | compliance |
|
| 79 |
+
| C5 | 3.2(e) | "Preferably, the bidder may have prior experience with construction or maintenance of paramilitary or defence infrastructure. This is a desirable criterion and shall not affect mandatory eligibility." | No | technical |
|
| 80 |
+
|
| 81 |
+
C5 uses "preferably" and "desirable", so it tests the mandatory-vs-optional classifier.
|
| 82 |
+
|
| 83 |
+
---
|
| 84 |
+
|
| 85 |
+
## Bidder A — Clearly Eligible
|
| 86 |
+
|
| 87 |
+
### `company_profile.pdf`
|
| 88 |
+
- Company: "Apex Constructions Pvt. Ltd."
|
| 89 |
+
- GSTIN: 27AABCA1234F1Z5
|
| 90 |
+
- Registered: 2010
|
| 91 |
+
- ISO 9001:2015 certified: Yes
|
| 92 |
+
|
| 93 |
+
### `audited_financials.pdf`
|
| 94 |
+
- FY 2022-23: Annual Turnover INR 5,80,00,000 (Rupees Five Crore Eighty Lakh)
|
| 95 |
+
- FY 2023-24: Annual Turnover INR 6,20,00,000 (Rupees Six Crore Twenty Lakh)
|
| 96 |
+
- FY 2024-25: Annual Turnover INR 7,10,00,000 (Rupees Seven Crore Ten Lakh)
|
| 97 |
+
- Average: INR 6,36,66,667 — exceeds INR 5 Crore threshold
|
| 98 |
+
- Certified by: CA Ramesh Kumar, M. No. 123456
|
| 99 |
+
|
| 100 |
+
### `project_experience.pdf`
|
| 101 |
+
- 5 projects listed (2020–2025), each ≥ INR 1 Crore
|
| 102 |
+
- Includes one CRPF project (2023): "Construction of barracks, CRPF Camp, Pune, INR 3.5 Crore"
|
| 103 |
+
|
| 104 |
+
### `gst_certificate.pdf`
|
| 105 |
+
- GSTIN: 27AABCA1234F1Z5
|
| 106 |
+
- Valid through: 31-03-2027
|
| 107 |
+
- Status: Active
|
| 108 |
+
|
| 109 |
+
### `iso_9001.pdf`
|
| 110 |
+
- Certificate No: ISO-2021-9001-APEX
|
| 111 |
+
- Valid through: 15-06-2027
|
| 112 |
+
- Issued by: Bureau Veritas
|
| 113 |
+
|
| 114 |
+
---
|
| 115 |
+
|
| 116 |
+
## Bidder B — Clearly Ineligible (turnover too low)
|
| 117 |
+
|
| 118 |
+
Same structure as Bidder A, but financials are below threshold.
|
| 119 |
+
|
| 120 |
+
### `company_profile.pdf`
|
| 121 |
+
- Company: "BuildRight Enterprises"
|
| 122 |
+
- GSTIN: 29AABCB5678G1Z3
|
| 123 |
+
|
| 124 |
+
### `audited_financials.pdf`
|
| 125 |
+
- FY 2022-23: Annual Turnover INR 1,20,00,000 (Rupees One Crore Twenty Lakh)
|
| 126 |
+
- FY 2023-24: Annual Turnover INR 1,50,00,000 (Rupees One Crore Fifty Lakh)
|
| 127 |
+
- FY 2024-25: Annual Turnover INR 1,80,00,000 (Rupees One Crore Eighty Lakh)
|
| 128 |
+
- Average: INR 1,50,00,000 — **below** INR 5 Crore threshold
|
| 129 |
+
- Certified by: CA Suresh Patel, M. No. 654321
|
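The averages quoted for Bidders A and B can be re-derived from the yearly figures. The following is a minimal sketch for checking that arithmetic against the C1 threshold; the names `TURNOVER_INR`, `average_turnover`, and `meets_c1` are illustrative and are not taken from `generate_mock_data.py`.

```python
from statistics import mean

CRORE = 10_000_000          # 1 crore = 10^7 rupees
THRESHOLD_INR = 5 * CRORE   # C1: minimum average annual turnover

# Yearly turnover figures (FY 2022-23, 2023-24, 2024-25) from the spec above.
TURNOVER_INR = {
    "bidder_a": [58_000_000, 62_000_000, 71_000_000],
    "bidder_b": [12_000_000, 15_000_000, 18_000_000],
}


def average_turnover(yearly: list[int]) -> int:
    """Return the mean yearly turnover, rounded to the nearest rupee."""
    return round(mean(yearly))


def meets_c1(yearly: list[int]) -> bool:
    """True when the average turnover reaches the INR 5 Crore threshold."""
    return average_turnover(yearly) >= THRESHOLD_INR


for bidder, yearly in TURNOVER_INR.items():
    print(bidder, average_turnover(yearly), meets_c1(yearly))
```

Bidder A averages INR 6,36,66,667 and passes; Bidder B averages INR 1,50,00,000 and fails, matching the figures in the spec.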
| 130 |
+
|
| 131 |
+
### `project_experience.pdf`
|
| 132 |
+
- 4 projects listed (2021–2025), each ≥ INR 1 Crore — passes C2
|
| 133 |
+
|
| 134 |
+
### `gst_certificate.pdf`
|
| 135 |
+
- GSTIN: 29AABCB5678G1Z3, valid through 2027, Active
|
| 136 |
+
|
| 137 |
+
### `iso_9001.pdf`
|
| 138 |
+
- Certificate No: ISO-2022-9001-BR
|
| 139 |
+
- Valid through: 20-08-2027
|
| 140 |
+
|
| 141 |
+
---
|
| 142 |
+
|
| 143 |
+
## Bidder C — Needs Review (scanned turnover certificate)
|
| 144 |
+
|
| 145 |
+
No typed `audited_financials.pdf`. Instead: a deliberately noisy scan PNG.
|
| 146 |
+
|
| 147 |
+
### `company_profile.pdf`
|
| 148 |
+
- Company: "Shree Constructions & Services"
|
| 149 |
+
- GSTIN: 24AABCC9012H1Z1
|
| 150 |
+
|
| 151 |
+
### `project_experience.pdf`
|
| 152 |
+
- Exactly 3 projects (just meets the count threshold for C2)
|
| 153 |
+
- Values: INR 1.2 Cr, INR 1.5 Cr, INR 2.1 Cr
|
| 154 |
+
|
| 155 |
+
### `gst_certificate.pdf`
|
| 156 |
+
- GSTIN: 24AABCC9012H1Z1, valid through 2027, Active
|
| 157 |
+
|
| 158 |
+
### `iso_9001.pdf`
|
| 159 |
+
- Certificate No: ISO-2023-9001-SCS
|
| 160 |
+
- Valid through: 10-09-2027
|
| 161 |
+
|
| 162 |
+
### `turnover_certificate_scan.png` — noisy scan generation
|
| 163 |
+
|
| 164 |
+
This is the OCR demo centerpiece. Steps:
|
| 165 |
+
|
| 166 |
+
1. Render a `reportlab` page to an in-memory PDF with a CA's turnover certificate:
|
| 167 |
+
   - "This is to certify that M/s Shree Constructions & Services ... average annual turnover of INR 5,40,00,000 (Rupees Five Crore Forty Lakh only) for the financial years 2022-23, 2023-24, and 2024-25."
|
| 168 |
+
   - Include a year-wise breakdown table.
|
| 169 |
+
2. Convert that PDF page to a PIL Image at 150 DPI using `fitz` (PyMuPDF, an additional dependency not listed under Dependencies).
|
| 170 |
+
3. Apply degradation:
|
| 171 |
+
   - `ImageFilter.GaussianBlur(radius=1.5)`
|
| 172 |
+
   - Salt-and-pepper noise via numpy: randomly set ~5% of pixels to 0 or 255
|
| 173 |
+
   - `image.rotate(-2, expand=True, fillcolor=(255,255,255))`
|
| 174 |
+
   - Re-save with JPEG compression at quality=40, then reload and save as PNG
|
| 175 |
+
4. Save as `data/bidders/bidder_c/turnover_certificate_scan.png`
|
| 176 |
+
|
| 177 |
+
**Expected outcome:** Tesseract reads this at mean confidence ~50–65% → triggers Tier-3 vision LLM. The turnover figure (INR 5,40,00,000) is present but partially degraded, making it a realistic "needs human review" case given combined-confidence rules.
|
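The degradation steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `generate_mock_data.py`: the function name `degrade`, its parameters, and the fixed random seed (chosen here so repeated runs give the same image) are all hypothetical. It assumes `Pillow` and `numpy` are installed and that the input is an already rasterized page.

```python
import io

import numpy as np
from PIL import Image, ImageFilter


def degrade(image: Image.Image, noise_fraction: float = 0.05, seed: int = 0) -> Image.Image:
    """Apply blur, salt-and-pepper noise, a slight rotation, and JPEG artifacts."""
    img = image.convert("RGB").filter(ImageFilter.GaussianBlur(radius=1.5))

    # Salt-and-pepper: pick ~noise_fraction of the pixels and force each to 0 or 255.
    rng = np.random.default_rng(seed)  # assumed fixed seed for repeatable output
    pixels = np.array(img)             # writable copy, shape (height, width, 3)
    hit = rng.random(pixels.shape[:2]) < noise_fraction
    salt = rng.random(pixels.shape[:2]) < 0.5
    pixels[hit & salt] = 255
    pixels[hit & ~salt] = 0
    img = Image.fromarray(pixels)

    # Slight skew, filling the exposed corners with white like scanner background.
    img = img.rotate(-2, expand=True, fillcolor=(255, 255, 255))

    # Round-trip through an in-memory JPEG to pick up compression artifacts.
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG", quality=40)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")
```

The returned image can then be written out with `.save(out_path, format="PNG")`, which corresponds to step 4.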
| 178 |
+
|
| 179 |
+
---
|
| 180 |
+
|
| 181 |
+
## Script Design
|
| 182 |
+
|
| 183 |
+
```python
|
| 184 |
+
# scripts/generate_mock_data.py
|
| 185 |
+
from pathlib import Path
|
| 186 |
+
def make_tender_pdf(out_path: Path) -> None: ...
|
| 187 |
+
def make_company_profile(out_path: Path, name: str, gstin: str, year: int) -> None: ...
|
| 188 |
+
def make_financials(out_path: Path, rows: list[tuple[str, str, int]]) -> None: ...
|
| 189 |
+
def make_project_experience(out_path: Path, projects: list[dict]) -> None: ...
|
| 190 |
+
def make_gst_certificate(out_path: Path, gstin: str, valid_through: str) -> None: ...
|
| 191 |
+
def make_iso_certificate(out_path: Path, cert_no: str, valid_through: str, company: str) -> None: ...
|
| 192 |
+
def make_noisy_scan(out_path: Path) -> None: ...
|
| 193 |
+
|
| 194 |
+
if __name__ == "__main__":
|
| 195 |
+
    # Ensure output dirs exist
|
| 196 |
+
    # Generate all files
|
| 197 |
+
    print("Mock data generated successfully.")
|
| 198 |
+
```
|
| 199 |
+
|
| 200 |
+
Each helper creates one PDF/PNG. The script is idempotent (re-running overwrites files). No command-line arguments needed.
|
| 201 |
+
|
| 202 |
+
---
|
| 203 |
+
|
| 204 |
+
## Acceptance Criteria
|
| 205 |
+
|
| 206 |
+
1. Running `python scripts/generate_mock_data.py` exits 0 and prints "Mock data generated successfully."
|
| 207 |
+
2. All 16 files listed above exist after the run.
|
| 208 |
+
3. Each PDF opens in a viewer without errors and contains the text described.
|
| 209 |
+
4. `turnover_certificate_scan.png` is visibly degraded (blurry, rotated, noisy).
|
| 210 |
+
5. Running `pytesseract.image_to_data(Image.open("data/bidders/bidder_c/turnover_certificate_scan.png"), output_type=pytesseract.Output.DATAFRAME)` returns a dataframe where the filtered mean confidence is between 30 and 70 (i.e., low enough to trigger Tier 3).
|
| 211 |
+
6. Script completes in under 30 seconds on any modern machine.
|
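Acceptance criterion 5 could be checked with something like the sketch below. The spec does not define "filtered", so this assumes it means dropping the `-1` confidence values that Tesseract reports for rows that are not recognized words. The function names are illustrative, and `Output.DICT` is used instead of a dataframe only to avoid a pandas dependency.

```python
from statistics import mean


def filtered_mean_confidence(confidences: list[float]) -> float:
    """Mean of the confidences, ignoring negative values (assumed non-word rows)."""
    kept = [c for c in confidences if c >= 0]
    return mean(kept) if kept else 0.0


def scan_confidence(png_path: str) -> float:
    """Run Tesseract on the image and return the filtered mean confidence."""
    # Imported lazily so the helper above works without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(png_path), output_type=pytesseract.Output.DICT
    )
    return filtered_mean_confidence([float(c) for c in data["conf"]])
```

With Tesseract installed, the check would then be `30 <= scan_confidence("data/bidders/bidder_c/turnover_certificate_scan.png") <= 70`.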