ConstructionAI

The first open-weight LLM built for the construction industry.

138,904 training examples | 200 eval questions | 9 function-calling tools | 25+ trades | 90 scripts

Built by Strata Software Group to power FairTradeWorker

This is not a general model with a construction prompt. It was fine-tuned on construction data only: real project data, real pricing, real code sections, and real trade knowledge, plus distilled examples covering every trade. It estimates jobs, cites building codes by section number, speaks like a contractor, and calls tools to generate PDFs, takeoffs, and schedules.

Why does this exist?

Construction is a $2 trillion industry that runs on spreadsheets, gut feel, and "I'll get back to you." General-purpose LLMs don't know what a 2x6 costs, can't cite IRC R602.7 from memory, and answer construction questions like a college textbook instead of a job site.

ConstructionAI was built by someone who came from the industry -- because the tools that exist aren't good enough and the people building AI don't know what a punch list is.

Available Models

File	Version	Quant	Size	Training Data	Notes
`constructionai-v4-Q8_0.gguf`	v4	Q8_0	8.0 GB	138,904	Best quality. Use this.
`constructionai-v3-Q8_0.gguf`	v3	Q8_0	8.0 GB	23,800	Stable fallback
`constructionai-v2-Q4_K_M.gguf`	v2	Q4_K_M	4.6 GB	5,200	Smaller, faster
`constructionai-v1-Q4_K_M.gguf`	v1	Q4_K_M	4.6 GB	1,087	Legacy

Recommendation: Use v4. It has roughly 128x more training data than v1 (138,904 vs. 1,087 examples), is evaluated across 12 categories, and scores 0.83-0.85 on estimation accuracy.

Quick Start

Ollama (Recommended)

# Download the model and Modelfile
huggingface-cli download Aphrodine/constructionai constructionai-v4-Q8_0.gguf Modelfile.v4 --local-dir ./models/

# Create in Ollama
ollama create constructionai -f models/Modelfile.v4

# Run
ollama run constructionai

Python (requests)

import requests

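# Ollama streams newline-delimited JSON by default. Turn streaming off so
# response.json() parses a single object.
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "constructionai",
    "stream": False,
    "messages": [
        {"role": "user", "content": "Price out a 2,400 SF roof tear-off and re-shingle in Dallas. Architectural shingles."}
    ]
})
response.raise_for_status()
print(response.json()["message"]["content"])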

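OpenAI-Compatible (Ollama / vLLM / SGLang)

from openai import OpenAI

# This example targets Ollama's OpenAI-compatible endpoint. For vLLM or SGLang,
# point base_url at that server's /v1 URL and use its served model name.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")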
response = client.chat.completions.create(
    model="constructionai",
    messages=[
        {"role": "user", "content": "What's the IRC requirement for egress windows in a basement bedroom?"}
    ]
)
print(response.choices[0].message.content)

System Prompt (built into Modelfile)

You are an expert construction estimation and building trades AI assistant. You have deep
knowledge of residential and commercial construction including material pricing, labor rates,
building codes (IRC, IBC, NEC, IPC, IMC), OSHA safety regulations (29 CFR 1926), CSI
MasterFormat divisions, trade terminology, project management, change orders, permits, and
inspection processes. You speak like an experienced contractor — direct, practical, and
specific. You give real numbers, cite code sections when relevant, and flag risks a contractor
needs to know about. You never give vague corporate answers. You are the kind of AI a foreman
trusts on a job site.

Capabilities

Construction Estimation

Line-item cost breakdowns with labor/material splits, regional pricing, waste factors, and CSI MasterFormat codes. Knows the difference between a bathroom gut-reno in Austin and a bathroom facelift in Detroit.

Example: Framing Estimate

User: I need to frame a 2,400 sq ft single-story addition in climate zone 4A. What's my lumber package?

ConstructionAI: For a 2,400 SF single-story addition in zone 4A:

Exterior Walls (2x6, R-20 required per IRC N1102.1.2 for zone 4A):

Perimeter: ~200 LF of wall

Studs 16" OC: ~150 studs @ $7.50 = $1,125

Top/bottom plates (triple top): ~600 LF 2x6 @ $0.85/LF = $510

Floor System:

Engineered TJI 230, 16" OC: 48 joists @ $38 = $1,824

Subfloor 3/4" T&G AdvanTech: 75 sheets @ $52 = $3,900

Lumber Package Total: $15,500-$17,000 All-in framing budget: $30,000-$36,000

Current SPF 2x6 trending ~$485/MBF. Lock pricing within 2 weeks of ordering. Add 12-15% contingency for lumber volatility.

What's the foundation type -- slab on grade or crawlspace? That changes the sill plate and anchor bolt spec.

Example: Multi-Step CSI Division Estimate

User: Walk me through estimating a 1,500 sq ft addition. Two-story, wood frame, on a crawl space. I need a ballpark broken down by CSI division.

ConstructionAI:

Total Estimated Cost: $165,000 - $195,000

CSI Division	Description	Estimated Cost
Division 01	General Conditions & Permits	$8,000 - $12,000
Division 02	Sitework & Excavation	$5,000 - $8,000
Division 03	Concrete (crawl space foundation)	$12,000 - $18,000
Division 04	Wood Framing (2-story)	$25,000 - $32,000
Division 05	Metal (connectors, flashing)	$3,000 - $5,000
Division 06	Windows & Doors	$8,000 - $12,000
Division 07	Roofing & Waterproofing	$10,000 - $14,000
Division 08	Finishes (drywall, paint, flooring)	$20,000 - $28,000
Division 09	Mechanical (HVAC)	$12,000 - $16,000
Division 10	Electrical	$10,000 - $14,000
Division 11	Plumbing	$8,000 - $12,000

Two-story on a crawl space means your framing costs are front-loaded -- the second floor adds structural complexity with load-bearing walls, engineered headers, and potentially LVL beams. Budget 15-20% more for framing than a single-story of the same footage.

What climate zone? That drives your insulation spec and energy code compliance costs.

Example: Regional Pricing

User: I'm pricing a 2,000 sq ft roof tear-off and re-shingle in Dallas, Texas. Architectural shingles. What should I budget?

ConstructionAI:

Roof Tear-Off and Re-Shingle -- 2,000 SF, Dallas TX, Architectural Shingles

Item	Cost
Tear-off (1 layer)	$1,800 - $2,400
Disposal / dumpster	$400 - $600
Underlayment (synthetic)	$600 - $800
Architectural shingles (material)	$2,800 - $3,600
Ridge vent + hip/ridge caps	$400 - $600
Flashing (valleys, pipe boots, step)	$600 - $900
Labor (install)	$3,000 - $4,200
Total	$9,600 - $13,100

Dallas pricing runs about 5-8% above national average due to storm demand and labor competition. If you're in a hail-prone zip code, check with your insurance -- many carriers are non-renewing in North Texas, which affects how you spec the shingle impact rating.

How many layers are on the existing roof? A second layer tear-off adds $800-$1,200.

Building Codes

Cites actual code sections from IRC, IBC, NEC, IPC, IMC, IECC, and OSHA 29 CFR 1926. Real section numbers with context for how they apply on the job -- not generic summaries.

Example: OSHA Fall Protection

User: What's the OSHA fall protection requirement on a residential job?

ConstructionAI: Residential gets a different standard than commercial -- this trips people up.

Residential (29 CFR 1926.501(b)(13)): Trigger height is 6 feet, BUT residential gets an exception under Subpart M. Employers can use alternative fall protection instead of conventional guardrails/nets/PFAS.

Acceptable alternatives: slide guards, safety monitoring system, warning line systems on low-slope roofs.

The catch: You MUST have a written fall protection plan if using alternatives. OSHA can still cite you if a worker falls without one. That's a willful violation -- $156,259 max per violation as of 2026. Get it in writing.

Are you the GC or a sub? That changes who's responsible for the plan.

Example: Load-Bearing Wall Removal

User: Can I remove this wall in my house? It's between the kitchen and living room.

ConstructionAI: Before you start, ensure that the wall is not load-bearing. This means checking if there are any structural beams or columns on the other side of the wall. If the wall is supporting part of your roof or floor structure, removing it could compromise the integrity of your home.

Code Section: IRC R602.1

How to tell if it's load-bearing:

Does the wall run perpendicular to the floor joists above?

Is there a wall, beam, or column directly above or below it?

Does it sit on a beam or foundation wall in the crawl space / basement?

If the answer to any of those is yes, you need a structural engineer to spec a replacement beam (typically LVL or steel) before removal. That beam needs proper bearing points and may require a permit.

Do not remove a load-bearing wall without an engineered solution. The cost of a structural engineer's letter ($500-$1,500) is nothing compared to a roof sag or floor collapse.

Do you have access to the attic or crawl space? That's the fastest way to confirm.

Contractor Voice

Talks like a contractor, not a chatbot. Direct answers with real numbers first, qualifications second. No corporate fluff, and no "it depends" without saying what it depends on.

Example: Pricing Check

User: My drywall guy wants $2.50/SF for hang and finish. Is that fair?

ConstructionAI: That's on the low side of market. National average for hang, tape, and finish (level 4) is running $2.75-$3.50/SF in 2025. If he's including material, that's a steal. If it's labor-only, it's fair for a competitive market.

Watch for: is that level 4 or level 5 finish? Level 5 (skim coat) adds $0.50-$1.00/SF. Also -- does the price include garage/utility areas at a lower finish level? Good drywall bids break it out by room type.

How many sheets total and what ceiling height? That'll tell me if the rate makes sense for the volume.

Example: Slab Thickness (Adversarial)

User: I need to pour a 4-inch slab for my two-story house. That's thick enough, right?

ConstructionAI: A 4-inch slab is typically sufficient for most residential applications, including two-story houses, as long as the foundation design and soil conditions are properly evaluated.

Code Requirements:

IRC R506.1: Minimum 3.5 inches for concrete slab-on-grade

IBC Section 1907: Requires evaluation of soil bearing capacity

But here's what matters more than the slab thickness:

What's your soil bearing capacity? Expansive clay (common in Texas, Colorado, parts of the Southeast) can heave a 4-inch slab. You may need a post-tension or structural slab instead.

Are you running plumbing under the slab? Under-slab plumbing needs to be inspected before the pour, and you need clearance for drain slope.

What's the rebar spec? A two-story house puts more point load on the slab at bearing walls. #4 rebar at 18" OC both ways is typical, but your engineer may spec more.

Has a geotechnical report been done on the lot? That's the real answer to whether 4 inches is enough.

Commercial Estimation

Not just residential. Covers tenant improvements, commercial HVAC, fire suppression, ADA compliance, prevailing wage, and use-change code requirements.

Example: Office-to-Restaurant Conversion

User: I'm converting a 3,000 sq ft office space into a restaurant. What code issues should I watch for?

ConstructionAI:

Fire Safety:

Fire separations between kitchen and dining must meet minimum fire resistance ratings per NFPA 101 and NFPA 13. Typically one-hour minimum.

Exit requirements change significantly -- restaurants need more exits per occupant than offices. Calculate occupant load at 15 SF/person (dining) vs 100 SF/person (office).

Commercial kitchen requires a Type I hood with fire suppression (NFPA 96). Budget $15,000-$25,000 for the hood system alone.

Plumbing:

Grease interceptor required for any commercial kitchen drain (IPC 1003.3). Size depends on fixture count and flow rate.

Additional restrooms based on new occupant load (IPC Table 403.1).

Mechanical:

Kitchen makeup air unit required to replace exhaust air from the hood. This is often the most expensive overlooked item -- $8,000-$15,000.

Separate HVAC zones for kitchen vs dining. Kitchen runs 10-15 degrees hotter.

ADA:

Accessible route, restrooms, seating, and counter height all need to comply with ADA 2010 Standards.

If the building hasn't been renovated since the ADA was enacted, you may trigger a path-of-travel upgrade (up to 20% of renovation cost).

Is this a change of occupancy from B (business) to A-2 (assembly)? That triggers a full code review by the building department, not just a permit.

First-Time GC Guide

Long-form knowledge for contractors who need comprehensive walkthroughs, not just quick answers.

Example: Running Your First Project

User: I just got my general contractor's license. Walk me through everything I need to know to run my first project from start to finish.

ConstructionAI:

1. Project Planning

Scope Definition: Define all required tasks and deliverables in writing. Example: "Kitchen remodel with new cabinets, countertops, flooring, and plumbing upgrades."

Site Inspection: Visit the site before bidding. Look for hidden conditions -- knob-and-tube wiring, asbestos, foundation cracks, drainage issues.

Permits: Pull all required permits before starting. Working without permits is a license risk, an insurance void, and a liability nightmare.

2. Estimating & Bidding

Break costs into labor, material, subcontractors, and overhead.

Add 10-15% contingency for unknowns on your first projects. You will miss things.

Markup: 15-25% over direct costs is standard for residential GCs. Know the difference between markup (on cost) and margin (on price).

3. Contracts

Never start work without a signed contract. Include: scope, price, payment schedule, change order process, timeline, warranty, and dispute resolution.

Use AIA or ConsensusDocs standard forms until you can afford a construction attorney to write custom contracts.

4. Scheduling

Build a schedule with trade dependencies. Framing before MEP rough-in, rough-in before insulation, insulation before drywall.

Schedule inspections in advance. A failed inspection can cost you a week.

5. Managing Subcontractors

Get certificates of insurance (COI) from every sub before they step on site. General liability + workers' comp. No exceptions.

Pay subs within terms. Your reputation with subs is your business. The GC who pays late gets the B-team crew.

6. Change Orders

Document every change in writing before doing the work. Include: description, cost impact, schedule impact, client signature.

Change order costs compound. A $500 change in framing can cascade to $2,000+ by the time electrical, plumbing, and drywall adjust.

7. Inspections & Closeout

Know your inspection sequence: foundation, framing, MEP rough-in, insulation, final.

Do your own punch list walk before the client's. Fix the obvious stuff before they see it.

Get lien waivers from every sub and supplier at final payment.

What type of project is your first one -- remodel, addition, or new construction? The risk profile is very different for each.

Trades Coverage

Structural

Framing
Concrete / Foundation
Structural Steel
Masonry / CMU

Envelope

Roofing
Siding / Exterior
Windows / Doors
Insulation
Waterproofing

Mechanical

HVAC
Plumbing
Electrical
Fire Protection

Finishes

Drywall
Painting
Tile / Stone
Flooring
Cabinets / Countertops

Specialty

Demolition
Excavation / Grading
Landscaping / Hardscaping
Fencing
Decks / Porches

Commercial

Tenant Improvements
Commercial HVAC
Fire Alarm / Suppression
ADA Compliance
Prevailing Wage

Function Calling

9 tools using Llama 3.1 native tool calling format. Compatible with any OpenAI function calling client. Tool definitions included in tools/tool_definitions.json.

Tool Reference

Tool	What It Does	Key Parameters
`generate_estimate_pdf`	Formal estimates with CSI MasterFormat line items, labor/material split	project_name, divisions[], subtotal, overhead_%, profit_%, total
`generate_change_order`	Scope change docs with justification types and cost impact	co_number, items[], cost_impact, schedule_impact_days, reason
`calculate_material_takeoff`	Material quantities with waste factors by trade	materials[], waste_factor_%, order_quantity, total_material_cost
`calculate_markup_margin`	Pricing strategy -- breakeven, overhead recovery, profit margin	direct_costs, markup_%, gross_margin_%, sell_price, annual_overhead
`lookup_building_code`	Code lookup by section, jurisdiction, and application	code_source (IRC/IBC/NEC/IPC/OSHA), section, common_violations[]
`generate_punch_list`	Final walkthrough checklists organized by severity and trade	areas[], items[] with priority (critical/standard/cosmetic)
`generate_project_schedule`	Phase-based timelines with trade dependencies and lead times	phases[], critical_path_items[], weather_buffer_days
`generate_proposal`	Client-facing proposals with payment milestones and scope of work	scope_items[], price, payment_schedule[], exclusions[]
`scan_blueprint`	Extract structured data from blueprint images (requires VLM)	input_path, to_estimate_prompt, location, quality tier
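
Markup and margin are easy to mix up, which is the reason `calculate_markup_margin` is in the list. The sketch below shows only the underlying arithmetic. The function names are illustrative and are not the tool's actual implementation or schema.

```python
def sell_price_from_markup(direct_costs: float, markup_pct: float) -> float:
    """Markup is a percentage of cost: price = cost * (1 + markup)."""
    return direct_costs * (1 + markup_pct / 100)


def gross_margin_pct(direct_costs: float, sell_price: float) -> float:
    """Margin is a percentage of price: (price - cost) / price."""
    return (sell_price - direct_costs) / sell_price * 100


def markup_for_target_margin(margin_pct: float) -> float:
    """Markup needed to hit a target margin: m / (1 - m)."""
    m = margin_pct / 100
    return m / (1 - m) * 100


# A 25% markup on $10,000 of direct cost sells at $12,500,
# which is a 20% gross margin, not 25%.
price = sell_price_from_markup(10_000, 25)
margin = gross_margin_pct(10_000, price)
```

The gap widens as the percentage grows: a 50% markup is only a 33% margin, so quoting one when the other was meant eats directly into profit.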

Example: Tool Call for Kitchen Estimate

When asked to generate an estimate, the model produces structured JSON that the tool runner executes:

{
  "name": "generate_estimate_pdf",
  "parameters": {
    "project_name": "Kitchen Remodel - Smith Residence",
    "client_name": "John Smith",
    "estimate_number": "EST-2026-042",
    "project_type": "Kitchen Remodel",
    "square_footage": 200,
    "state": "TX",
    "city": "Austin",
    "divisions": [
      {
        "division": "02",
        "name": "Demolition",
        "items": [
          {
            "description": "Remove existing cabinets, countertops, flooring",
            "quantity": 1,
            "unit": "LS",
            "unit_price": 2500,
            "total": 2500
          }
        ]
      },
      {
        "division": "06",
        "name": "Cabinetry",
        "items": [
          {
            "description": "Mid-range cabinets, 25 LF base + 20 LF wall",
            "quantity": 45,
            "unit": "LF",
            "unit_price": 225,
            "total": 10125
          }
        ]
      },
      {
        "division": "09",
        "name": "Countertops",
        "items": [
          {
            "description": "Quartz countertops, 45 SF",
            "quantity": 45,
            "unit": "SF",
            "unit_price": 85,
            "total": 3825
          }
        ]
      }
    ],
    "subtotal": 36500,
    "overhead_percent": 10,
    "profit_percent": 15,
    "contingency_percent": 10,
    "total": 49275,
    "exclusions": ["Appliances", "Structural modifications", "Permit fees"],
    "timeline_weeks": 6
  }
}

The tool runner processes this into a formatted PDF estimate document.
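
Before rendering, a tool runner would reasonably sanity-check the model's arithmetic. The sketch below is one hypothetical way to do that, not the runner's actual code. It assumes the overhead, profit, and contingency percentages are applied additively to the subtotal, which is an inference from the numbers above (36,500 x 1.35 = 49,275). It does not compare the subtotal with the listed line items, since the example appears to show only some of the divisions.

```python
def check_estimate(params: dict, tolerance: float = 0.01) -> list:
    """Return a list of arithmetic problems found in an estimate payload."""
    problems = []

    # Each line item: quantity * unit_price should equal its total.
    for division in params.get("divisions", []):
        for item in division.get("items", []):
            expected = item["quantity"] * item["unit_price"]
            if abs(expected - item["total"]) > tolerance:
                problems.append(
                    f'Division {division["division"]}: "{item["description"]}" '
                    f'should total {expected}, got {item["total"]}'
                )

    # Assumed rule: total = subtotal * (1 + overhead + profit + contingency).
    pct = sum(
        params.get(key, 0)
        for key in ("overhead_percent", "profit_percent", "contingency_percent")
    )
    expected_total = params["subtotal"] * (1 + pct / 100)
    if abs(expected_total - params["total"]) > tolerance:
        problems.append(f'total should be {expected_total:.2f}, got {params["total"]}')

    return problems


# Abbreviated version of the kitchen payload above.
example = {
    "divisions": [
        {"division": "02", "name": "Demolition", "items": [
            {"description": "Remove existing cabinets, countertops, flooring",
             "quantity": 1, "unit": "LS", "unit_price": 2500, "total": 2500},
        ]},
    ],
    "subtotal": 36500,
    "overhead_percent": 10,
    "profit_percent": 15,
    "contingency_percent": 10,
    "total": 49275,
}
issues = check_estimate(example)
```

An empty list means the numbers agree. Anything else is worth sending back to the model, or flagging to the user, before a PDF goes to a client.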

Example: Blueprint Scan Pipeline

The scan_blueprint tool chains with a VLM (Qwen2.5-VL) to extract structured data from construction plans:

# Scan a blueprint and get structured JSON
python3 tools/blueprint_scanner.py --image plans/floor-plan.png --server http://localhost:11434 --model qwen2.5-vl

# Scan and generate an estimate prompt for ConstructionAI
python3 tools/blueprint_scanner.py --image plans/floor-plan.png --to-estimate --location "Austin, TX" --quality mid-range

Output includes rooms, dimensions, door/window schedules, and material callouts extracted directly from the plan image.

Evaluation

200 questions | 12 categories | 6 scoring dimensions

Full eval suite: scripts/eval_v4.py -- supports Ollama, vLLM, SGLang backends, LLM-judge mode, and model comparison.

v4 Scores by Category

Category	Coverage	Voice	Format	Guardrail	Est. Accuracy
Regional Estimation	0.674	0.571	0.826	--	0.852
Estimation Accuracy	0.654	0.460	0.825	--	0.831
Long Form	0.490	0.395	0.855	--	--
Format Quality	0.476	0.262	0.878	--	--
Tool Calling	0.395	0.380	0.867	--	--
Safety / OSHA	0.393	0.320	0.767	--	--
Commercial	0.369	0.476	0.833	--	--
Contractor Voice	0.332	0.511	0.778	--	--
Multi-Step Reasoning	0.329	0.350	0.850	--	--
Building Codes	0.301	0.400	0.817	--	--
Guardrails	0.285	0.498	0.866	0.911	--
Adversarial	0.239	0.542	0.833	--	--
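
The headline v4 numbers quoted in the roadmap (Coverage 0.41, Voice 0.43, Est. Accuracy 0.84) match an unweighted mean over the categories in this table. That is an inference from the numbers, not a documented aggregation rule. A quick check:

```python
from statistics import mean

# Per-category scores copied from the table above, in row order.
coverage = [0.674, 0.654, 0.490, 0.476, 0.395, 0.393,
            0.369, 0.332, 0.329, 0.301, 0.285, 0.239]
voice = [0.571, 0.460, 0.395, 0.262, 0.380, 0.320,
         0.476, 0.511, 0.350, 0.400, 0.498, 0.542]
est_accuracy = [0.852, 0.831]  # only two categories report this dimension

summary = {
    "coverage": round(mean(coverage), 2),
    "voice": round(mean(voice), 2),
    "est_accuracy": round(mean(est_accuracy), 2),
}
print(summary)
```

Because the mean is unweighted, the 10-question Tool Calling category counts as much as the 25-question Regional Estimation category.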

What the scoring dimensions mean

Coverage -- Does the response include specific numbers, measurements, code sections, and trade-relevant detail? A response that says "it depends on the region" scores low. A response that says "$3.50-$5.00/SF in the Midwest, adjust +15% for Denver metro" scores high.
Voice -- Does it sound like a contractor or a chatbot? Real terminology (rough-in, punch list, change order), direct answers, no corporate filler. "I recommend consulting a professional" = 0. "Get it in writing before you start demo" = 5.
Format -- Tables for pricing, bold key numbers, headers for sections, bullets for line items. Construction estimates are structured documents, not paragraphs.
Guardrail Compliance -- Does the model refuse or redirect appropriately on questions that need a licensed professional? Structural engineering, load calculations, electrical panel sizing -- these need a stamp, not a chatbot answer.
Estimation Accuracy -- Are the dollar amounts, quantities, and rates within realistic ranges for the trade and region? Checked against RS Means, HomeAdvisor, and contractor-verified pricing.
Follow-up Quality -- Does the response end with a relevant follow-up question that moves the conversation forward? Good follow-ups narrow scope: "What climate zone?" not "Is there anything else?"

Eval Categories Explained

What each category tests

Category	Questions	What It Tests
Regional Estimation	25	Pricing accuracy adjusted for location, climate zone, and local market conditions
Building Codes	20	Correct code section citations (IRC, IBC, NEC, IPC, IMC) with application context
Contractor Voice	15	Trade terminology, direct communication style, absence of corporate language
Multi-Step Reasoning	20	Complex estimates requiring multiple trades, phasing, and dependency chains
Safety / OSHA	15	OSHA 29 CFR 1926 requirements, violation penalties, real-world safety guidance
Guardrails	15	Appropriate refusal/redirection on questions requiring licensed professionals
Format Quality	15	Structured output -- tables, bold numbers, headers, bullet lists
Tool Calling	10	Correct tool selection, valid JSON schema, appropriate parameter values
Commercial	15	Tenant improvements, change of occupancy, prevailing wage, commercial HVAC
Estimation Accuracy	20	Dollar amounts checked against RS Means and contractor-verified pricing
Adversarial	15	Trick questions, incorrect assumptions in prompts, scope traps
Long Form	15	Comprehensive walkthroughs requiring sustained depth across multiple topics

Version Progression

v1 (1,087 examples)    ||||                          ~92% token acc    Seed model
v2 (5,200 examples)    ||||||||                      97.5% (overfit)   First real model
v3 (23,800 examples)   ||||||||||||||||              ~88%              Generalization
v4 (138,904 examples)  ||||||||||||||||||||||||||    ~89%              Production
v5 (81,813 curated)    -------------------------    IN PROGRESS       Quality over quantity

Key insight: More diverse data lowers training accuracy but improves real-world performance. v2 hit 97.5% by memorizing 5K examples. v3 dropped to 88% on 24K but generalized far better. The ceiling on training accuracy is the noise floor of the domain -- construction costs genuinely vary, and ~89% reflects the model learning patterns, not memorizing answers.

Training

Architecture


Setting	Value
Base Model	`unsloth/Meta-Llama-3.1-8B-Instruct`
Method	QLoRA 4-bit (bitsandbytes)
LoRA Rank / Alpha	128 / 256
Trainable Parameters	~335M (4% of model)
Framework	Unsloth FastLanguageModel + trl SFTTrainer
Epochs	2
Learning Rate	2e-4, cosine schedule, warmup 100 steps
Batch Size	Effective 16 (8 per device, grad accum 2)
Sequence Length	8,192 (training) / 128K (inference, Llama 3.1 native)
Packing	ON (reduces 13K steps to ~2,584 with 106K examples)
Hardware	2x A100-SXM4-80GB on vast.ai
Quantization	Q8_0 and Q4_K_M via llama.cpp
Training Cost	~$30-40 per full training run
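
Two figures in this table can be reproduced with simple arithmetic. The sketch below assumes LoRA adapters on all seven linear projections of every transformer layer, which is a common setup but is not stated here, and uses the published Llama 3.1 8B layer shapes.

```python
# Llama 3.1 8B shapes from the published model config.
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # MLP intermediate_size
KV_DIM = 1024          # 8 key/value heads x 128 head_dim (grouped-query attention)
LAYERS = 32

# (in_features, out_features) of each adapted matrix in one layer.
# Assumption: all seven projections carry an adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """A LoRA adapter adds rank * (in + out) weights per adapted matrix."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in PROJECTIONS.values())
    return per_layer * LAYERS


def optimizer_steps(examples: int, epochs: int, effective_batch: int) -> int:
    """Steps without packing: one example per sequence slot."""
    return examples * epochs // effective_batch


trainable = lora_params(128)                    # 335,544,320
unpacked = optimizer_steps(106_000, 2, 16)      # 13,250
print(f"{trainable:,} trainable parameters, {unpacked:,} steps before packing")
```

Under those assumptions rank 128 gives about 335M trainable parameters, roughly 4% of the 8B base, and 106K examples over 2 epochs at batch 16 gives the ~13K unpacked steps that packing cuts to ~2,584.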

Data Pipeline

Prompts ──> Distillation ──> Scoring ──> Filtering ──> Dedup ──> Balance ──> Contamination Check ──> Split
  (seeded,     (SGLang,       (5 dims,    (min 3.0)    (hash +    (25% cap)   (vs 200 eval          (95/5,
   multi-       30 workers     multi-                   TF-IDF +               questions)             val deduped
   persona)     per A100)      process)                 MinHash)                                      vs train)

Quality scoring is pure heuristic -- no LLM calls. Five dimensions scored 1-5: format (tables, headers, bold numbers), voice (contractor patterns vs corporate patterns), coverage (dollar amounts, measurements, code refs), follow-up (ends with relevant question), guardrails (redirects on safety/structural). Multiprocessing handles 250K+ examples.
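
A heuristic scorer of this kind can be built from regular expressions alone. The sketch below is illustrative: the patterns, weights, and the rule of keeping examples whose mean score is at least 3.0 are all assumptions, not the project's actual scorer.

```python
import re

# Illustrative patterns only.
TRADE_TERMS = re.compile(r"rough-in|punch list|change order|takeoff|tear-off", re.I)
CORPORATE = re.compile(r"consult(?:ing)? a professional|it depends on (?:many|various)", re.I)
SPECIFICS = re.compile(
    r"\$\d[\d,]*"                              # dollar amounts
    r"|\b\d+(?:\.\d+)?\s?(?:SF|LF)\b"          # measurements
    r"|\b(?:IRC|IBC|NEC|IPC|IMC)\s+[A-Z]?\d+"  # code references
)
REDIRECT = re.compile(r"structural engineer|licensed|permit", re.I)


def clamp(value: float) -> float:
    """Keep every dimension on the 1-5 scale."""
    return max(1.0, min(5.0, float(value)))


def score(text: str) -> dict:
    lines = [line for line in text.splitlines() if line.strip()]
    has_table = any("\t" in line or "|" in line for line in lines)
    has_bullets = any(line.lstrip().startswith(("-", "*")) for line in lines)
    return {
        "format": clamp(1 + 2 * has_table + 2 * has_bullets),
        "voice": clamp(3 + len(TRADE_TERMS.findall(text)) - 2 * len(CORPORATE.findall(text))),
        "coverage": clamp(1 + len(SPECIFICS.findall(text))),
        "follow_up": 5.0 if lines and lines[-1].rstrip().endswith("?") else 1.0,
        "guardrails": 5.0 if REDIRECT.search(text) else 3.0,
    }


def keep(text: str, minimum: float = 3.0) -> bool:
    """Assumed filter rule: keep an example if its mean score reaches `minimum`."""
    scores = score(text)
    return sum(scores.values()) / len(scores) >= minimum
```

Since every check is a string operation, the scorer is cheap to run in a `multiprocessing.Pool` across a large corpus.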

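Deduplication runs in two phases: exact hash on normalized user messages, then MinHash LSH (threshold 0.7) for near-duplicates that differ only in small wording details. LSH compares only the examples that land in the same bucket, so it scales roughly O(n) instead of the O(n^2) of all-pairs comparison, handling 180K examples in 4 minutes on a MacBook.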
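
The two phases can be sketched in pure Python. This is a minimal stand-in for illustration, not the project's script: the signature length, band count, and word-trigram shingles are assumptions, and a production pipeline would more likely use a dedicated MinHash library.

```python
import hashlib
import re

NUM_HASHES = 64   # signature length (illustrative)
BANDS = 16        # 16 bands of 4 rows each (illustrative)
ROWS = NUM_HASHES // BANDS


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def shingles(text: str, n: int = 3) -> set:
    """Word n-grams; very short texts fall back to a single shingle."""
    words = normalize(text).split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def signature(text: str) -> tuple:
    """MinHash signature: the smallest hash of any shingle under each seeded hash."""
    tokens = [s.encode() for s in shingles(text)]
    return tuple(
        min(hashlib.blake2b(t, digest_size=8, salt=seed.to_bytes(8, "little")).digest()
            for t in tokens)
        for seed in range(NUM_HASHES)
    )


def similarity(a: tuple, b: tuple) -> float:
    """The fraction of matching positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dedupe(prompts: list, threshold: float = 0.7) -> list:
    seen_exact = set()
    buckets = {}      # (band index, band slice) -> indices into `signatures`
    signatures = []   # signatures of the prompts kept so far
    kept = []
    for prompt in prompts:
        # Phase 1: exact match on the normalized text.
        key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
        if key in seen_exact:
            continue
        seen_exact.add(key)
        # Phase 2: compare only against prompts sharing at least one LSH band.
        sig = signature(prompt)
        bands = [(b, sig[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]
        candidates = {i for band in bands for i in buckets.get(band, [])}
        if any(similarity(sig, signatures[i]) >= threshold for i in candidates):
            continue
        for band in bands:
            buckets.setdefault(band, []).append(len(signatures))
        signatures.append(sig)
        kept.append(prompt)
    return kept
```

The bucket lookup is what avoids all-pairs comparison: each new prompt is checked only against the few kept prompts that collide with it in some band.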

Contamination check verifies none of the 200 eval questions leaked into training data via exact match, TF-IDF cosine similarity, and trigram overlap.
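
The exact-match and trigram-overlap parts of that check might look like the sketch below. It is a simplified illustration: the TF-IDF pass is omitted, word trigrams are assumed rather than character trigrams, and the 0.5 overlap cutoff is a placeholder.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def word_trigrams(text: str) -> set:
    words = normalize(text).split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def overlap(eval_question: str, train_prompt: str) -> float:
    """Share of the eval question's trigrams that also appear in the training prompt."""
    target = word_trigrams(eval_question)
    if not target:
        return 0.0
    return len(target & word_trigrams(train_prompt)) / len(target)


def find_leaks(train_prompts: list, eval_questions: list, max_overlap: float = 0.5) -> list:
    """Return (train_index, eval_index, score) for every suspicious pair."""
    eval_norm = [normalize(q) for q in eval_questions]
    leaks = []
    for t, prompt in enumerate(train_prompts):
        prompt_norm = normalize(prompt)
        for e, question in enumerate(eval_questions):
            # Exact match after normalization counts as full overlap.
            score = 1.0 if prompt_norm == eval_norm[e] else overlap(question, prompt)
            if score >= max_overlap:
                leaks.append((t, e, score))
    return leaks
```

With 200 eval questions the nested loop is cheap even against a six-figure training set, and every flagged training example can simply be dropped before the split.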

Data Sources

Source	Examples	What It Teaches
Multi-server distillation	138,904	Broad domain coverage -- every trade, question type, and difficulty level
Real contractor data	1,600+	Actual estimates, change orders, and project data from a licensed contractor
Scraped contractor forums	621	Real voice, real problems -- Reddit r/construction, YouTube cost breakdowns
2025 cost guides (21 trades)	309	Current national average pricing from HomeAdvisor, Fixr, cost databases
TxDOT public bid tabulations	166	Real government bids vs. engineer estimates -- teaches bid variance
Expert-written examples	500+	Hand-crafted by a construction professional with field experience
Gap-targeted prompts	832	Adversarial, multi-step reasoning, NER -- targeted at weakest eval categories

57 source JSONL files across the training_data directory, covering every major trade, code system, and estimation scenario.

Distillation Teachers

Model	Role	Speed	Notes
Qwen 2.5 7B Instruct	Primary teacher	~5.5 ex/sec	Best quality/speed ratio on SGLang
GLM 4.7 Flash	Early rounds	~3 ex/sec	Good coverage, slower on vLLM
Qwen 3 8B	Gap filling	~4 ex/sec	Thinking disabled -- wastes 30-50% tokens otherwise
Qwen 2.5 3B	Budget runs	~2 ex/sec	Usable on smaller GPUs

All distillation runs on SGLang or vLLM on A100 GPUs via vast.ai. 30 workers per server saturates an A100. One A100 with SGLang outperforms 10x consumer GPUs with vLLM.

Key lesson: Run distillation ON the server, not over SSH. Tunnels drop during long runs. SCP results after.

What we learned training v1 through v4

Never fine-tune instruct models with full fine-tuning. LoRA on instruct works (that's what ConstructionAI does). Full fine-tune on instruct models fights the alignment. Learned this the hard way.
2 epochs is enough for large diverse datasets. 3 epochs risks overfitting. 1 epoch is not enough -- v4.1 proved this when guardrails collapsed from 0.91 to 0.38 on a single epoch.
LoRA rank 128 > rank 64 for domain-heavy tasks. More capacity for specialized knowledge.
lr 2e-4 with cosine decay is the reliable default. Most learning happens in the first 1.5 epochs.
Packing is critical for throughput. Reduced 13K steps to 2,584 by packing short examples together.
Data quality > data quantity. v4 filtered 200K raw down to 139K curated. The 61K removed examples would have hurt the model.
Diverse data lowers training accuracy but improves generalization. v2's 97.5% on 5K examples was memorization. v3's 88% on 24K was learning.
Real business data is the moat. Synthetic gets you to 85%. Real estimates, change orders, and project data get you to 95%.
Always check for contamination. Eval questions in training data make eval scores meaningless.
Train on 8K context, serve at 128K. Most examples are short. Training at full context is wasteful.

Production

ConstructionAI runs in production on RunPod Serverless at ~$0.002/estimate. REST API via FastAPI wraps Ollama for integration with web and mobile frontends.

API Server

pip install -r api/requirements.txt
python3 api/server.py --port 8000

Endpoint	Method	Description
`/chat`	POST	Conversational interface with message history
`/estimate`	POST	Generate structured estimate from project description
`/health`	GET	Server health check

Auth via bearer token (CONSTRUCTIONAI_API_KEY). CORS configurable via CONSTRUCTIONAI_CORS_ORIGINS.

Supports Ollama (local, default) and RunPod (production) backends. When the model URL contains "runpod", the service automatically uses the RunPod API format.
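
The backend switch described above could be as simple as the sketch below. Everything here is hypothetical except the substring rule: the function names are made up, and the RunPod body shape (an `input` object sent to a `/runsync` route) follows RunPod's general serverless convention, while the fields inside `input` depend on the deployed handler.

```python
def detect_backend(model_url: str) -> str:
    """Pick the backend from the URL, per the substring rule above."""
    return "runpod" if "runpod" in model_url.lower() else "ollama"


def build_request(model_url: str, messages: list, model: str = "constructionai") -> tuple:
    """Return (url, json_payload) for the detected backend."""
    base = model_url.rstrip("/")
    if detect_backend(model_url) == "runpod":
        # Assumption: the handler accepts chat messages under "input".
        return f"{base}/runsync", {"input": {"messages": messages}}
    # Ollama's native chat route; streaming off so one JSON object comes back.
    return f"{base}/api/chat", {"model": model, "messages": messages, "stream": False}


messages = [{"role": "user", "content": "Ballpark a 200 SF kitchen gut in Austin."}]
local_url, local_body = build_request("http://localhost:11434", messages)
remote_url, remote_body = build_request("https://api.runpod.ai/v2/ENDPOINT_ID", messages)
```

`ENDPOINT_ID` is a placeholder. Keeping the switch in one function means the `/chat` and `/estimate` handlers never need to know which backend is live.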

Live Pricing Middleware

Real-time material pricing from retailer APIs, bridged through the price-scraper MCP:

# Get live-priced estimate for a specific zip code
python tools/pricing_middleware.py --zip 78745 "Kitchen remodel, 12x15, gut to studs, quartz counters"

# Pipe to PDF
python tools/pricing_middleware.py --zip 78745 "2,400 SF ranch" --json | python tools/generate_pdf.py --stdin

Queries current pricing from Home Depot, Lowe's, and Ferguson. Caches results weekly per zip code to avoid redundant API calls.
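
A weekly per-zip cache can be keyed on the ISO calendar week, so entries expire on their own when the week rolls over. The class below is a hypothetical sketch, not the middleware's code: the `fetch` callable stands in for whatever retailer lookup is used, and caching per item as well as per zip is an assumption.

```python
import datetime as dt
from typing import Callable


class WeeklyPriceCache:
    """Cache prices per (zip code, item, ISO week) to avoid repeat API calls."""

    def __init__(self, fetch: Callable[[str, str], float],
                 today: Callable[[], dt.date] = dt.date.today):
        self._fetch = fetch    # hypothetical retailer lookup: (zip_code, item) -> price
        self._today = today    # injectable clock, which makes the cache testable
        self._store = {}

    def get(self, zip_code: str, item: str) -> float:
        iso = self._today().isocalendar()
        key = (zip_code, item, iso[0], iso[1])   # ISO year + ISO week number
        if key not in self._store:
            self._store[key] = self._fetch(zip_code, item)
        return self._store[key]


# Usage with a fake fetcher that counts how often it is called.
calls = []

def fake_fetch(zip_code: str, item: str) -> float:
    calls.append((zip_code, item))
    return 7.50

clock = {"now": dt.date(2025, 1, 6)}
cache = WeeklyPriceCache(fake_fetch, today=lambda: clock["now"])
cache.get("78745", "2x6x8 SPF stud")
clock["now"] = dt.date(2025, 1, 8)    # same ISO week: served from cache
cache.get("78745", "2x6x8 SPF stud")
clock["now"] = dt.date(2025, 1, 13)   # next ISO week: fetched again
cache.get("78745", "2x6x8 SPF stud")
```

A real deployment would also need to evict old weeks or persist the store to disk. Both are left out here.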

PDF Generation

Three document generators, all using reportlab:

# Professional estimate PDF
python3 tools/generate_estimate_pdf.py --input estimate.json --output estimate.pdf

# Change order document
python3 tools/generate_change_order_pdf.py --demo

# Client proposal
python3 tools/generate_proposal_pdf.py --demo

Roadmap

v5 -- In Progress

v5 training data is finalized: 81,813 train / 4,305 val after aggressive dedup (MinHash LSH at 0.7 threshold) and quality filtering (heuristic scorer, min 3.0 across 5 dimensions). Training on vast.ai next.

What's new in v5

New Data	Count	Why It Matters
Real-world scraped examples	621	Contractor forums, YouTube costs -- voice that synthetic can't replicate
2025 cost guides (21 trades)	309	Current pricing, not stale training data
TxDOT public bid tabulations	166	Real bid-vs-estimate variance from government contracts
Targeted gap prompts	832	Adversarial, multi-step, NER -- attacks weakest eval categories directly
Banked prompts for distillation	504,000	~25 hours to distill on a single A100 via SGLang
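
The ~25 hour figure follows from the primary teacher's throughput of ~5.5 examples per second listed under Distillation Teachers. A small helper makes it easy to re-estimate for other teachers or more servers, assuming throughput scales linearly with server count:

```python
def distill_hours(prompts: int, examples_per_sec: float, servers: int = 1) -> float:
    """Wall-clock hours to distill `prompts` at a steady throughput."""
    return prompts / (examples_per_sec * servers) / 3600


primary = distill_hours(504_000, 5.5)        # Qwen 2.5 7B on one A100
gap_fill = distill_hours(504_000, 4.0)       # Qwen 3 8B on one A100
two_servers = distill_hours(504_000, 5.5, servers=2)
print(f"{primary:.1f} h, {gap_fill:.1f} h, {two_servers:.1f} h")
```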

v5 Target Scores

Dimension	v4	v5 Target	Strategy
Coverage	0.41	0.55+	75K new distilled examples targeting weak categories
Voice	0.43	0.55+	Real scraped contractor data, not synthetic
Guardrails	0.91	0.90+	Must not regress (v4.1 proved 1 epoch kills guardrails)
Est. Accuracy	0.84	0.88+	2025 cost guide data across 21 trades
Adversarial	0.24	0.40+	832 adversarial gap prompts
Multi-Step	0.33	0.50+	775 multi-step reasoning prompts

v6 -- 500K Scale

504K prompts banked for distillation through Qwen 2.5 7B on SGLang. After quality filtering and dedup, that yields ~350-400K usable examples. Combined with existing data, v6 reaches 500K+ curated training examples.

At that scale, the model stops being "pretty good for a fine-tune" and starts being the authority on construction pricing and code compliance for an 8B model. 500K covers the edge cases 100K can't -- the weird plumbing reroute, the historic district variance, the prevailing wage calculation on a mixed-use project.

3B Student Model

The 8B teacher is the quality model, but real-time bidding needs estimates in about a second. The plan: distill v5/v6 down to a 2-3B student (Qwen 3.5 2B Base or Qwen 2.5 3B).

The student learns from the teacher's curated outputs, not raw training data. On focused tasks a student can match or even outperform its teacher, because it learns from clean, filtered examples instead of the messy, diverse data the teacher saw.

Target: ~1 second per estimate on consumer hardware. Fast enough for a contractor to get a ballpark while standing on the job site.

Blueprint Understanding

scan_blueprint already exists as a tool routing to an external VLM (Qwen2.5-VL). The next step is a full multimodal pipeline: feed in blueprint images, get back structured takeoffs -- wall lengths, window counts, room dimensions, door schedules -- extracted directly from the plans.

This is where ConstructionAI stops being a chatbot and starts being a real estimating tool. The contractor takes a photo of the plans, and the model does the takeoff.
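
To make "structured takeoff" concrete, here is one possible shape for the output of such a pipeline. The README does not specify a schema, so the class names, field names, and JSON layout below are all hypothetical; only the four kinds of data (wall lengths, window counts, room dimensions, door schedules) come from the description above.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Room:
    name: str
    width_ft: float
    length_ft: float

    @property
    def area_sqft(self) -> float:
        return self.width_ft * self.length_ft


@dataclass
class Takeoff:
    """Hypothetical container for quantities extracted from a plan sheet."""
    wall_lengths_ft: list[float]
    window_count: int
    rooms: list[Room]
    door_schedule: dict[str, int]   # door type -> quantity

    @classmethod
    def from_json(cls, raw: str) -> Takeoff:
        """Parse and type-coerce a JSON payload; raises KeyError on missing fields."""
        data = json.loads(raw)
        return cls(
            wall_lengths_ft=[float(x) for x in data["wall_lengths_ft"]],
            window_count=int(data["window_count"]),
            rooms=[Room(**room) for room in data["rooms"]],
            door_schedule={k: int(v) for k, v in data["door_schedule"].items()},
        )

    def total_wall_length_ft(self) -> float:
        return sum(self.wall_lengths_ft)

    def total_floor_area_sqft(self) -> float:
        return sum(room.area_sqft for room in self.rooms)


# Example payload in the assumed format (not real model output).
raw = json.dumps({
    "wall_lengths_ft": [12, 10, 12, 10],
    "window_count": 2,
    "rooms": [{"name": "Bedroom", "width_ft": 10, "length_ft": 12}],
    "door_schedule": {"2/8 interior": 1},
})
takeoff = Takeoff.from_json(raw)
print(takeoff.total_wall_length_ft(), takeoff.total_floor_area_sqft())
```

Parsing into typed fields at the boundary means a malformed response fails loudly before any quantity reaches an estimate.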

Regional Pricing Engine

Construction costs aren't national. A bathroom remodel in Manhattan can cost roughly 3x what it costs in Oxford, Mississippi. ConstructionAI already has RSMeans locality factors for ~900 zip code zones, but the model doesn't use them natively yet.

Goal: Location-aware estimates that automatically adjust for regional labor rates, material availability, code jurisdiction, and climate zone. Ask "what does a roof cost" and the model asks you where -- then gives a number that reflects your actual market, not a national average.
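
A minimal sketch of what applying a locality factor looks like, assuming zones are keyed by 3-digit ZIP prefix and that a factor of 1.00 means the national average. The factor values below are placeholders invented for the example, not RSMeans data, and the function names are hypothetical.

```python
# Placeholder factors keyed by 3-digit ZIP prefix (1.00 = national average).
# These numbers are illustrative only and are NOT taken from RSMeans.
LOCALITY_FACTORS = {
    "100": 1.35,   # Manhattan, NY
    "386": 0.80,   # Oxford, MS
    "787": 0.95,   # Austin, TX
}


def locality_factor(zip_code, factors=LOCALITY_FACTORS, default=1.0):
    """Look up the factor for a ZIP code, falling back to the national average."""
    return factors.get(zip_code[:3], default)


def localize(national_cost, zip_code):
    """Scale a national-average cost to the local market, rounded to cents."""
    return round(national_cost * locality_factor(zip_code), 2)


national_estimate = 10_000.00
for zip_code in ("10001", "38655", "99999"):
    print(zip_code, localize(national_estimate, zip_code))
```

An unknown ZIP falls back to the national figure here. A production system would more likely ask the user for a location, which is the behavior the goal above describes.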

Live Pricing Integration

The pricing middleware already bridges to retailer APIs for current material pricing. Next step: the model queries live prices automatically when generating estimates, so the lumber numbers reflect what you'd actually pay this week, not what the training data said six months ago.

Real-time pricing + regional adjustment + historical bid variance = estimates contractors actually trust.
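
One way to read that formula, sketched under stated assumptions: prefer a live unit price, fall back to a cached one when the lookup fails, scale by a locality factor, and widen the result into a range using historical bid variance. The `live_lookup` callable, the cached price table, and every number below are hypothetical stand-ins; the README does not describe the middleware's actual interface.

```python
def unit_price(sku, live_lookup, cached_prices):
    """Return (price, source), preferring the live price when one is available."""
    try:
        price = live_lookup(sku)
    except Exception:
        price = None   # treat any lookup failure as "no live price"
    if price is None:
        return cached_prices[sku], "cached"
    return price, "live"


def estimate_range(items, live_lookup, cached_prices, locality=1.0, bid_variance=0.10):
    """Return (low, expected, high) for a list of (sku, quantity) line items."""
    subtotal = sum(
        unit_price(sku, live_lookup, cached_prices)[0] * qty for sku, qty in items
    )
    expected = subtotal * locality
    low = expected * (1 - bid_variance)
    high = expected * (1 + bid_variance)
    return tuple(round(value, 2) for value in (low, expected, high))


# Stand-in for a retailer API: only one SKU has a live price.
live_prices = {"2x6x8": 6.00}
cached = {"2x6x8": 5.00, "osb-7/16": 20.00}

items = [("2x6x8", 10), ("osb-7/16", 5)]
print(estimate_range(items, live_prices.get, cached, locality=1.25, bid_variance=0.10))
```

Returning a range instead of a single number lets the bid history show up in the estimate itself: a trade with wide historical variance gets a visibly wider band.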

The Data Flywheel

Synthetic data gets you to 85%. Real data gets you to 95%.

Every estimate a contractor creates. Every bid posted. Every change order, every material substitution, every "the homeowner wanted to upgrade the countertops mid-project" scope change. Those are training examples that no synthetic pipeline can generate -- because they capture what actually happens on real jobs, not what a teacher model thinks should happen.

The products aren't just customers of the model. The products are the data pipeline. The more contractors use them, the better ConstructionAI gets, the more useful the products become. That's the flywheel.

The Vision

Construction is a $2 trillion industry that still runs on spreadsheets, gut feel, and "I'll get back to you." Contractors are good at building -- they shouldn't also have to be good at data entry, code lookup, and pricing research.

ConstructionAI isn't trying to replace contractors. It's trying to give every contractor the same information advantage that the big shops have -- the pricing databases, the code libraries, the estimation templates. An 8B model running on a $0.002/query serverless endpoint can put that in every contractor's pocket.

The model that knows what a 2x6 costs, when you need a Simpson HDU, and how to price a change order without getting sued -- that's not a chatbot. That's a tool a foreman trusts on a job site.

About

James Walton -- Strata Software Group, Oxford, Mississippi

Construction industry background. Self-taught developer. Went from job sites to fine-tuning LLMs in 7 months because the tools that exist for this industry aren't good enough and the people building AI don't know what a punch list is.

ConstructionAI powers FairTradeWorker -- a two-sided construction marketplace replacing Angi. QuickBooks-native payments, AI estimation, three-role system (homeowner/contractor/subcontractor). Launching in Texas.

License

Built on Meta Llama 3.1. Subject to the Llama 3.1 Community License.

Model weights and training data: proprietary -- Strata Software Group, 2026. Not for commercial redistribution.

GGUF

Model size: 8B params
Architecture: llama
Hardware compatibility: 4-bit, 8-bit

Model tree for Aphrodine/constructionai

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Quantized: this model

Evaluation results

Metric	Score	Source
Estimation Accuracy (Regional)	0.852	self-reported
Estimation Accuracy (General)	0.831	self-reported
Guardrail Compliance	0.911	self-reported
Format Quality	0.826	self-reported