ConstructionAI

The first open-weight LLM built for the construction industry.

138,904 training examples | 200 eval questions | 9 function-calling tools | 25+ trades | 90 scripts

Built by Strata Software Group to power FairTradeWorker


Not a general model with a construction prompt. This was trained exclusively on real project data, real pricing, real code sections, and real trade knowledge. It estimates jobs, cites building codes by section number, speaks like a contractor, and calls tools to generate PDFs, takeoffs, and schedules.

Why does this exist?

Construction is a $2 trillion industry that runs on spreadsheets, gut feel, and "I'll get back to you." General-purpose LLMs don't know what a 2x6 costs, can't cite IRC R602.7 from memory, and answer construction questions like a college textbook instead of a job site.

ConstructionAI was built by someone who came from the industry -- because the tools that exist aren't good enough and the people building AI don't know what a punch list is.


Available Models

| File | Version | Quant | Size | Training Data | Notes |
| --- | --- | --- | --- | --- | --- |
| constructionai-v4-Q8_0.gguf | v4 | Q8_0 | 8.0 GB | 138,904 | Best quality. Use this. |
| constructionai-v3-Q8_0.gguf | v3 | Q8_0 | 8.0 GB | 23,800 | Stable fallback |
| constructionai-v2-Q4_K_M.gguf | v2 | Q4_K_M | 4.6 GB | 5,200 | Smaller, faster |
| constructionai-v1-Q4_K_M.gguf | v1 | Q4_K_M | 4.6 GB | 1,087 | Legacy |

Recommendation: Use v4. It has roughly 128x more training data than v1 (138,904 vs. 1,087 examples), covers 12 evaluation categories, and scores about 0.84 on estimation accuracy (0.83-0.85 depending on the category).


Quick Start

Ollama (Recommended)
# Download the model and Modelfile
huggingface-cli download Aphrodine/constructionai constructionai-v4-Q8_0.gguf Modelfile.v4 --local-dir ./models/

# Create in Ollama
ollama create constructionai -f models/Modelfile.v4

# Run
ollama run constructionai
Python (requests)
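import requests

# Ollama streams by default; "stream": False returns a single JSON object
# so that response.json() can parse it.
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "constructionai",
    "messages": [
        {"role": "user", "content": "Price out a 2,400 SF roof tear-off and re-shingle in Dallas. Architectural shingles."}
    ],
    "stream": False,
})
response.raise_for_status()
print(response.json()["message"]["content"])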
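OpenAI-Compatible (Ollama / vLLM / SGLang)
from openai import OpenAI

# This base_url is Ollama's OpenAI-compatible endpoint. For vLLM or SGLang,
# point base_url at that server's /v1 endpoint instead.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")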
response = client.chat.completions.create(
    model="constructionai",
    messages=[
        {"role": "user", "content": "What's the IRC requirement for egress windows in a basement bedroom?"}
    ]
)
print(response.choices[0].message.content)
System Prompt (built into Modelfile)
You are an expert construction estimation and building trades AI assistant. You have deep
knowledge of residential and commercial construction including material pricing, labor rates,
building codes (IRC, IBC, NEC, IPC, IMC), OSHA safety regulations (29 CFR 1926), CSI
MasterFormat divisions, trade terminology, project management, change orders, permits, and
inspection processes. You speak like an experienced contractor — direct, practical, and
specific. You give real numbers, cite code sections when relevant, and flag risks a contractor
needs to know about. You never give vague corporate answers. You are the kind of AI a foreman
trusts on a job site.

Capabilities

Construction Estimation

Line-item cost breakdowns with labor/material splits, regional pricing, waste factors, and CSI MasterFormat codes. Knows the difference between a bathroom gut-reno in Austin and a bathroom facelift in Detroit.

Example: Framing Estimate

User: I need to frame a 2,400 sq ft single-story addition in climate zone 4A. What's my lumber package?

ConstructionAI: For a 2,400 SF single-story addition in zone 4A:

Exterior Walls (2x6, R-20 required per IRC N1102.1.2 for zone 4A):

  • Perimeter: ~200 LF of wall
  • Studs 16" OC: ~150 studs @ $7.50 = $1,125
  • Top/bottom plates (triple top): ~600 LF 2x6 @ $0.85/LF = $510

Floor System:

  • Engineered TJI 230, 16" OC: 48 joists @ $38 = $1,824
  • Subfloor 3/4" T&G AdvanTech: 75 sheets @ $52 = $3,900

Lumber Package Total: $15,500-$17,000

All-in framing budget: $30,000-$36,000

Current SPF 2x6 trending ~$485/MBF. Lock pricing within 2 weeks of ordering. Add 12-15% contingency for lumber volatility.

What's the foundation type -- slab on grade or crawlspace? That changes the sill plate and anchor bolt spec.

Example: Multi-Step CSI Division Estimate

User: Walk me through estimating a 1,500 sq ft addition. Two-story, wood frame, on a crawl space. I need a ballpark broken down by CSI division.

ConstructionAI:

Total Estimated Cost: $165,000 - $195,000

| CSI Division | Description | Estimated Cost |
| --- | --- | --- |
| Division 01 | General Conditions & Permits | $8,000 - $12,000 |
| Division 02 | Sitework & Excavation | $5,000 - $8,000 |
| Division 03 | Concrete (crawl space foundation) | $12,000 - $18,000 |
| Division 04 | Wood Framing (2-story) | $25,000 - $32,000 |
| Division 05 | Metal (connectors, flashing) | $3,000 - $5,000 |
| Division 06 | Windows & Doors | $8,000 - $12,000 |
| Division 07 | Roofing & Waterproofing | $10,000 - $14,000 |
| Division 08 | Finishes (drywall, paint, flooring) | $20,000 - $28,000 |
| Division 09 | Mechanical (HVAC) | $12,000 - $16,000 |
| Division 10 | Electrical | $10,000 - $14,000 |
| Division 11 | Plumbing | $8,000 - $12,000 |

Two-story on a crawl space means your framing costs are front-loaded -- the second floor adds structural complexity with load-bearing walls, engineered headers, and potentially LVL beams. Budget 15-20% more for framing than a single-story of the same footage.

What climate zone? That drives your insulation spec and energy code compliance costs.

Example: Regional Pricing

User: I'm pricing a 2,000 sq ft roof tear-off and re-shingle in Dallas, Texas. Architectural shingles. What should I budget?

ConstructionAI:

Roof Tear-Off and Re-Shingle -- 2,000 SF, Dallas TX, Architectural Shingles

| Item | Cost |
| --- | --- |
| Tear-off (1 layer) | $1,800 - $2,400 |
| Disposal / dumpster | $400 - $600 |
| Underlayment (synthetic) | $600 - $800 |
| Architectural shingles (material) | $2,800 - $3,600 |
| Ridge vent + hip/ridge caps | $400 - $600 |
| Flashing (valleys, pipe boots, step) | $600 - $900 |
| Labor (install) | $3,000 - $4,200 |
| Total | $9,600 - $13,100 |

Dallas pricing runs about 5-8% above national average due to storm demand and labor competition. If you're in a hail-prone zip code, check with your insurance -- many carriers are non-renewing in North Texas, which affects how you spec the shingle impact rating.

How many layers are on the existing roof? A second layer tear-off adds $800-$1,200.

Building Codes

Cites actual code sections from IRC, IBC, NEC, IPC, IMC, IECC, and OSHA 29 CFR 1926. Real section numbers with context for how they apply on the job -- not generic summaries.

Example: OSHA Fall Protection

User: What's the OSHA fall protection requirement on a residential job?

ConstructionAI: Residential gets a different standard than commercial -- this trips people up.

Residential (29 CFR 1926.501(b)(13)): Trigger height is 6 feet, BUT residential gets an exception under Subpart M. Employers can use alternative fall protection instead of conventional guardrails/nets/PFAS.

Acceptable alternatives: slide guards, safety monitoring system, warning line systems on low-slope roofs.

The catch: You MUST have a written fall protection plan if using alternatives. OSHA can still cite you if a worker falls without one. That's a willful violation -- $156,259 max per violation as of 2026. Get it in writing.

Are you the GC or a sub? That changes who's responsible for the plan.

Example: Load-Bearing Wall Removal

User: Can I remove this wall in my house? It's between the kitchen and living room.

ConstructionAI: Before you start, ensure that the wall is not load-bearing. This means checking if there are any structural beams or columns on the other side of the wall. If the wall is supporting part of your roof or floor structure, removing it could compromise the integrity of your home.

Code Section: IRC R602.1

How to tell if it's load-bearing:

  • Does the wall run perpendicular to the floor joists above?
  • Is there a wall, beam, or column directly above or below it?
  • Does it sit on a beam or foundation wall in the crawl space / basement?

If the answer to any of those is yes, you need a structural engineer to spec a replacement beam (typically LVL or steel) before removal. That beam needs proper bearing points and may require a permit.

Do not remove a load-bearing wall without an engineered solution. The cost of a structural engineer's letter ($500-$1,500) is nothing compared to a roof sag or floor collapse.

Do you have access to the attic or crawl space? That's the fastest way to confirm.

Contractor Voice

Talks like a contractor, not a chatbot. Direct answers with real numbers first, qualifications second. No corporate fluff, and no "it depends" without saying what it depends on.

Example: Pricing Check

User: My drywall guy wants $2.50/SF for hang and finish. Is that fair?

ConstructionAI: That's on the low side of market. National average for hang, tape, and finish (level 4) is running $2.75-$3.50/SF in 2025. If he's including material, that's a steal. If it's labor-only, it's fair for a competitive market.

Watch for: is that level 4 or level 5 finish? Level 5 (skim coat) adds $0.50-$1.00/SF. Also -- does the price include garage/utility areas at a lower finish level? Good drywall bids break it out by room type.

How many sheets total and what ceiling height? That'll tell me if the rate makes sense for the volume.

Example: Slab Thickness (Adversarial)

User: I need to pour a 4-inch slab for my two-story house. That's thick enough, right?

ConstructionAI: A 4-inch slab is typically sufficient for most residential applications, including two-story houses, as long as the foundation design and soil conditions are properly evaluated.

Code Requirements:

  • IRC R506.1: Minimum 3.5 inches for concrete slab-on-grade
  • IBC Section 1907: Requires evaluation of soil bearing capacity

But here's what matters more than the slab thickness:

  • What's your soil bearing capacity? Expansive clay (common in Texas, Colorado, parts of the Southeast) can heave a 4-inch slab. You may need a post-tension or structural slab instead.
  • Are you running plumbing under the slab? Under-slab plumbing needs to be inspected before the pour, and you need clearance for drain slope.
  • What's the rebar spec? A two-story house puts more point load on the slab at bearing walls. #4 rebar at 18" OC both ways is typical, but your engineer may spec more.

Has a geotechnical report been done on the lot? That's the real answer to whether 4 inches is enough.

Commercial Estimation

Not just residential. Covers tenant improvements, commercial HVAC, fire suppression, ADA compliance, prevailing wage, and use-change code requirements.

Example: Office-to-Restaurant Conversion

User: I'm converting a 3,000 sq ft office space into a restaurant. What code issues should I watch for?

ConstructionAI:

Fire Safety:

  • Fire separations between kitchen and dining must meet minimum fire resistance ratings per NFPA 101 and NFPA 13. Typically one-hour minimum.
  • Exit requirements change significantly -- restaurants need more exits per occupant than offices. Calculate occupant load at 15 SF/person (dining) vs 100 SF/person (office).
  • Commercial kitchen requires a Type I hood with fire suppression (NFPA 96). Budget $15,000-$25,000 for the hood system alone.

Plumbing:

  • Grease interceptor required for any commercial kitchen drain (IPC 1003.3). Size depends on fixture count and flow rate.
  • Additional restrooms based on new occupant load (IPC Table 403.1).

Mechanical:

  • Kitchen makeup air unit required to replace exhaust air from the hood. This is often the most expensive overlooked item -- $8,000-$15,000.
  • Separate HVAC zones for kitchen vs dining. Kitchen runs 10-15 degrees hotter.

ADA:

  • Accessible route, restrooms, seating, and counter height all need to comply with ADA 2010 Standards.
  • If the building hasn't been renovated since the ADA was enacted, you may trigger a path-of-travel upgrade (up to 20% of renovation cost).

Is this a change of occupancy from B (business) to A-2 (assembly)? That triggers a full code review by the building department, not just a permit.

First-Time GC Guide

Long-form knowledge for contractors who need comprehensive walkthroughs, not just quick answers.

Example: Running Your First Project

User: I just got my general contractor's license. Walk me through everything I need to know to run my first project from start to finish.

ConstructionAI:

1. Project Planning

  • Scope Definition: Define all required tasks and deliverables in writing. Example: "Kitchen remodel with new cabinets, countertops, flooring, and plumbing upgrades."
  • Site Inspection: Visit the site before bidding. Look for hidden conditions -- knob-and-tube wiring, asbestos, foundation cracks, drainage issues.
  • Permits: Pull all required permits before starting. Working without permits is a license risk, an insurance void, and a liability nightmare.

2. Estimating & Bidding

  • Break costs into labor, material, subcontractors, and overhead.
  • Add 10-15% contingency for unknowns on your first projects. You will miss things.
  • Markup: 15-25% over direct costs is standard for residential GCs. Know the difference between markup (on cost) and margin (on price).

3. Contracts

  • Never start work without a signed contract. Include: scope, price, payment schedule, change order process, timeline, warranty, and dispute resolution.
  • Use AIA or ConsensusDocs standard forms until you can afford a construction attorney to write custom contracts.

4. Scheduling

  • Build a schedule with trade dependencies. Framing before MEP rough-in, rough-in before insulation, insulation before drywall.
  • Schedule inspections in advance. A failed inspection can cost you a week.

5. Managing Subcontractors

  • Get certificates of insurance (COI) from every sub before they step on site. General liability + workers' comp. No exceptions.
  • Pay subs within terms. Your reputation with subs is your business. The GC who pays late gets the B-team crew.

6. Change Orders

  • Document every change in writing before doing the work. Include: description, cost impact, schedule impact, client signature.
  • Change order costs compound. A $500 change in framing can cascade to $2,000+ by the time electrical, plumbing, and drywall adjust.

7. Inspections & Closeout

  • Know your inspection sequence: foundation, framing, MEP rough-in, insulation, final.
  • Do your own punch list walk before the client's. Fix the obvious stuff before they see it.
  • Get lien waivers from every sub and supplier at final payment.

What type of project is your first one -- remodel, addition, or new construction? The risk profile is very different for each.

Trades Coverage

Structural

  • Framing
  • Concrete / Foundation
  • Structural Steel
  • Masonry / CMU

Envelope

  • Roofing
  • Siding / Exterior
  • Windows / Doors
  • Insulation
  • Waterproofing

Mechanical

  • HVAC
  • Plumbing
  • Electrical
  • Fire Protection

Finishes

  • Drywall
  • Painting
  • Tile / Stone
  • Flooring
  • Cabinets / Countertops

Specialty

  • Demolition
  • Excavation / Grading
  • Landscaping / Hardscaping
  • Fencing
  • Decks / Porches

Commercial

  • Tenant Improvements
  • Commercial HVAC
  • Fire Alarm / Suppression
  • ADA Compliance
  • Prevailing Wage

Function Calling

9 tools using Llama 3.1 native tool calling format. Compatible with any OpenAI function calling client. Tool definitions included in tools/tool_definitions.json.
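
A minimal sketch of how a tool runner might dispatch a call, assuming the model emits a JSON object with `name` and `parameters` keys as in the kitchen estimate example in this section. The `HANDLERS` registry, `register`, `dispatch_tool_call`, the stub handler body, and the parameter names `direct_costs` and `markup_percent` are hypothetical; the real schemas live in tools/tool_definitions.json.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to Python callables.
HANDLERS: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that adds a handler to the registry under the given tool name."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap


@register("calculate_markup_margin")
def calculate_markup_margin(direct_costs: float, markup_percent: float) -> dict:
    """Stub handler: markup is applied on cost, margin is measured on price."""
    sell_price = direct_costs * (1 + markup_percent / 100)
    gross_margin_percent = (sell_price - direct_costs) / sell_price * 100
    return {
        "sell_price": round(sell_price, 2),
        "gross_margin_percent": round(gross_margin_percent, 2),
    }


def dispatch_tool_call(raw: str) -> Any:
    """Parse the model's JSON tool call and run the matching handler."""
    call = json.loads(raw)
    name = call.get("name")
    if name not in HANDLERS:
        raise ValueError(f"unknown tool: {name!r}")
    return HANDLERS[name](**call.get("parameters", {}))


result = dispatch_tool_call(json.dumps({
    "name": "calculate_markup_margin",
    "parameters": {"direct_costs": 10000, "markup_percent": 25},
}))
print(result)  # 25% markup on cost works out to a 20% margin on price
```

Rejecting unknown tool names before calling anything keeps a malformed or hallucinated tool call from reaching handler code.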

Tool Reference
| Tool | What It Does | Key Parameters |
| --- | --- | --- |
| generate_estimate_pdf | Formal estimates with CSI MasterFormat line items, labor/material split | project_name, divisions[], subtotal, overhead_%, profit_%, total |
| generate_change_order | Scope change docs with justification types and cost impact | co_number, items[], cost_impact, schedule_impact_days, reason |
| calculate_material_takeoff | Material quantities with waste factors by trade | materials[], waste_factor_%, order_quantity, total_material_cost |
| calculate_markup_margin | Pricing strategy -- breakeven, overhead recovery, profit margin | direct_costs, markup_%, gross_margin_%, sell_price, annual_overhead |
| lookup_building_code | Code lookup by section, jurisdiction, and application | code_source (IRC/IBC/NEC/IPC/OSHA), section, common_violations[] |
| generate_punch_list | Final walkthrough checklists organized by severity and trade | areas[], items[] with priority (critical/standard/cosmetic) |
| generate_project_schedule | Phase-based timelines with trade dependencies and lead times | phases[], critical_path_items[], weather_buffer_days |
| generate_proposal | Client-facing proposals with payment milestones and scope of work | scope_items[], price, payment_schedule[], exclusions[] |
| scan_blueprint | Extract structured data from blueprint images (requires VLM) | input_path, to_estimate_prompt, location, quality tier |

Example: Tool Call for Kitchen Estimate

When asked to generate an estimate, the model produces structured JSON that the tool runner executes:

{
  "name": "generate_estimate_pdf",
  "parameters": {
    "project_name": "Kitchen Remodel - Smith Residence",
    "client_name": "John Smith",
    "estimate_number": "EST-2026-042",
    "project_type": "Kitchen Remodel",
    "square_footage": 200,
    "state": "TX",
    "city": "Austin",
    "divisions": [
      {
        "division": "02",
        "name": "Demolition",
        "items": [
          {
            "description": "Remove existing cabinets, countertops, flooring",
            "quantity": 1,
            "unit": "LS",
            "unit_price": 2500,
            "total": 2500
          }
        ]
      },
      {
        "division": "06",
        "name": "Cabinetry",
        "items": [
          {
            "description": "Mid-range cabinets, 25 LF base + 20 LF wall",
            "quantity": 45,
            "unit": "LF",
            "unit_price": 225,
            "total": 10125
          }
        ]
      },
      {
        "division": "09",
        "name": "Countertops",
        "items": [
          {
            "description": "Quartz countertops, 45 SF",
            "quantity": 45,
            "unit": "SF",
            "unit_price": 85,
            "total": 3825
          }
        ]
      }
    ],
    "subtotal": 36500,
    "overhead_percent": 10,
    "profit_percent": 15,
    "contingency_percent": 10,
    "total": 49275,
    "exclusions": ["Appliances", "Structural modifications", "Permit fees"],
    "timeline_weeks": 6
  }
}

The tool runner processes this into a formatted PDF estimate document.
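
A tool runner may want to sanity-check the arithmetic before rendering. The sketch below is one way to do that, assuming overhead, profit, and contingency are each applied to the subtotal and added (not compounded); that assumption reproduces the example's total, since 36,500 x 1.35 = 49,275. The `check_estimate` function is hypothetical. Note that the three divisions shown above add up to 16,450 against a 36,500 subtotal, so the excerpt presumably omits other divisions, and the check flags exactly that.

```python
def check_estimate(params: dict, tolerance: float = 0.01) -> dict:
    """Recompute line totals and the grand total of a generate_estimate_pdf payload."""
    problems = []
    items_sum = 0.0
    for division in params["divisions"]:
        for item in division["items"]:
            expected = item["quantity"] * item["unit_price"]
            if abs(expected - item["total"]) > tolerance:
                problems.append(f"line total mismatch: {item['description']}")
            items_sum += item["total"]
    if abs(items_sum - params["subtotal"]) > tolerance:
        problems.append("line items do not add up to subtotal")
    # Assumption: percentages are additive on the subtotal, not compounded.
    pct = (params["overhead_percent"] + params["profit_percent"]
           + params["contingency_percent"])
    expected_total = params["subtotal"] * (1 + pct / 100)
    if abs(expected_total - params["total"]) > tolerance:
        problems.append("total does not match subtotal plus percentages")
    return {"items_sum": items_sum,
            "expected_total": round(expected_total, 2),
            "problems": problems}


# Line items copied from the kitchen remodel example above.
payload = {
    "divisions": [
        {"items": [{"description": "Demolition", "quantity": 1, "unit_price": 2500, "total": 2500}]},
        {"items": [{"description": "Cabinetry", "quantity": 45, "unit_price": 225, "total": 10125}]},
        {"items": [{"description": "Countertops", "quantity": 45, "unit_price": 85, "total": 3825}]},
    ],
    "subtotal": 36500,
    "overhead_percent": 10,
    "profit_percent": 15,
    "contingency_percent": 10,
    "total": 49275,
}
report = check_estimate(payload)
print(report)
```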

Example: Blueprint Scan Pipeline

The scan_blueprint tool chains with a VLM (Qwen2.5-VL) to extract structured data from construction plans:

# Scan a blueprint and get structured JSON
python3 tools/blueprint_scanner.py --image plans/floor-plan.png --server http://localhost:11434 --model qwen2.5-vl

# Scan and generate an estimate prompt for ConstructionAI
python3 tools/blueprint_scanner.py --image plans/floor-plan.png --to-estimate --location "Austin, TX" --quality mid-range

Output includes rooms, dimensions, door/window schedules, and material callouts extracted directly from the plan image.


Evaluation

200 questions | 12 categories | 6 scoring dimensions

Full eval suite: scripts/eval_v4.py -- supports Ollama, vLLM, SGLang backends, LLM-judge mode, and model comparison.

v4 Scores by Category

| Category | Coverage | Voice | Format | Guardrail | Est. Accuracy |
| --- | --- | --- | --- | --- | --- |
| Regional Estimation | 0.674 | 0.571 | 0.826 | -- | 0.852 |
| Estimation Accuracy | 0.654 | 0.460 | 0.825 | -- | 0.831 |
| Long Form | 0.490 | 0.395 | 0.855 | -- | -- |
| Format Quality | 0.476 | 0.262 | 0.878 | -- | -- |
| Tool Calling | 0.395 | 0.380 | 0.867 | -- | -- |
| Safety / OSHA | 0.393 | 0.320 | 0.767 | -- | -- |
| Commercial | 0.369 | 0.476 | 0.833 | -- | -- |
| Contractor Voice | 0.332 | 0.511 | 0.778 | -- | -- |
| Multi-Step Reasoning | 0.329 | 0.350 | 0.850 | -- | -- |
| Building Codes | 0.301 | 0.400 | 0.817 | -- | -- |
| Guardrails | 0.285 | 0.498 | 0.866 | 0.911 | -- |
| Adversarial | 0.239 | 0.542 | 0.833 | -- | -- |

What the scoring dimensions mean
  • Coverage -- Does the response include specific numbers, measurements, code sections, and trade-relevant detail? A response that says "it depends on the region" scores low. A response that says "$3.50-$5.00/SF in the Midwest, adjust +15% for Denver metro" scores high.
  • Voice -- Does it sound like a contractor or a chatbot? Real terminology (rough-in, punch list, change order), direct answers, no corporate filler. "I recommend consulting a professional" = 0. "Get it in writing before you start demo" = 5.
  • Format -- Tables for pricing, bold key numbers, headers for sections, bullets for line items. Construction estimates are structured documents, not paragraphs.
  • Guardrail Compliance -- Does the model refuse or redirect appropriately on questions that need a licensed professional? Structural engineering, load calculations, electrical panel sizing -- these need a stamp, not a chatbot answer.
  • Estimation Accuracy -- Are the dollar amounts, quantities, and rates within realistic ranges for the trade and region? Checked against RS Means, HomeAdvisor, and contractor-verified pricing.
  • Follow-up Quality -- Does the response end with a relevant follow-up question that moves the conversation forward? Good follow-ups narrow scope: "What climate zone?" not "Is there anything else?"

Eval Categories Explained

What each category tests
| Category | Questions | What It Tests |
| --- | --- | --- |
| Regional Estimation | 25 | Pricing accuracy adjusted for location, climate zone, and local market conditions |
| Building Codes | 20 | Correct code section citations (IRC, IBC, NEC, IPC, IMC) with application context |
| Contractor Voice | 15 | Trade terminology, direct communication style, absence of corporate language |
| Multi-Step Reasoning | 20 | Complex estimates requiring multiple trades, phasing, and dependency chains |
| Safety / OSHA | 15 | OSHA 29 CFR 1926 requirements, violation penalties, real-world safety guidance |
| Guardrails | 15 | Appropriate refusal/redirection on questions requiring licensed professionals |
| Format Quality | 15 | Structured output -- tables, bold numbers, headers, bullet lists |
| Tool Calling | 10 | Correct tool selection, valid JSON schema, appropriate parameter values |
| Commercial | 15 | Tenant improvements, change of occupancy, prevailing wage, commercial HVAC |
| Estimation Accuracy | 20 | Dollar amounts checked against RS Means and contractor-verified pricing |
| Adversarial | 15 | Trick questions, incorrect assumptions in prompts, scope traps |
| Long Form | 15 | Comprehensive walkthroughs requiring sustained depth across multiple topics |

Version Progression

v1 (1,087 examples)    ||||                          ~92% token acc    Seed model
v2 (5,200 examples)    ||||||||                      97.5% (overfit)   First real model
v3 (23,800 examples)   ||||||||||||||||              ~88%              Generalization
v4 (138,904 examples)  ||||||||||||||||||||||||||    ~89%              Production
v5 (81,813 curated)    -------------------------    IN PROGRESS       Quality over quantity

Key insight: More diverse data lowers training accuracy but improves real-world performance. v2 hit 97.5% by memorizing 5K examples. v3 dropped to 88% on 24K but generalized far better. The ceiling on training accuracy is the noise floor of the domain -- construction costs genuinely vary, and ~89% reflects the model learning patterns, not memorizing answers.


Training

Architecture
| Setting | Value |
| --- | --- |
| Base Model | unsloth/Meta-Llama-3.1-8B-Instruct |
| Method | QLoRA 4-bit (bitsandbytes) |
| LoRA Rank / Alpha | 128 / 256 |
| Trainable Parameters | ~335M (4% of model) |
| Framework | Unsloth FastLanguageModel + trl SFTTrainer |
| Epochs | 2 |
| Learning Rate | 2e-4, cosine schedule, warmup 100 steps |
| Batch Size | Effective 16 (8 per device, grad accum 2) |
| Sequence Length | 8,192 (training) / 128K (inference, Llama 3.1 native) |
| Packing | ON (reduces 13K steps to ~2,584 with 106K examples) |
| Hardware | 2x A100-SXM4-80GB on vast.ai |
| Quantization | Q8_0 and Q4_K_M via llama.cpp |
| Training Cost | ~$30-40 per full training run |

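
The trainable-parameter and step counts above can be reproduced with a little arithmetic. This sketch assumes LoRA adapters on all seven attention and MLP projections of every layer (the card does not list the target modules) and uses the public Llama 3.1 8B layer shapes; the 8.03B total parameter count is likewise taken from the base model, not from this card.

```python
# Llama 3.1 8B shapes (from the public base-model config).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024          # 8 KV heads x 128 head dim
LAYERS = 32
RANK = 128             # LoRA rank from the table above
TOTAL_PARAMS = 8.03e9  # approximate size of the base model

# (in_features, out_features) per projection. Assumption: all seven are adapted.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Each adapted layer adds two matrices: rank x in and out x rank."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in PROJECTIONS.values())
    return LAYERS * per_layer


trainable = lora_params(RANK)
print(f"{trainable:,} trainable parameters")       # 335,544,320
print(f"{trainable / TOTAL_PARAMS:.1%} of model")  # 4.2%

# Unpacked optimizer steps: examples x epochs / effective batch size.
steps_unpacked = 106_000 * 2 // 16
print(steps_unpacked)                              # 13250
```

Under these assumptions the result lines up with the ~335M and ~13K figures in the table.
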
Data Pipeline
Prompts ──> Distillation ──> Scoring ──> Filtering ──> Dedup ──> Balance ──> Contamination Check ──> Split
  (seeded,     (SGLang,       (5 dims,    (min 3.0)    (hash +    (25% cap)   (vs 200 eval          (95/5,
   multi-       30 workers     multi-                   TF-IDF +               questions)             val deduped
   persona)     per A100)      process)                 MinHash)                                      vs train)

Quality scoring is pure heuristic -- no LLM calls. Five dimensions scored 1-5: format (tables, headers, bold numbers), voice (contractor patterns vs corporate patterns), coverage (dollar amounts, measurements, code refs), follow-up (ends with relevant question), guardrails (redirects on safety/structural). Multiprocessing handles 250K+ examples.
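
A rough sketch of what a heuristic scorer along these lines could look like. The regexes, term lists, point weights, and the choice to filter on the mean of the five scores are all assumptions made for illustration; the actual scorer's rules are not shown in this card.

```python
import re
from statistics import mean

# Hypothetical term lists; the real scorer's patterns are not documented here.
TRADE_TERMS = ("rough-in", "punch list", "change order", "takeoff", "tear-off")
CORPORATE_PHRASES = ("consult a professional", "it is important to note", "i recommend")


def clamp(value: int) -> int:
    """Keep every dimension on the 1-5 scale."""
    return max(1, min(5, value))


def score_response(text: str) -> dict:
    lower = text.lower()
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    dollars = len(re.findall(r"\$\d", text))
    has_measure = bool(re.search(r"\b\d+(\.\d+)?\s?(sf|lf|oc|inch|inches)\b", lower))
    has_code_ref = bool(re.search(r"\b(irc|ibc|nec|ipc|imc|cfr)\b", lower))
    has_bullets = any(ln.startswith(("-", "•")) for ln in lines)
    return {
        "format": clamp(1 + 2 * ("|" in text) + ("**" in text) + has_bullets),
        "voice": clamp(3 + sum(t in lower for t in TRADE_TERMS)
                       - 2 * sum(p in lower for p in CORPORATE_PHRASES)),
        "coverage": clamp(1 + min(dollars, 2) + has_measure + has_code_ref),
        "follow_up": 5 if lines and lines[-1].endswith("?") else 1,
        "guardrails": 5 if re.search(r"engineer|licensed|permit", lower) else 3,
    }


def passes_filter(text: str, minimum: float = 3.0) -> bool:
    """Assumption: the 3.0 cutoff applies to the mean of the five dimensions."""
    return mean(score_response(text).values()) >= minimum


good = ("Tear-off runs $1,800-$2,400 for 2,000 SF.\n"
        "- Pull the permit first.\n"
        "How many layers are on the roof?")
bad = "It is important to note that costs vary. I recommend you consult a professional."
print(score_response(good), passes_filter(good))
print(score_response(bad), passes_filter(bad))
```

Because it is plain string matching with no model calls, a scorer like this parallelizes trivially across processes.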

Deduplication runs in two phases: exact hash on normalized user messages, then MinHash LSH (threshold 0.7) for near-duplicates. LSH only compares examples that land in the same bucket, so the cost grows roughly linearly with dataset size instead of the O(n^2) of all-pairs comparison. It handles 180K examples in 4 minutes on a MacBook.
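
The two phases can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the project's script: the 3-word shingles, 64 hash functions, and 16 bands of 4 rows are arbitrary choices, and `dedup`, `minhash`, and `shingles` are hypothetical names. Only the 0.7 threshold comes from the text above.

```python
import hashlib
import re
from collections import defaultdict

NUM_PERM = 64              # number of hash functions (assumption)
BANDS = 16                 # LSH bands (assumption)
ROWS = NUM_PERM // BANDS   # rows per band


def normalize(text: str) -> str:
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", text.lower())).strip()


def shingles(text: str, k: int = 3) -> set:
    words = normalize(text).split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash(text: str) -> tuple:
    """One minimum per salted hash function over the shingle set."""
    grams = shingles(text)
    return tuple(
        min(int.from_bytes(
            hashlib.blake2b(g.encode(), digest_size=8,
                            salt=seed.to_bytes(8, "little")).digest(), "big")
            for g in grams)
        for seed in range(NUM_PERM)
    )


def estimate_jaccard(a: tuple, b: tuple) -> float:
    return sum(x == y for x, y in zip(a, b)) / NUM_PERM


def dedup(messages: list, threshold: float = 0.7) -> list:
    kept, signatures, seen_exact = [], [], set()
    buckets = defaultdict(list)  # (band index, band values) -> indices into kept
    for msg in messages:
        # Phase 1: exact hash on the normalized message.
        key = hashlib.sha256(normalize(msg).encode()).hexdigest()
        if key in seen_exact:
            continue
        seen_exact.add(key)
        # Phase 2: only compare against items sharing at least one LSH band.
        sig = minhash(msg)
        band_keys = [(b, sig[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]
        candidates = {i for bk in band_keys for i in buckets.get(bk, ())}
        if any(estimate_jaccard(sig, signatures[i]) >= threshold for i in candidates):
            continue
        for bk in band_keys:
            buckets[bk].append(len(kept))
        kept.append(msg)
        signatures.append(sig)
    return kept


unique = dedup([
    "What does a 2x6 cost in Dallas?",
    "what does a 2x6   cost in dallas",
    "How thick should a garage slab be?",
])
print(unique)
```

MinHash estimates word-shingle overlap, so it catches reworded copies of the same prompt but not paraphrases that share few words.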

Contamination check verifies none of the 200 eval questions leaked into training data via exact match, TF-IDF cosine similarity, and trigram overlap.
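
A minimal sketch of the exact-match and trigram-overlap parts of such a check (the TF-IDF cosine step is left out to keep the example dependency-free). The `is_contaminated` helper and the 0.5 overlap cutoff are assumptions for illustration; the card does not state the thresholds used.

```python
import re


def words(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def trigrams(text: str) -> set:
    w = words(text)
    return {tuple(w[i:i + 3]) for i in range(len(w) - 2)}


def trigram_overlap(candidate: str, eval_question: str) -> float:
    """Fraction of the eval question's word trigrams that also appear in the candidate."""
    target = trigrams(eval_question)
    if not target:
        return 0.0
    return len(trigrams(candidate) & target) / len(target)


def is_contaminated(candidate: str, eval_questions: list, max_overlap: float = 0.5) -> bool:
    for question in eval_questions:
        if words(candidate) == words(question):  # exact match after normalization
            return True
        if trigram_overlap(candidate, question) >= max_overlap:
            return True
    return False


EVAL = ["What's the OSHA fall protection requirement on a residential job?"]
print(is_contaminated("what's the osha fall protection requirement on a residential job", EVAL))
print(is_contaminated("Tell me the OSHA fall protection requirement on a residential job site.", EVAL))
print(is_contaminated("How much does a 200 amp panel upgrade cost in Austin?", EVAL))
```

Measuring overlap against the eval question's trigrams, not the candidate's, means a long training example that embeds an eval question verbatim is still caught.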

Data Sources
| Source | Examples | What It Teaches |
| --- | --- | --- |
| Multi-server distillation | 138,904 | Broad domain coverage -- every trade, question type, and difficulty level |
| Real contractor data | 1,600+ | Actual estimates, change orders, and project data from a licensed contractor |
| Scraped contractor forums | 621 | Real voice, real problems -- Reddit r/construction, YouTube cost breakdowns |
| 2025 cost guides (21 trades) | 309 | Current national average pricing from HomeAdvisor, Fixr, cost databases |
| TxDOT public bid tabulations | 166 | Real government bids vs. engineer estimates -- teaches bid variance |
| Expert-written examples | 500+ | Hand-crafted by a construction professional with field experience |
| Gap-targeted prompts | 832 | Adversarial, multi-step reasoning, NER -- targeted at weakest eval categories |

57 source JSONL files across the training_data directory, covering every major trade, code system, and estimation scenario.

Distillation Teachers
| Model | Role | Speed | Notes |
| --- | --- | --- | --- |
| Qwen 2.5 7B Instruct | Primary teacher | ~5.5 ex/sec | Best quality/speed ratio on SGLang |
| GLM 4.7 Flash | Early rounds | ~3 ex/sec | Good coverage, slower on vLLM |
| Qwen 3 8B | Gap filling | ~4 ex/sec | Thinking disabled -- wastes 30-50% tokens otherwise |
| Qwen 2.5 3B | Budget runs | ~2 ex/sec | Usable on smaller GPUs |

All distillation runs on SGLang or vLLM on A100 GPUs via vast.ai. 30 workers per server saturates an A100. One A100 with SGLang outperforms 10x consumer GPUs with vLLM.

Key lesson: Run distillation ON the server, not over SSH. Tunnels drop during long runs. SCP results after.

What we learned training v1 through v4
  1. Never run a full fine-tune on an instruct model. LoRA on instruct works (that's what ConstructionAI does); a full fine-tune fights the alignment. Learned this the hard way.
  2. 2 epochs is enough for large diverse datasets. 3 epochs risks overfitting. 1 epoch is not enough -- v4.1 proved this when guardrails collapsed from 0.91 to 0.38 on a single epoch.
  3. LoRA rank 128 > rank 64 for domain-heavy tasks. More capacity for specialized knowledge.
  4. lr 2e-4 with cosine decay is the reliable default. Most learning happens in the first 1.5 epochs.
  5. Packing is critical for throughput. Reduced 13K steps to 2,584 by packing short examples together.
  6. Data quality > data quantity. v4 filtered 200K raw down to 139K curated. The 61K removed examples would have hurt the model.
  7. Diverse data lowers training accuracy but improves generalization. v2's 97.5% on 5K examples was memorization. v3's 88% on 24K was learning.
  8. Real business data is the moat. Synthetic gets you to 85%. Real estimates, change orders, and project data get you to 95%.
  9. Always check for contamination. Eval questions in training data make eval scores meaningless.
  10. Train on 8K context, serve at 128K. Most examples are short. Training at full context is wasteful.

Production

ConstructionAI runs in production on RunPod Serverless at ~$0.002/estimate. REST API via FastAPI wraps Ollama for integration with web and mobile frontends.

API Server
pip install -r api/requirements.txt
python3 api/server.py --port 8000

| Endpoint | Method | Description |
| --- | --- | --- |
| /chat | POST | Conversational interface with message history |
| /estimate | POST | Generate structured estimate from project description |
| /health | GET | Server health check |

Auth via bearer token (CONSTRUCTIONAI_API_KEY). CORS configurable via CONSTRUCTIONAI_CORS_ORIGINS.

Supports Ollama (local, default) and RunPod (production) backends. When the model URL contains "runpod", the service automatically uses the RunPod API format.
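
The backend switch described above could look roughly like this. The function names, the case-insensitive match, and the RunPod payload shape (an `input` envelope, as RunPod serverless workers conventionally expect) are assumptions; api/server.py may do it differently.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def select_backend(model_url: str) -> str:
    """Return "runpod" when the URL mentions RunPod, otherwise "ollama" (the default)."""
    return "runpod" if "runpod" in model_url.lower() else "ollama"


def build_request(model_url: str, messages: List[Message]) -> Tuple[str, dict]:
    """Return the (url, json_body) pair to POST for the selected backend."""
    if select_backend(model_url) == "runpod":
        # Assumption: the worker reads the chat from an "input" envelope.
        return model_url, {"input": {"messages": messages}}
    # Ollama chat endpoint; stream=False asks for a single JSON response.
    body = {"model": "constructionai", "messages": messages, "stream": False}
    return model_url.rstrip("/") + "/api/chat", body


chat = [{"role": "user", "content": "Ballpark a 12x16 deck in Austin."}]
local_url, local_body = build_request("http://localhost:11434", chat)
prod_url, prod_body = build_request("https://api.runpod.ai/v2/ENDPOINT_ID/runsync", chat)
print(local_url, sorted(local_body))
print(prod_url, sorted(prod_body))
```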

Live Pricing Middleware

Real-time material pricing from retailer APIs, bridged through the price-scraper MCP:

# Get live-priced estimate for a specific zip code
python tools/pricing_middleware.py --zip 78745 "Kitchen remodel, 12x15, gut to studs, quartz counters"

# Pipe to PDF
python tools/pricing_middleware.py --zip 78745 "2,400 SF ranch" --json | python tools/generate_pdf.py --stdin

Queries current pricing from Home Depot, Lowe's, and Ferguson. Caches results weekly per zip code to avoid redundant API calls.
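
One way to get weekly, per-zip caching is to put the ISO week in the cache key, so entries roll over on their own. This is a sketch only: `WeeklyPriceCache`, `cache_key`, the SKU argument, and the in-memory dict are hypothetical, and the middleware's real storage and key format are not shown in this card.

```python
import datetime as dt
from typing import Callable, Dict, Optional


def cache_key(zip_code: str, sku: str, today: Optional[dt.date] = None) -> str:
    """Key includes the ISO year and week, so a new week misses the cache."""
    today = today or dt.date.today()
    iso_year, iso_week, _ = today.isocalendar()
    return f"{zip_code}:{sku}:{iso_year}-W{iso_week:02d}"


class WeeklyPriceCache:
    def __init__(self, fetch: Callable[[str, str], float]):
        self._fetch = fetch  # function that calls the retailer API
        self._store: Dict[str, float] = {}

    def get(self, zip_code: str, sku: str, today: Optional[dt.date] = None) -> float:
        key = cache_key(zip_code, sku, today)
        if key not in self._store:
            self._store[key] = self._fetch(zip_code, sku)
        return self._store[key]


calls = []


def fake_fetch(zip_code: str, sku: str) -> float:
    """Stand-in for a retailer API call; the price is a placeholder, not real data."""
    calls.append((zip_code, sku))
    return 7.50


cache = WeeklyPriceCache(fake_fetch)
cache.get("78745", "2x6x8", dt.date(2026, 3, 2))  # first lookup hits the API
cache.get("78745", "2x6x8", dt.date(2026, 3, 4))  # same ISO week: served from cache
cache.get("78745", "2x6x8", dt.date(2026, 3, 9))  # next week: fetched again
print(len(calls))
```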

PDF Generation

Three document generators, all using reportlab:

# Professional estimate PDF
python3 tools/generate_estimate_pdf.py --input estimate.json --output estimate.pdf

# Change order document
python3 tools/generate_change_order_pdf.py --demo

# Client proposal
python3 tools/generate_proposal_pdf.py --demo

Roadmap

v5 -- In Progress

v5 training data is finalized: 81,813 train / 4,305 val after aggressive dedup (MinHash LSH at 0.7 threshold) and quality filtering (heuristic scorer, min 3.0 across 5 dimensions). Training on vast.ai next.

What's new in v5
| New Data | Count | Why It Matters |
| --- | --- | --- |
| Real-world scraped examples | 621 | Contractor forums, YouTube costs -- voice that synthetic can't replicate |
| 2025 cost guides (21 trades) | 309 | Current pricing, not stale training data |
| TxDOT public bid tabulations | 166 | Real bid-vs-estimate variance from government contracts |
| Targeted gap prompts | 832 | Adversarial, multi-step, NER -- attacks weakest eval categories directly |
| Banked prompts for distillation | 504,000 | ~25 hours to distill on a single A100 via SGLang |

v5 Target Scores
| Dimension | v4 | v5 Target | Strategy |
| --- | --- | --- | --- |
| Coverage | 0.41 | 0.55+ | 75K new distilled examples targeting weak categories |
| Voice | 0.43 | 0.55+ | Real scraped contractor data, not synthetic |
| Guardrails | 0.91 | 0.90+ | Must not regress (v4.1 proved 1 epoch kills guardrails) |
| Est. Accuracy | 0.84 | 0.88+ | 2025 cost guide data across 21 trades |
| Adversarial | 0.24 | 0.40+ | 832 adversarial gap prompts |
| Multi-Step | 0.33 | 0.50+ | 775 multi-step reasoning prompts |
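
The v4 baselines in this table are consistent with unweighted means of the per-category scores in the Evaluation section. That is an observation from the published numbers, not a documented method, and the short check below only reproduces the arithmetic.

```python
from statistics import mean

# Per-category v4 scores copied from the Evaluation table, in table order.
COVERAGE = [0.674, 0.654, 0.490, 0.476, 0.395, 0.393,
            0.369, 0.332, 0.329, 0.301, 0.285, 0.239]
VOICE = [0.571, 0.460, 0.395, 0.262, 0.380, 0.320,
         0.476, 0.511, 0.350, 0.400, 0.498, 0.542]
EST_ACCURACY = [0.852, 0.831]  # only two categories report this dimension

baselines = {
    "coverage": round(mean(COVERAGE), 2),
    "voice": round(mean(VOICE), 2),
    "est_accuracy": round(mean(EST_ACCURACY), 2),
}
print(baselines)  # {'coverage': 0.41, 'voice': 0.43, 'est_accuracy': 0.84}
```

The Adversarial (0.24) and Multi-Step (0.33) baselines match the Coverage scores of those two categories.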

v6 -- 500K Scale

504K prompts banked for distillation through Qwen 2.5 7B on SGLang. After quality filtering and dedup, that yields ~350-400K usable examples. Combined with existing data, v6 reaches 500K+ curated training examples.
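
As a back-of-the-envelope check on those figures, using the ~5.5 examples/sec quoted for Qwen 2.5 7B on SGLang in the Distillation Teachers table and assuming a single A100 running without interruption:

```python
PROMPTS = 504_000
EXAMPLES_PER_SEC = 5.5  # Qwen 2.5 7B on SGLang, per the teachers table

hours = PROMPTS / EXAMPLES_PER_SEC / 3600
print(f"{hours:.1f} hours on one A100")  # 25.5 hours on one A100

# Share of distilled outputs that must survive filtering and dedup
# to land in the quoted 350-400K range.
yield_low = 350_000 / PROMPTS
yield_high = 400_000 / PROMPTS
print(f"{yield_low:.0%} to {yield_high:.0%} retained")  # 69% to 79% retained
```

That retention range sits close to v4's own ratio of 139K kept out of 200K raw.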

At that scale, the model stops being "pretty good for a fine-tune" and starts being the authority on construction pricing and code compliance for an 8B model. 500K covers the edge cases 100K can't -- the weird plumbing reroute, the historic district variance, the prevailing wage calculation on a mixed-use project.

3B Student Model

The 8B teacher is the quality model, but real-time bidding needs near-instant estimates. The plan: distill v5/v6 down to a 2-3B student (Qwen 3.5 2B Base or Qwen 2.5 3B).

Students learn from the teacher's curated outputs, not raw training data. They often outperform teachers on focused tasks because they learn from clean, filtered examples instead of the messy, diverse data the teacher saw.

Target: ~1 second per estimate on consumer hardware. Fast enough for a contractor to get a ballpark while standing on the job site.

Blueprint Understanding

scan_blueprint already exists as a tool routing to an external VLM (Qwen2.5-VL). The next step is a full multimodal pipeline: feed in blueprint images, get back structured takeoffs -- wall lengths, window counts, room dimensions, door schedules -- extracted directly from the plans.

This is where ConstructionAI stops being a chatbot and starts being a real estimating tool. The contractor takes a photo of the plans, and the model does the takeoff.

Regional Pricing Engine

Construction costs aren't national. A bathroom remodel in Manhattan costs 3x what it costs in Oxford, Mississippi. ConstructionAI already has RS Means locality factors for ~900 zip code zones, but the model doesn't use them natively yet.

Goal: Location-aware estimates that automatically adjust for regional labor rates, material availability, code jurisdiction, and climate zone. Ask "what does a roof cost" and the model asks you where -- then gives a number that reflects your actual market, not a national average.

Live Pricing Integration

The pricing middleware already bridges to retailer APIs for current material pricing. Next step: the model queries live prices automatically when generating estimates, so the lumber numbers reflect what you'd actually pay this week, not what the training data said six months ago.

Real-time pricing + regional adjustment + historical bid variance = estimates contractors actually trust.

The Data Flywheel

Synthetic data gets you to 85%. Real data gets you to 95%.

Every estimate a contractor creates. Every bid posted. Every change order, every material substitution, every "the homeowner wanted to upgrade the countertops mid-project" scope change. Those are training examples that no synthetic pipeline can generate -- because they capture what actually happens on real jobs, not what a teacher model thinks should happen.

The products aren't just customers of the model. The products are the data pipeline. The more contractors use them, the better ConstructionAI gets, the more useful the products become. That's the flywheel.


The Vision

Construction is a $2 trillion industry that still runs on spreadsheets, gut feel, and "I'll get back to you." Contractors are good at building -- they shouldn't also have to be good at data entry, code lookup, and pricing research.

ConstructionAI isn't trying to replace contractors. It's trying to give every contractor the same information advantage that the big shops have -- the pricing databases, the code libraries, the estimation templates. An 8B model running on a $0.002/query serverless endpoint can put that in every contractor's pocket.

The model that knows what a 2x6 costs, when you need a Simpson HDU, and how to price a change order without getting sued -- that's not a chatbot. That's a tool a foreman trusts on a job site.


About

James Walton -- Strata Software Group, Oxford, Mississippi

Construction industry background. Self-taught developer. Went from job sites to fine-tuning LLMs in 7 months because the tools that exist for this industry aren't good enough and the people building AI don't know what a punch list is.

ConstructionAI powers FairTradeWorker -- a two-sided construction marketplace replacing Angi. QuickBooks-native payments, AI estimation, three-role system (homeowner/contractor/subcontractor). Launching in Texas.


License

Built on Meta Llama 3.1. Subject to the Llama 3.1 Community License.

Model weights and training data: proprietary -- Strata Software Group, 2026. Not for commercial redistribution.
