# Hygroskopisch/bge-m3-ifc-kbob-finetuned
A Sentence-Transformers model fine-tuned from BAAI/bge-m3 for IFC-based construction-material retrieval in KBOB/LCA workflows.
## Model Summary
- Model ID: Hygroskopisch/bge-m3-ifc-kbob-finetuned
- Release: v3 (2026-04-16)
- Base model: BAAI/bge-m3
- Embedding dimension: 1024
- Max sequence length: 128
- Similarity: cosine
The model is optimized for queries generated from IFC element metadata and maps them to KBOB-like material labels for downstream environmental impact workflows.
## Intended Use
- IFC-to-material retrieval in building and infrastructure datasets.
- Candidate generation before manual validation in LCA pipelines.
- Semantic search over construction material catalogs with domain-specific wording.
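The candidate-generation pattern behind these use cases can be illustrated with a minimal retrieval sketch. The 2-dimensional toy vectors and labels below stand in for real 1024-dimensional model embeddings and KBOB material labels; the function names are hypothetical, not part of this project's code.

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, catalog, k=10):
    # catalog: list of (label, vector) pairs, e.g. KBOB material labels
    # paired with precomputed embeddings of their descriptions.
    scored = [(label, cosine(query_vec, vec)) for label, vec in catalog]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]

# Toy 2-d vectors standing in for 1024-d model embeddings.
catalog = [("Hochbaubeton", [1.0, 0.1]), ("Baustahl", [0.0, 1.0])]
print(top_k([0.9, 0.2], catalog, k=1))
```

In a real pipeline the catalog vectors come from `model.encode(...)` over the material labels, and the top-k list feeds the manual-validation step.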
## Out-of-Scope Use
- Legal, compliance, or procurement decisions without human review.
- Safety-critical engineering sign-off.
- Use as a standalone source of truth for environmental declarations.
## Responsible Use
- Keep a human-in-the-loop for final material assignment.
- Validate results against project context, standards, and local regulations.
- Contact: sbert-lca@pm.me
## Training Data
The v3 run used project-internal data artifacts and generated pair files.
- Query source files: Training/query_generation/generated_queries
- Expected mapping source files: Training/query_generation/generated_queries
- Hard-negative strategy: fallback mode with random_preselected selection, up to 2 hard negatives per record
Train/dev counts from run metadata:
- Total pairs: 16386
- Train pairs: 14748
- Dev pairs: 1638
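How the pair records might be assembled is sketched below, assuming a "random_preselected" mode means sampling at random from a pre-selected negative pool per query; the function and argument names are illustrative, not the project's actual code. The 90/10 split arithmetic at the end reproduces the reported counts.

```python
import random

def build_records(pairs, preselected, max_hard_negatives=2, seed=42):
    """pairs: (query, positive_label) tuples; preselected: dict mapping
    query -> candidate negative labels (the pre-selected pool)."""
    rng = random.Random(seed)
    records = []
    for query, positive in pairs:
        pool = [n for n in preselected.get(query, []) if n != positive]
        # Up to 2 hard negatives per record, drawn at random from the pool.
        negatives = rng.sample(pool, min(max_hard_negatives, len(pool)))
        records.append({"query": query, "positive": positive,
                        "negatives": negatives})
    return records

# A 90/10 split reproduces the reported counts: 16386 -> 14748 / 1638.
total = 16386
dev = total // 10        # 1638 dev pairs
train = total - dev      # 14748 train pairs
```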
## Evaluation Data
Evaluation artifacts for this release:
- eval/normal_queries/summary_eval-bge-m3-ifc-kbob-finetuned_model-1d06a0d7_queries-b9bc9eb9_no-reranker-7521044b.csv
- eval/normal_queries/details_eval-bge-m3-ifc-kbob-finetuned_model-1d06a0d7_queries-b9bc9eb9_no-reranker-7521044b.csv
Evaluation query count: 389
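The per-query details file makes the summary metrics reproducible. As a minimal sketch (the project's evaluation scripts may compute them differently), Hit@k and MRR@10 can be derived from the 1-based rank of the correct label for each evaluation case:

```python
def hit_at_k(ranks, k):
    # ranks: 1-based rank of the correct label per query, or None
    # if the correct label is not retrieved at all.
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

def mrr_at_k(ranks, k=10):
    # Reciprocal rank contributes 0 when the hit falls outside the cutoff.
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)

ranks = [1, 1, 3, None, 12]  # toy example, not the real details CSV
print(hit_at_k(ranks, 10), mrr_at_k(ranks, 10))
```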
## Evaluation Results
The following results mirror the full evaluation summary in the main project README for the v3 model.
### Core metrics by query set
| Queries | Cases | Hit@1 | Hit@10 | Hit@20 | Hit@30 | Hit@50 | MRR@10 | MAP@10 | nDCG@10 | Recall@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 389 | 97.43% | 99.49% | 99.74% | 99.74% | 100.00% | 0.984 | 0.932 | 0.954 | 0.960 |
| Typos | 389 | 88.43% | 94.86% | 98.20% | 98.97% | 99.49% | 0.909 | 0.844 | 0.876 | 0.890 |
| Missing Attribute | 389 | 75.32% | 92.80% | 96.40% | 98.20% | 98.71% | 0.803 | 0.750 | 0.794 | 0.860 |
| Missing + Typos | 389 | 68.12% | 88.17% | 94.34% | 96.92% | 98.46% | 0.739 | 0.682 | 0.731 | 0.805 |
95% confidence intervals (bootstrap from summary files):
| Queries | Hit@1 95% CI | Hit@10 95% CI | MRR@10 95% CI | nDCG@10 95% CI |
|---|---|---|---|---|
| Normal | [95.37%, 98.97%] | [98.71%, 100.00%] | [0.971, 0.994] | [0.939, 0.968] |
| Typos | [84.83%, 91.77%] | [92.80%, 96.66%] | [0.881, 0.935] | [0.847, 0.902] |
| Missing Attribute | [70.69%, 79.18%] | [89.97%, 94.99%] | [0.766, 0.835] | [0.759, 0.824] |
| Missing + Typos | [63.36%, 72.49%] | [84.95%, 91.14%] | [0.695, 0.778] | [0.690, 0.767] |
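A percentile bootstrap over the per-query 0/1 outcomes is one standard way to obtain such intervals; the sketch below shows the idea under that assumption (the project's actual resampling count and seed are not documented here).

```python
import random

def bootstrap_ci(per_query_hits, n_resamples=2000, alpha=0.05, seed=0):
    # per_query_hits: 0/1 outcome per evaluation case (e.g. Hit@1).
    rng = random.Random(seed)
    n = len(per_query_hits)
    means = sorted(
        sum(rng.choices(per_query_hits, k=n)) / n for _ in range(n_resamples)
    )
    # Percentile interval: take the alpha/2 and 1 - alpha/2 quantiles.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# 379 hits out of 389 cases, roughly the Normal Hit@1 setting.
hits = [1] * 379 + [0] * 10
lo, hi = bootstrap_ci(hits)
```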
### Query set definitions
The four query files test robustness under controlled perturbations.
| Queries | Transformation | Hard invariants |
|---|---|---|
| Normal | Unchanged query (reference run) | No perturbation |
| Missing Attribute | Removes one allowed token from PredefinedType, Material, StrengthClass, or insitu/precast (Ortbeton/Fertigteil) | IfcEntity is never removed |
| Typos | 1 to 2 typos per line, max 1 typo per token/word | IfcEntity remains correct |
| Missing + Typos | First remove one allowed token, then inject 1 to 2 typos into the remaining allowed tokens (max 1 typo per token) | IfcEntity remains correct |
Summary of generated perturbation files:
| File | Changed lines | Typo distribution |
|---|---|---|
| Missing | 388 | - |
| Typos | 388 | 1 typo: 193, 2 typos: 195 |
| Missing + Typos | 388 | 1 typo: 309, 2 typos: 61 |
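The typo rules above (1 to 2 typos per line, at most one per token, IfcEntity untouched) can be sketched as follows. The adjacent-character swap is an assumed typo mechanism, and treating the first token as the IfcEntity is an assumption about the query layout; neither is confirmed by the release notes.

```python
import random

def inject_typos(query, rng, min_typos=1, max_typos=2):
    """Swap two adjacent characters in up to `max_typos` distinct tokens.
    ASSUMPTION: the first token is the IfcEntity and is never modified."""
    tokens = query.split()
    entity, rest = tokens[0], tokens[1:]
    editable = [i for i, t in enumerate(rest) if len(t) >= 2]
    n = min(rng.randint(min_typos, max_typos), len(editable))
    for i in rng.sample(editable, n):  # max one typo per token
        t = rest[i]
        p = rng.randrange(len(t) - 1)
        rest[i] = t[:p] + t[p + 1] + t[p] + t[p + 2:]
    return " ".join([entity] + rest)

q = "IfcPile BORED Stahlbeton C40/50"
out = inject_typos(q, random.Random(7))
print(out)
```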
### Detailed interpretation
Readability note: metrics are computed on 389 evaluation cases; the perturbation table above reports changed lines in the generated query files.
Degradation versus Normal Queries:
| Queries | Delta Hit@1 | Delta Hit@10 | Delta MRR@10 | Delta nDCG@10 |
|---|---|---|---|---|
| Typos | -9.00% | -4.63% | -0.075 | -0.078 |
| Missing Attribute | -22.11% | -6.69% | -0.181 | -0.160 |
| Missing + Typos | -29.31% | -11.32% | -0.245 | -0.223 |
Conclusion: token removal hurts more than pure typo noise; the combined perturbation is strongest, as expected.
Typos vs. Missing (direct comparison):
- Hit@1: Missing is 13.11 percentage points below Typos (75.32% vs 88.43%).
- Hit@10: Missing is 2.06 percentage points below Typos (92.80% vs 94.86%).
- MRR@10: Missing is 0.106 below Typos (0.803 vs 0.909).
- nDCG@10: Missing is 0.082 below Typos (0.794 vs 0.876).
Conclusion: missing semantic slots move correct results further down the ranking than typos.
Top-1 vs Top-10 recovery potential:
- Normal: Hit@10 - Hit@1 = 2.06%.
- Typos: Hit@10 - Hit@1 = 6.43%.
- Missing Attribute: Hit@10 - Hit@1 = 17.48%.
- Missing + Typos: Hit@10 - Hit@1 = 20.05%.
Conclusion: under perturbation, the correct material often remains in top-10 but drops from rank 1 more frequently.
Statistical separability (Hit@1 CIs):
- Normal vs Typos: no overlap; interval gap 3.60% (95.37% vs 91.77%).
- Typos vs Missing: no overlap; interval gap 5.65% (84.83% vs 79.18%).
- Missing vs Missing + Typos: overlap 1.80% (70.69% to 72.49%).
Conclusion: the first two degradation steps are clearly separated; the final step is smaller but still negative.
Practical implications:
- High automation precision depends strongly on stable `Material`, `StrengthClass`, and `CastingMethod` slots.
- For noisy IFC text, UI workflows should prioritize top-10 candidates and avoid relying on top-1 alone.
- Main improvement lever is robust semantic token extraction/preservation, more than additional typo tolerance.
## Usage (Sentence-Transformers)
Using this model is straightforward once sentence-transformers is installed:

```bash
pip install -U sentence-transformers
```

Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer

sentences = [
    "IfcPile BORED Stahlbeton C40/50 500 INSITU",
    "Tiefgründung Ortbetonbohrpfahl 700",
]

model = SentenceTransformer("Hygroskopisch/bge-m3-ifc-kbob-finetuned")
embeddings = model.encode(sentences)
print(embeddings)
```
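Because the architecture ends with a `Normalize()` module, the returned embeddings are unit-length, so cosine similarity reduces to a plain dot product. A small numpy sketch, with toy 2-dimensional vectors standing in for real embeddings:

```python
import numpy as np

# Toy stand-ins for the model's 1024-dim embeddings.
emb = np.array([[0.6, 0.8], [1.0, 0.0]])
# The model's final Normalize() step makes each row unit-length,
# so a dot product between rows equals their cosine similarity.
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
scores = emb @ emb.T
print(scores)
```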
Load a fixed released revision:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "Hygroskopisch/bge-m3-ifc-kbob-finetuned",
    revision="v3",
)
```
## Training
Core training configuration (v3):
- Epochs: 2
- Batch size: 32
- Learning rate: 2e-05
- Warmup ratio: 0.1
- FP16: true
- Seed: 42
- Device: cuda
- Prefix mode: no_prefix
DataLoader length: 7418
Loss: `MultipleNegativesRankingLoss` (`sentence_transformers.losses.MultipleNegativesRankingLoss`) with parameters:

```python
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
```
`fit()` parameters:

```json
{
    "epochs": 2,
    "evaluation_steps": 0,
    "evaluator": "__main__.CombinedHit5Mrr10Evaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 1484,
    "weight_decay": 0.01
}
```
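As a quick consistency check, the reported `warmup_steps` follows from applying the warmup ratio to the total optimizer steps (the rounding mode here is an assumption):

```python
import math

epochs = 2
steps_per_epoch = 7418   # DataLoader length from the run metadata
warmup_ratio = 0.1

total_steps = epochs * steps_per_epoch        # 14836
warmup_steps = math.ceil(total_steps * warmup_ratio)  # 1483.6 -> 1484
print(warmup_steps)
```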
## Release Notes
### v3 (2026-04-16)
- Replaced previous published checkpoint with the new finetuned weights from the latest IFC/KBOB training run.
- Updated training data pipeline artifacts and documented exact source file names used for this release.
- Published baseline retrieval metrics on 389 evaluation queries (no cross-encoder reranker).
- Behavior change: retrieval rankings can differ from previous versions; if you require reproducibility, pin revision v3.
- Responsible-use contact added: sbert-lca@pm.me.
## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
## Citing & Authors
If you use this model in a report or publication, cite the project repository and this Hugging Face model page.