---
language:
- en
- de
- fr
- it
- es
- nl
- da
- sv
- 'no'
- pl
license: apache-2.0
tags:
- token-classification
- ner
- product-search
- query-understanding
base_model: bltlab/queryner-bert-base-uncased
datasets:
- bltlab/queryner
- thepian/eco-products-ner-fixtures
pipeline_tag: token-classification
---

# queryner-eco-ner
Named entity recognition for product search queries. Identifies brand, product category, product name, and origin spans in free-text queries.
Fine-tuned from bltlab/queryner-bert-base-uncased, which was trained on Amazon ESCI queries. This model extends it with domain-specific vocabulary drawn from a European product database — brand names, multilingual product titles, and origin countries.
## Labels
The model predicts the full 17-type label set from the base queryner model. The four types most relevant to product search are:
| Label | HF tag | Example span |
|---|---|---|
| Brand | `B-creator` / `I-creator` | Ecover, Dr. Bronner's |
| Product category | `B-core_product_type` / `I-core_product_type` | washing up liquid, shampoo |
| Product name | `B-product_name` / `I-product_name` | Skin Food, Men 48H Deodorant |
| Origin | `B-origin` / `I-origin` | Germany, Italy |
All other queryner types (modifier, department, UoM, color, material, etc.) are preserved from the base model.
## Usage

```python
from transformers import pipeline

ner = pipeline("token-classification", model="thepian/queryner-eco-ner", aggregation_strategy="simple")

results = ner("Ecover washing up liquid without palm oil")
# [{'entity_group': 'creator', 'word': 'Ecover', ...},
#  {'entity_group': 'core_product_type', 'word': 'washing up liquid', ...}]

results = ner("organic olive oil from Italy under €15")
# [{'entity_group': 'core_product_type', 'word': 'olive oil', ...},
#  {'entity_group': 'origin', 'word': 'Italy', ...}]
```
## Training data
20,203 examples from three sources:
| Source | Examples | Notes |
|---|---|---|
| bltlab/queryner | 9,140 | Amazon ESCI queries; all 17 label types |
| Local domain fixtures | ~1,063 | Hand-annotated product search queries (incl. substitute-frame fixtures) |
| Synthetic DB fixtures | ~10,000 | Template-generated from brand/category/product vocabulary; includes 1,000 substitute-frame (multilingual) |
Synthetic examples are generated by `generate_db_dataset.py` from a European product database. Brand names come from EU-registered brands; product names are extracted from all language variants stored in `product.name` (en, de, fr, it, es, nl, and others). Product names that exactly match English category strings are excluded to avoid contradictory training signal.
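The cross-reference exclusion described above can be sketched as a simple set-membership filter. This is a minimal illustration, not the actual `generate_db_dataset.py` implementation; the function name `filter_product_names` is hypothetical.

```python
# Hypothetical sketch of the category cross-reference filter: product
# names that exactly match an English category string (case-insensitive)
# are dropped, so the same surface form is never labelled both
# product_name and core_product_type.
def filter_product_names(product_names, english_categories):
    """Drop product names that collide with category vocabulary."""
    category_set = {c.strip().lower() for c in english_categories}
    return [name for name in product_names
            if name.strip().lower() not in category_set]

kept = filter_product_names(
    ["Skin Food", "Shampoo", "Men 48H Deodorant"],
    ["shampoo", "washing up liquid"],
)
print(kept)  # ['Skin Food', 'Men 48H Deodorant']
```

Note that this exact-match check leaves morphological variants through; the section on label balance below discusses why that matters.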
## Label balance and product name vs category
The two most commonly confused labels are core_product_type (product category) and product_name
(specific named product). The model's only reliable cue for distinguishing them is positional:
text following a known brand is a candidate for product_name, while standalone noun phrases are
typically core_product_type. This positional signal is structural, not lexical — "Dove shampoo"
and "Dove Skin Food" look identical to the model at the template level.
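The structural identity of the two cases can be made concrete with a toy tagging helper (hypothetical, for illustration only): whether the slot after the brand holds a category or a product name, the template emits the same shape of tag sequence.

```python
# Toy illustration: the same "brand + span" template is filled with
# either a category or a product name, so nothing lexical in the
# template distinguishes the two labels.
def tag_query(brand, span, span_label):
    """Return (token, BIO tag) pairs for a 'brand + span' query."""
    tokens = brand.split() + span.split()
    tags = (["B-creator"] + ["I-creator"] * (len(brand.split()) - 1)
            + [f"B-{span_label}"]
            + [f"I-{span_label}"] * (len(span.split()) - 1))
    return list(zip(tokens, tags))

print(tag_query("Dove", "shampoo", "core_product_type"))
# [('Dove', 'B-creator'), ('shampoo', 'B-core_product_type')]
print(tag_query("Dove", "Skin Food", "product_name"))
# [('Dove', 'B-creator'), ('Skin', 'B-product_name'), ('Food', 'I-product_name')]
```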
### Why category dominates in training (~2:1 target)
Real product search queries are category-heavy by a large margin. Most users type "shampoo",
"olive oil", or "washing powder", not "Fuji Green Tea Refreshingly Hydrating Conditioner".
Training data should approximate inference-time distribution; over-representing product_name
creates a mismatch that degrades category precision on the majority of queries.
The base model (bltlab/queryner-bert-base-uncased) was trained on Amazon ESCI queries, which
are also category-heavy. The marginal value of additional core_product_type examples is lower
than the marginal value of product_name examples, but collapsing to 1:1 risks the model
labeling any noun phrase after a brand as product_name — including generic category words like
"shampoo" or "washing up liquid".
Current ratio: ~2.3:1 (core_product_type : product_name). Target: ~2:1.
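The ratio quoted above can be measured directly from BIO-tagged training data by counting `B-` spans per type. A minimal sketch (the function name is hypothetical):

```python
from collections import Counter

def label_ratio(examples):
    """Count B- spans per entity type across tagged examples and
    return the core_product_type : product_name ratio."""
    counts = Counter()
    for tags in examples:
        for tag in tags:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts["core_product_type"] / counts["product_name"]

toy = [
    ["B-core_product_type", "I-core_product_type"],     # "olive oil"
    ["B-core_product_type"],                            # "shampoo"
    ["B-creator", "B-product_name", "I-product_name"],  # "Weleda Skin Food"
]
print(label_ratio(toy))  # 2.0
```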
### Why going below 2:1 requires better data, not just more examples
Increasing product_name examples without addressing lexical quality introduces contradictory
signal:
- A product named "Shampoo" and a category called "shampoo" become competing labels for the same string. The model cannot resolve this without knowing whether the token is generic or specific — information that is not present in the query.
- The category cross-reference filter (dropping product names that are exact English category matches) addresses the worst cases, but morphological variants ("Shampoos", "Crème") and multi-language overlaps remain.
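One way to catch the morphological variants and accent overlaps the exact-match filter misses is to normalise both sides before comparing. A minimal sketch, assuming a crude strip-trailing-`s` singularisation (the `normalize` helper is hypothetical, not part of `generate_db_dataset.py`):

```python
import unicodedata

def normalize(term):
    """Lowercase, strip accents, and crudely singularise a term
    before checking it against the category vocabulary."""
    term = unicodedata.normalize("NFKD", term.lower())
    term = "".join(ch for ch in term if not unicodedata.combining(ch))
    return term[:-1] if term.endswith("s") else term

# Variants the exact-match filter misses now collide:
print(normalize("Shampoos") == normalize("shampoo"))  # True
print(normalize("Crème") == normalize("creme"))       # True
```

A stemmer or per-language lemmatiser would be more robust than trailing-`s` stripping, at the cost of extra dependencies.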
To move significantly below 2:1 safely, the product_name training data would need to satisfy:
| Requirement | Why |
|---|---|
| Lexically distinct from category vocabulary | Prevents the model learning a single label for identical strings |
| High word-count names (3+ tokens) | Single and two-token product names are indistinguishable from short category slugs by surface form alone |
| Brand diversity | The positional cue (brand precedes product name) only generalises if many different brands are paired with many different product names — a narrow brand set leads to brand-specific memorisation |
| Multilingual coverage proportional to expected query mix | Training on English product names only means the model will underperform on French/German/Italian queries even though multilingual product names exist in the DB |
| Minimal repetition | A product name seen 20 times with the same brand drowns signal from rarer names |
Until those conditions are met, `product_name_ratio` should stay at 0.25–0.30, and the 2:1
overall ratio should be maintained by generating more total synthetic examples rather than by
increasing the ratio.
## Training procedure
- Base model: `bltlab/queryner-bert-base-uncased`
- Tokenizer: BERT WordPiece; subword tokens after the first in each word are masked (`-100`)
- Max sequence length: 128
- Label set: collected from the training data (all 17 queryner types preserved)
- Optimiser: AdamW, weight decay 0.01, warmup ratio 0.1
- Segmented training: brand/product/origin first, then certification O-token signal at a lower LR
Typical segment configuration:

```text
Segment 1: epochs=3, lr=3e-5  (base → domain)
Segment 2: epochs=2, lr=1e-5  (add cert O-token signal)
Segment 3: epochs=2, lr=5e-6  (product name ratio increase)
Segment 4: epochs=2, lr=5e-6  (substitute-frame + multilingual, brand F1 0.698 → 0.897)
```
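The subword masking described in the procedure (label only the first WordPiece of each word, `-100` elsewhere) can be sketched against the `word_ids()` mapping that Hugging Face fast tokenizers expose. This is a minimal standalone illustration; the helper name `align_labels` is hypothetical.

```python
def align_labels(word_ids, word_labels):
    """Assign each subword token its word's label id; mask (-100)
    special tokens and every subword after the first in a word, so
    the loss is computed once per word."""
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None:            # [CLS], [SEP], padding
            aligned.append(-100)
        elif wid != prev:          # first subword of a word
            aligned.append(word_labels[wid])
        else:                      # continuation subword
            aligned.append(-100)
        prev = wid
    return aligned

# e.g. "ecover washing up liquid" → "eco ##ver washing up liquid"
word_ids = [None, 0, 0, 1, 2, 3, None]
labels   = [1, 3, 4, 4]  # B-creator, B-core_product_type, I-, I-
print(align_labels(word_ids, labels))  # [-100, 1, -100, 3, 4, 4, -100]
```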
## Evaluation
Evaluated on 63 held-out domain fixtures (39 general + 24 substitute-frame / multilingual) with exact and partial span matching.
Segment 4 — 2 epochs, lr=5e-6, base=segment 3 checkpoint, 20,203 training examples (incl. substitute-frame):
| Label | P (partial) | R (partial) | F1 (partial) | F1 (exact) |
|---|---|---|---|---|
| brand | 0.929 | 0.867 | 0.897 | 0.897 |
| product category | 0.895 | 0.962 | 0.927 | 0.891 |
| product name | 0.875 | 0.700 | 0.778 | 0.556 |
| origin | 1.000 | 0.917 | 0.957 | 0.957 |
| overall | 0.915 | 0.900 | 0.908 | 0.874 |
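The exact vs partial matching used in the table can be sketched on character spans: exact requires identical boundaries and label, partial credits any same-label overlap (this is an assumed definition for illustration; the fixture harness may differ in detail).

```python
def exact_match(pred, gold):
    """Spans match only if start, end, and label are all identical."""
    return pred == gold

def partial_match(pred, gold):
    """Same label plus any character overlap counts as a hit."""
    ps, pe, pl = pred
    gs, ge, gl = gold
    return pl == gl and ps < ge and gs < pe

# Apostrophe truncation: "dr. bronner" predicted, "dr. bronner's" gold.
pred = (0, 11, "brand")
gold = (0, 13, "brand")
print(exact_match(pred, gold), partial_match(pred, gold))  # False True
```

This is why the product name row shows a large exact/partial gap (0.556 vs 0.778): boundaries are often off by a token while the label is right.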
Key remaining gaps:

- `Dr. Bronner's` apostrophe: the tokenizer splits `'`, so the span is predicted as `"dr. bronner ' s"`. Needs pre-tokenization normalization.
- Ecover brand FN (4 fixtures): underrepresented in the training vocabulary; missed even in substitute-frame context.
- German origin `Deutschland` not recognized: training uses English country names only.
- Umlaut span mismatch: `Spülmittel` is lowercased to `spulmittel` by BERT's uncased WordPiece normalization.
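The apostrophe and umlaut gaps share a cause: the raw query differs from the form the uncased tokenizer produces. A minimal pre-tokenization normalizer (hypothetical helper, not shipped with the model) that folds curly apostrophes and accents so predicted spans line up with the input:

```python
import unicodedata

def pre_normalize(query):
    """Replace curly apostrophes and fold accents so the query text
    matches what BERT's uncased WordPiece normalization produces."""
    query = query.replace("\u2019", "'")              # ’ → '
    query = unicodedata.normalize("NFKD", query)
    return "".join(c for c in query if not unicodedata.combining(c))

print(pre_normalize("Dr. Bronner\u2019s Sp\u00fclmittel"))
# Dr. Bronner's Spulmittel
```

Applying this before the pipeline call keeps character offsets consistent between the query and the tokenizer's view of it; it does not restore `Spülmittel` recognition by itself, which needs accented training examples.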
## Limitations

- Extraction patterns are primarily English; avoidance frames in other languages (`ohne`, `sans`, `senza`) are not NER targets and are handled by a separate parser
- Multilingual product names are included in training, but evaluation is English-only
- Origin recognition covers ~13 European countries drawn from product records; global coverage is partial
- Barcode and price extraction are not NER tasks — handled by a dedicated parser
## Citation

If you use this model, please cite the base model:

```bibtex
@misc{queryner,
  author = {Björklund, Love and Ljunglöf, Peter},
  title = {QueryNER: Named Entity Recognition for Product Search Queries},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/bltlab/queryner-bert-base-uncased}
}
```