SPLADE (co-condenser-marco) fine-tuned on merged NQ + PT-BR instruction datasets
This model is a sparse retriever trained with the Sentence Transformers SPLADE stack on a merged bilingual English + Portuguese corpus.
Usage
from sentence_transformers import SparseEncoder

# Load the sparse encoder from the Hugging Face Hub
sparse_model = SparseEncoder("cnmoro/inference-free-splade-co-condenser-en-ptbr")

# Encode texts into vocabulary-sized sparse embeddings
sparse_embeddings = sparse_model.encode(["Hello", "World"], show_progress_bar=True)
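As a quick sanity check, the sketch below (with made-up query and document texts) scores a query against two documents and inspects the dominant vocabulary tokens; similarity and decode here are the SparseEncoder helpers from recent sentence-transformers releases:

# Score a (made-up) Portuguese query against two documents
query_embedding = sparse_model.encode(["Qual é a capital do Brasil?"])
document_embeddings = sparse_model.encode([
    "Brasília é a capital do Brasil.",
    "O céu é azul.",
])
print(sparse_model.similarity(query_embedding, document_embeddings))

# Inspect which vocabulary tokens dominate the sparse query embedding
print(sparse_model.decode(query_embedding[0], top_k=10))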
Base model
Luyu/co-condenser-marco
Training date
- Training completed on April 13, 2026.
Dataset composition
The training corpus was built by row-wise concatenation of:
- sentence-transformers/natural-questions
- cnmoro/GPT4-500k-Augmented-PTBR-Clean
- cnmoro/WizardVicuna-PTBR-Instruct-Clean
Final merged size:
- Total rows: 869,365
- Train rows: 868,365
- Eval rows: 1,000
- Split seed: 12
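A minimal sketch of how such a merge and split could be reproduced with the datasets library; the dataset IDs, the 1,000-row eval size, and seed 12 come from this card, while the assumption that all three sets share compatible columns is hypothetical:

from datasets import load_dataset, concatenate_datasets

# Load the three source datasets (assumed here to share query/answer-style
# columns; in practice the PT-BR instruction sets may need renaming first)
nq = load_dataset("sentence-transformers/natural-questions", split="train")
gpt4_ptbr = load_dataset("cnmoro/GPT4-500k-Augmented-PTBR-Clean", split="train")
wizard_ptbr = load_dataset("cnmoro/WizardVicuna-PTBR-Instruct-Clean", split="train")

# Row-wise concatenation into a single corpus
merged = concatenate_datasets([nq, gpt4_ptbr, wizard_ptbr])

# Hold out 1,000 rows for evaluation, using the split seed from this card
splits = merged.train_test_split(test_size=1_000, seed=12)
train_ds, eval_ds = splits["train"], splits["test"]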
Training objective
- Loss: SpladeLoss(SparseMultipleNegativesRankingLoss)
- Document regularizer weight: 0.03
- Query regularizer weight: 0
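For reference, a sketch of how this objective is wired up in sentence-transformers. The regularizer weights come from this card; the Router/SparseStaticEmbedding construction for the inference-free query side is an assumption based on the v5 sparse-encoder API:

from sentence_transformers import SparseEncoder
from sentence_transformers.models import Router
from sentence_transformers.sparse_encoder.losses import (
    SpladeLoss,
    SparseMultipleNegativesRankingLoss,
)
from sentence_transformers.sparse_encoder.models import (
    MLMTransformer,
    SparseStaticEmbedding,
    SpladePooling,
)

# Documents go through the full MLM head + SPLADE pooling; queries use a
# static per-token weight lookup, so query encoding needs no transformer pass
mlm = MLMTransformer("Luyu/co-condenser-marco")
router = Router.for_query_document(
    query_modules=[SparseStaticEmbedding(tokenizer=mlm.tokenizer)],
    document_modules=[mlm, SpladePooling(pooling_strategy="max")],
)
model = SparseEncoder(modules=[router])

# In-batch negatives ranking loss wrapped with SPLADE's sparsity regularizers;
# a query weight of 0 means query sparsity is not penalized during training
loss = SpladeLoss(
    model=model,
    loss=SparseMultipleNegativesRankingLoss(model),
    document_regularizer_weight=0.03,
    query_regularizer_weight=0,
)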
Core hyperparameters
- Epochs: 3
- Per-device batch size: 32
- Max sequence length: 128
- SPLADE pooling chunk size: 64
- Learning rate: 2e-5
- Warmup ratio: 0.1
- Mixed precision: fp16=True
- Batch sampler: NO_DUPLICATES
- Router mapping: query -> query, answer -> document
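These settings map onto SparseEncoderTrainingArguments roughly as below (a sketch: the output_dir is a placeholder, and the argument names are assumed from the sentence-transformers v5 training API; max sequence length 128 and the SpladePooling chunk size of 64 are set on the model itself rather than here):

from sentence_transformers.sparse_encoder import SparseEncoderTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SparseEncoderTrainingArguments(
    output_dir="splade-co-condenser-en-ptbr",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    # Avoid duplicate texts within a batch, which would act as false negatives
    # for the in-batch negatives loss
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    # Route the "query" column through the query side of the Router and the
    # "answer" column through the document side
    router_mapping={"query": "query", "answer": "document"},
)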
Final training metrics
- train_runtime: 5756.8386 s
- train_steps_per_second: 4.714
- train_samples_per_second: 150.841
- train_loss: 0.30475