---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:110773
- loss:ContrastiveLoss
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
widget:
- source_sentence: average monthly net wage/salary, employees, by province and occupation
(rupiah), 2018
sentences:
- '[Seri 2000] Laju Pertumbuhan PDB Triwulanan Atas Dasar Harga Konstan 2000 Terhadap
Triwulan Sebelumnya, 2001-2014'
- IHK dan Rata-rata Upah per Bulan Buruh Industri di Bawah Mandor (Supervisor),
2012-2014 (2012=100)
- Rata-rata Upah/Gaji Bersih Sebulan Buruh/Karyawan/Pegawai Menurut Kelompok Umur
dan Lapangan Pekerjaan Utama di 9 Sektor (Rupiah), 2017
- source_sentence: 'data belanja dan konsumsi per orang di jambi, 2020: fokus pada
makanan dan tingkat pengeluaran'
sentences:
- Rata-rata Konsumsi dan Pengeluaran Perkapita Seminggu Menurut Komoditi Makanan
dan Golongan Pengeluaran per Kapita Seminggu di Provinsi Sulawesi Tenggara, 2018-2023
- Rata-rata Pendapatan Bersih Pekerja Bebas Menurut Provinsi dan Pendidikan Tertinggi
yang Ditamatkan (ribu rupiah), 2017
- Rata-rata Konsumsi dan Pengeluaran Perkapita Seminggu Menurut Komoditi Makanan
dan Golongan Pengeluaran per Kapita Seminggu di Provinsi Jawa Timur, 2018-2023
- source_sentence: 'ALIRAN DANA RUPIAH: Q1 2008'
sentences:
- Sistem Neraca Sosial Ekonomi Indonesia Tahun 2022 dalam Format SNA 1968 (65x65)
- Rata-rata Upah/Gaji Bersih Sebulan Buruh/Karyawan/Pegawai Menurut Provinsi dan
Jenis Pekerjaan Utama, 2024
- Impor Besi dan Baja Menurut Negara Asal Utama, 2017-2023
- source_sentence: 'Aliran Wdana Rupiah: Q1 2008'
sentences:
- Ekspor Karet Remah Menurut Negara Tujuan Utama, 2012-2023
- Rata-rata Upah/Gaji Bersih Sebulan Buruh/Karyawan/Pegawai Menurut Kelompok Umur
dan Lapangan Pekerjaan Utama di 17 Sektor (Rupiah), 2018
- Sistem Neraca Sosial Ekonomi Indonesia Tahun 2022 dalam Format SNA 1968 (65x65)
- source_sentence: 'Aliran dana Rupiah: Q1 2008'
sentences:
- Ringkasan Neraca Arus Dana, Triwulan II, 2011*), (Miliar Rupiah)
- Ringkasan Neraca Arus Dana, 2012 (Miliar Rupiah)
- IHK dan Rata-rata Upah per Bulan Buruh Industri di Bawah Mandor (Supervisor),
2012-2014 (2012=100)
datasets:
- yahyaabd/query-pos-neg-doc-pairs-statictable
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- cosine_mcc
model-index:
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
results:
- task:
type: binary-classification
name: Binary Classification
dataset:
name: allstats semantic mini v1 test
type: allstats-semantic-mini-v1_test
metrics:
- type: cosine_accuracy
value: 0.9678628590683177
name: Cosine Accuracy
- type: cosine_accuracy_threshold
value: 0.7482147812843323
name: Cosine Accuracy Threshold
- type: cosine_f1
value: 0.9677936769237264
name: Cosine F1
- type: cosine_f1_threshold
value: 0.7444144487380981
name: Cosine F1 Threshold
- type: cosine_precision
value: 0.9595714405290031
name: Cosine Precision
- type: cosine_recall
value: 0.976158038147139
name: Cosine Recall
- type: cosine_ap
value: 0.9921512853632306
name: Cosine Ap
- type: cosine_mcc
value: 0.9358669477790009
name: Cosine Mcc
- task:
type: binary-classification
name: Binary Classification
dataset:
name: allstats semantic mini v1 dev
type: allstats-semantic-mini-v1_dev
metrics:
- type: cosine_accuracy
value: 0.9678491772924294
name: Cosine Accuracy
- type: cosine_accuracy_threshold
value: 0.7902499437332153
name: Cosine Accuracy Threshold
- type: cosine_f1
value: 0.9673587968896863
name: Cosine F1
- type: cosine_f1_threshold
value: 0.7874833345413208
name: Cosine F1 Threshold
- type: cosine_precision
value: 0.9616887529731566
name: Cosine Precision
- type: cosine_recall
value: 0.9730960976448341
name: Cosine Recall
- type: cosine_ap
value: 0.9930288231258318
name: Cosine Ap
- type: cosine_mcc
value: 0.9357491510325107
name: Cosine Mcc
---
# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) on the [query-pos-neg-doc-pairs-statictable](https://huggingface.co/datasets/yahyaabd/query-pos-neg-doc-pairs-statictable) dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
- [query-pos-neg-doc-pairs-statictable](https://huggingface.co/datasets/yahyaabd/query-pos-neg-doc-pairs-statictable)
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
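The `Pooling` module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of the token embeddings that are not padding. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the 4-dimensional toy vectors are illustrative assumptions (the real model produces 384-dimensional vectors).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an input that is all padding
    return [s / count for s in summed]


# Hypothetical toy input: 3 token positions, 4 dimensions; the last position is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 2.0, 2.0, 2.0]
```

The padding position is ignored entirely, which is why the large values in the third vector do not affect the result.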
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("yahyaabd/allstats-search-miniLM-v1-7")

# Run inference
sentences = [
    'Aliran dana Rupiah: Q1 2008',
    'IHK dan Rata-rata Upah per Bulan Buruh Industri di Bawah Mandor (Supervisor), 2012-2014 (2012=100)',
    'Ringkasan Neraca Arus Dana, 2012 (Miliar Rupiah)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
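For search-style use, the embeddings can be compared with cosine similarity (the model's similarity function) and the candidate documents sorted by score. The sketch below shows only that ranking step on hand-written vectors, so it runs without downloading the model; `cosine`, `rank_documents`, the placeholder titles, and the 3-dimensional vectors are hypothetical and are not outputs of this model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    titles: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (title, score) pairs sorted from most to least similar."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in zip(titles, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins for model.encode(...) outputs.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
ranking = rank_documents(query, docs, ["doc A", "doc B"])
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

In practice, `query` and `docs` would be replaced by the rows returned from `model.encode`, with the table titles as `titles`.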
## Evaluation
### Metrics
#### Binary Classification
* Datasets: `allstats-semantic-mini-v1_test` and `allstats-semantic-mini-v1_dev`
* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
| Metric | allstats-semantic-mini-v1_test | allstats-semantic-mini-v1_dev |
|:--------------------------|:-------------------------------|:------------------------------|
| cosine_accuracy | 0.9679 | 0.9678 |
| cosine_accuracy_threshold | 0.7482 | 0.7902 |
| cosine_f1 | 0.9678 | 0.9674 |
| cosine_f1_threshold | 0.7444 | 0.7875 |
| cosine_precision | 0.9596 | 0.9617 |
| cosine_recall | 0.9762 | 0.9731 |
| **cosine_ap**             | **0.9922**                     | **0.9930**                    |
| cosine_mcc | 0.9359 | 0.9357 |
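
The threshold rows in the table are the cosine-similarity cut-offs at which a pair is classified as a match. As a rough illustration of how the other rows follow from such a cut-off, the sketch below computes accuracy, precision, recall, F1, and MCC for a handful of made-up scores. The helper `binary_metrics`, the toy scores and labels, and the `>=` comparison are assumptions for illustration, not the evaluator's code. The last lines check that the reported test-set F1 is consistent with the reported precision and recall.

```python
import math
from typing import Dict, Sequence


def binary_metrics(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Dict[str, float]:
    """Classify each pair as positive when score >= threshold, then summarize."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(labels)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical scores and labels; 0.7482 is the test-set accuracy threshold from the table.
toy_scores = [0.91, 0.80, 0.76, 0.70, 0.40, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_result = binary_metrics(toy_scores, toy_labels, threshold=0.7482)
print(toy_result)

# Consistency check on the reported test-set numbers: F1 = 2PR / (P + R).
f1_from_table = 2 * 0.9596 * 0.9762 / (0.9596 + 0.9762)
print(round(f1_from_table, 4))  # 0.9678
```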
## Training Details
### Training Dataset
#### query-pos-neg-doc-pairs-statictable
* Dataset: [query-pos-neg-doc-pairs-statictable](https://huggingface.co/datasets/yahyaabd/query-pos-neg-doc-pairs-statictable) at [a31b58d](https://huggingface.co/datasets/yahyaabd/query-pos-neg-doc-pairs-statictable/tree/a31b58d221edcddb16274a04b2fafe56df68801a)
* Size: 110,773 training samples
* Columns: query, doc, and label
* Approximate statistics based on the first 1000 samples:
| | query | doc | label |
|:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
| type | string | string | int |
| details | Data orang yang naik/turun kapal, di pelabuhan yang dikelola maupun tidak, sekitar 2015 | Tabel Input-Output Indonesia Transaksi Total Atas Dasar Harga Dasar (185 Produk), 2016 (Juta Rupiah) | 0 |
|         | data orang yang naik/turun kapal, di pelabuhan yang dikelola maupun tidak, sekitar 2015 | Tabel Input-Output Indonesia Transaksi Total Atas Dasar Harga Dasar (185 Produk), 2016 (Juta Rupiah) | 0 |
|         | DATA ORANG YANG NAIK/TURUN KAPAL, DI PELABUHAN YANG DIKELOLA MAUPUN TIDAK, SEKITAR 2015 | Tabel Input-Output Indonesia Transaksi Total Atas Dasar Harga Dasar (185 Produk), 2016 (Juta Rupiah) | 0 |
* Loss: [ContrastiveLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastiveloss) with these parameters:
```json
{
"distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
"margin": 0.5,
"size_average": true
}
```
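With these parameters, the contrastive loss pulls pairs labeled 1 together and pushes pairs labeled 0 apart until their cosine distance reaches the margin of 0.5. The sketch below is a pure-Python reading of that formula, assuming the usual form `0.5 * (y * d^2 + (1 - y) * max(0, margin - d)^2)` with `d = 1 - cosine_similarity`; the function names and the 2-dimensional toy vectors are illustrative, and the library's own implementation operates on batched tensors.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def contrastive_loss(
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
    labels: Sequence[int],
    margin: float = 0.5,
    size_average: bool = True,
) -> float:
    """Contrastive loss over (query, doc) embedding pairs with binary labels."""
    losses = []
    for (a, b), label in zip(pairs, labels):
        d = cosine_distance(a, b)
        positive_term = label * d ** 2                           # penalize distant positives
        negative_term = (1 - label) * max(0.0, margin - d) ** 2  # penalize negatives inside the margin
        losses.append(0.5 * (positive_term + negative_term))
    return sum(losses) / len(losses) if size_average else sum(losses)


# Hypothetical toy embeddings.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical, labeled positive -> no loss
    ([1.0, 0.0], [1.0, 0.0]),  # identical, labeled negative -> 0.5 * 0.5**2 = 0.125
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, labeled negative -> beyond the margin, no loss
]
toy_pair_labels = [1, 0, 0]
loss = contrastive_loss(toy_pairs, toy_pair_labels)
print(loss)
```

Because of the margin, negatives that are already at least 0.5 apart in cosine distance contribute nothing, so training effort concentrates on negatives the model still scores as close to the query.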
### Evaluation Dataset
#### query-pos-neg-doc-pairs-statictable
* Dataset: [query-pos-neg-doc-pairs-statictable](https://huggingface.co/datasets/yahyaabd/query-pos-neg-doc-pairs-statictable) at [a31b58d](https://huggingface.co/datasets/yahyaabd/query-pos-neg-doc-pairs-statictable/tree/a31b58d221edcddb16274a04b2fafe56df68801a)
* Size: 23,763 evaluation samples
* Columns: query, doc, and label
* Approximate statistics based on the first 1000 samples:
| | query | doc | label |
|:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
| type | string | string | int |
| details | Cek penghasilan bulanan (gaji bersih) buruh/pegawai, per provinsi dan jenis pekerjaannya, 2019 | Rata-rata Pendapatan Bersih Berusaha Sendiri Menurut Provinsi dan Lapangan Pekerjaan Utama, 2021 | 1 |
|         | cek penghasilan bulanan (gaji bersih) buruh/pegawai, per provinsi dan jenis pekerjaannya, 2019 | Rata-rata Pendapatan Bersih Berusaha Sendiri Menurut Provinsi dan Lapangan Pekerjaan Utama, 2021 | 1 |
|         | CEK PENGHASILAN BULANAN (GAJI BERSIH) BURUH/PEGAWAI, PER PROVINSI DAN JENIS PEKERJAANNYA, 2019 | Rata-rata Pendapatan Bersih Berusaha Sendiri Menurut Provinsi dan Lapangan Pekerjaan Utama, 2021 | 1 |
* Loss: [ContrastiveLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastiveloss) with these parameters:
```json
{
"distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
"margin": 0.5,
"size_average": true
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `num_train_epochs`: 1
- `warmup_ratio`: 0.2
- `fp16`: True
- `load_best_model_at_end`: True
- `eval_on_start`: True
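
To give a sense of scale for these settings, the sketch below estimates the number of optimizer steps and warmup steps implied by 110,773 training samples, a batch size of 64, one epoch, and `warmup_ratio` 0.2. It assumes a single device, no gradient accumulation, a final partial batch that is kept, and a linear warmup followed by linear decay; none of these are stated in this section, and the helper names are hypothetical.

```python
import math
from typing import Tuple


def schedule_summary(num_samples: int, batch_size: int, epochs: int, warmup_ratio: float) -> Tuple[int, int]:
    """Return (total optimizer steps, warmup steps) under the stated assumptions."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # last partial batch is kept
    total = steps_per_epoch * epochs
    warmup = math.ceil(total * warmup_ratio)
    return total, warmup


def lr_factor(step: int, total: int, warmup: int) -> float:
    """Multiplier on the base learning rate: linear warmup, then linear decay (assumed)."""
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total - step) / max(1, total - warmup))


total_steps, warmup_steps = schedule_summary(110_773, 64, 1, 0.2)
print(total_steps, warmup_steps)  # 1731 347
print(lr_factor(warmup_steps, total_steps, warmup_steps))  # 1.0
```

Under these assumptions, roughly the first fifth of the single epoch is spent ramping the learning rate up before it starts to decay.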
#### All Hyperparameters