SetFit with BAAI/bge-base-en-v1.5
This is a SetFit model that can be used for Text Classification. This SetFit model uses BAAI/bge-base-en-v1.5 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
Model Details
Model Description
Model Sources
Model Labels
| Label | Examples |
|:------|:---------|
| 5 | <ul><li>'"All of the cited sources do mention her, and enable reliable sourcing of her childhood, education, and acting career. There are reviews of her acting that can be added. However, the reason why I created this article is her play ''How to Load a Musket'' which I believe passes . 3. ""The person has created... (a work that) ha(s) been the primary subject... of multiple independent periodical articles or reviews"'</li><li>'I concur. The company does not appear to meet and</li></ul> |
| 6 | <ul><li>'"I was the one who put this up for deletion, and I almost want to change my own vote, because of the disregard for the rules which people are performing by excercising such predjudice against this page. Come on people, the problem is that there are no reliable secondary sources as of yet, not that ""evil bloggers"" are trying to rule the world, or that neolgisms must be squooshed without mercy. The issue here is lack of established credible sources because it's way too young to have sources that qualify under "'</li><li>"Um, with all due respect, her widely publicized grassroots-organized ouster is what she is notable for, if being an elected official somehow wasn't enough. Nominator hasn't even looked at the massive amount of supporting media coverage, otherwise he would have known that the contents of the since-removed YouTube video are well documented by</li></ul> |
| 8 | |
| 2 | <ul><li>'as Carlos Suarez has pointed out this is an unsourced violation which even if sourced would likely fail our biographical guidelines anyhow. Lose-lose'</li><li>"While he isn't super notable, I don't see why he should be excluded any more than say a no-name backbench MP from one of the major parties should be excluded. Ultimately, he received considerable media coverage as a result of being elected to federal Parliament – and then again when he subsequently lost his spot when the results were declared void. He again ran for election at the special election, and there was coverage on him following his failure to gain a seat at that. It's really more than just</li></ul> |
| 4 | <ul><li>'unsourced, unverified - could be</li></ul> |
| 3 | <ul><li>'complete nonsense, meets #G1</li></ul> |
| 7 | <ul><li>'"per nom. Insufficiently notable publication, and no ""''credible, third-party sources with a reputation for fact-checking and accuracy''"" as required by "'</li><li>'I mentioned the NYT link for purposes, not , as I believe was clear'</li><li>"As it was mentioned, there are remarkable claims being made in the article that need to be</li></ul> |
| 9 | <ul><li>'as the one who PRODded it; I was tempted to tag its initial incarnation as G11, but I still think it should be deleted per '</li><li>"I don't understand why you're all voting to keep. Don't you know what a son of a bitch is? Isn't policy, or has that been rejected now"</li><li>'Big ole mess of and '</li></ul> |
| 1 | <ul><li>'Bot-like nomination made without any</li></ul> |
| 0 | |
Uses
Direct Use for Inference
First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference:

```python
from setfit import SetFitModel

# Download from the Hub
model = SetFitModel.from_pretrained("research-dump/bge-base-en-v1.5_wikipedia_policy_wikipedia_policy")
# Run inference
preds = model("fails ")
```
Training Details
Training Set Metrics
| Training set | Min | Median | Max |
|:-------------|:----|:-------|:----|
| Word count | 2 | 38.196 | 433 |
| Label | Training Sample Count |
|:------|:----------------------|
| 0 | 23 |
| 1 | 17 |
| 2 | 21 |
| 3 | 17 |
| 4 | 39 |
| 5 | 671 |
| 6 | 60 |
| 7 | 36 |
| 8 | 100 |
| 9 | 16 |
Training Hyperparameters
- batch_size: (8, 2)
- num_epochs: (5, 5)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 10
- body_learning_rate: (1e-05, 1e-05)
- head_learning_rate: 5e-05
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: True
- use_amp: True
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:------|:-----|:--------------|:----------------|
| 0.0004 | 1 | 0.2133 | - |
| 0.2 | 500 | 0.2428 | 0.2210 |
| 0.4 | 1000 | 0.1484 | 0.1927 |
| 0.6 | 1500 | 0.0528 | 0.1995 |
| 0.8 | 2000 | 0.0335 | 0.2373 |
| 1.0 | 2500 | 0.0346 | 0.2294 |
| 1.2 | 3000 | 0.0267 | 0.2447 |
| 1.4 | 3500 | 0.0239 | 0.2290 |
| 1.6 | 4000 | 0.0253 | 0.2354 |
| 1.8 | 4500 | 0.0219 | 0.2390 |
| 2.0 | 5000 | 0.02 | 0.2335 |
| 2.2 | 5500 | 0.019 | 0.2319 |
| 2.4 | 6000 | 0.0168 | 0.2281 |
| 2.6 | 6500 | 0.0154 | 0.2499 |
| 2.8 | 7000 | 0.013 | 0.2537 |
| 3.0 | 7500 | 0.015 | 0.2408 |
| 3.2 | 8000 | 0.0121 | 0.2423 |
| 3.4 | 8500 | 0.015 | 0.2391 |
| 3.6 | 9000 | 0.0131 | 0.2452 |
| 3.8 | 9500 | 0.0106 | 0.2438 |
| 4.0 | 10000 | 0.0135 | 0.2330 |
| 4.2 | 10500 | 0.0114 | 0.2396 |
| 4.4 | 11000 | 0.0115 | 0.2413 |
| 4.6 | 11500 | 0.0112 | 0.2348 |
| 4.8 | 12000 | 0.0111 | 0.2378 |
| 5.0 | 12500 | 0.013 | 0.2387 |
Framework Versions
- Python: 3.12.7
- SetFit: 1.1.1
- Sentence Transformers: 3.4.1
- Transformers: 4.48.2
- PyTorch: 2.6.0+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}