MorcuendeA committed on
Commit 90c0077 · verified · 1 Parent(s): 4602971

MulderFinders

Files changed (5)
  1. README.md +17 -72
  2. config.json +2 -2
  3. model.safetensors +1 -1
  4. tokenizer.json +2 -2
  5. training_args.bin +1 -1
README.md CHANGED
@@ -9,83 +9,30 @@ metrics:
 model-index:
 - name: MulderFinders
   results: []
-datasets:
-- MorcuendeA/ConspiraText-ES
-language:
-- es
 ---

-![MulderFinders Logo](./i_want_to_belive.png)
-
-
-# MulderFinders
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->

 # MulderFinders

-The truth is out there... and this model is here to help you find it.
-
-**MulderFinders** is a fine-tuned version of [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), trained on [MorcuendeA/ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES), a dataset of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all.
-
-Trust no one... except maybe the F1 score.
-
-## Usage
-
-You can use the model directly with the 🤗 Transformers library:
-
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch
-
-model_name = "MorcuendeA/MulderFinders"
-
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)
-
-text = "las redes 5G nos ayudan a tener mejor internet"
-
-inputs = tokenizer(text, return_tensors="pt")
-outputs = model(**inputs)
-logits = outputs.logits
-probs = torch.softmax(logits, dim=1)[0]
-labels = model.config.id2label
-pred = torch.argmax(probs).item()
-print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")
-
-# Output:
-# Prediction: rational (0.9989)
-```
-
+This model is a fine-tuned version of [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0004
-- Accuracy: 1.0
-- F1 Score: 1.0
+- Loss: 0.0059
+- Accuracy: 0.9981
+- F1 Score: 0.9983

 ## Model description

-**MulderFinders** is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), a transformer model pre-trained on multiple European languages. MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.
+More information needed

 ## Intended uses & limitations

-**Intended uses:**
-
-- Content moderation on social media or online forums.
-- Research and analysis of conspiratorial discourse in Spanish-language texts.
-- Assisting fact-checking workflows by flagging potentially conspiratorial statements.
-
-**Limitations:**
-
-- May not handle sarcasm, irony, or ambiguous language reliably.
-- Performance outside the original domain (i.e., texts similar to the training dataset) may degrade.
-- May reflect biases present in the training data.
+More information needed

 ## Training and evaluation data

-The model was fine-tuned using the [ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES) dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes.
-During fine-tuning, regularization was applied with **attention_dropout** and **hidden_dropout** both set to 0.3.
+More information needed

 ## Training procedure
@@ -106,18 +53,16 @@ The following hyperparameters were used during training:

 | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Score |
 |:-------------:|:------:|:----:|:---------------:|:--------:|:--------:|
-| 0.1365 | 0.3030 | 20 | 0.0282 | 0.9924 | 0.9927 |
-| 0.0633 | 0.6061 | 40 | 0.1290 | 0.9773 | 0.9774 |
-| 0.0362 | 0.9091 | 60 | 0.0390 | 0.9962 | 0.9963 |
-| 0.0271 | 1.2121 | 80 | 0.0284 | 0.9962 | 0.9963 |
-| 0.0001 | 1.5152 | 100 | 0.0079 | 0.9962 | 0.9963 |
-| 0.0026 | 1.8182 | 120 | 0.0322 | 0.9962 | 0.9963 |
+| 0.2601 | 0.3030 | 20 | 0.0532 | 0.9848 | 0.9855 |
+| 0.0771 | 0.6061 | 40 | 0.0197 | 0.9981 | 0.9982 |
+| 0.0271 | 0.9091 | 60 | 0.0218 | 0.9981 | 0.9982 |
+| 0.0189 | 1.2121 | 80 | 0.0182 | 0.9943 | 0.9945 |
+| 0.0176 | 1.5152 | 100 | 0.0093 | 0.9962 | 0.9963 |

 ### Framework versions

-- Transformers 4.53.2
+- Transformers 4.54.0
 - Pytorch 2.6.0+cu124
-- Datasets 2.14.4
+- Datasets 4.0.0
 - Tokenizers 0.21.2
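The Usage snippet removed from the README above reduces to softmax-then-argmax over the classifier's logits. A framework-free sketch of that decision logic (plain Python rather than torch; label 0 mirrors the `"0": "rational"` entry visible in the config.json diff, while the name of label 1 is a placeholder, since the diff does not show it):

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# "rational" comes from the config.json diff; "conspiracy" is a placeholder name.
id2label = {0: "rational", 1: "conspiracy"}

def predict(logits):
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return id2label[pred], probs[pred]

# Stand-in logits, not real model output:
label, prob = predict([3.2, -1.1])
print(f"Prediction: {label} ({prob:.4f})")
```

The torch version in the removed README does the same thing with `torch.softmax` and `torch.argmax`; only the tensor plumbing differs.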
config.json CHANGED
@@ -3,7 +3,7 @@
   "EuroBertForSequenceClassification"
 ],
 "attention_bias": false,
-"attention_dropout": 0.3,
+"attention_dropout": 0.2,
 "auto_map": {
   "AutoConfig": "configuration_eurobert.EuroBertConfig",
   "AutoModel": "modeling_eurobert.EuroBertModel",
@@ -19,7 +19,7 @@
 "eos_token_id": 128001,
 "head_dim": 64,
 "hidden_act": "silu",
-"hidden_dropout": 0.3,
+"hidden_dropout": 0.2,
 "hidden_size": 768,
 "id2label": {
   "0": "rational",
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9c6d569960e50f952ac73dc824edee37878799713de4fee344cbc575b741918e
+oid sha256:57ebd62773f7092cfe0f5bb70bd3d2e849fab7643ab713b10a74dd2799e37f1b
 size 849445112
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bf5dc94ee8165749c233582f839e98776e7ad895f506dcea7556d68ba375ab73
-size 17210345
+oid sha256:98d4a1d32152d6cedf85b5e88f3b205106dca1fe72aaab34e0ac13c238421069
+size 17210235
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d209ad7a782f8bd52d93c64d8cfe3272215ced7a889639a474cfc3b0b88c0325
+oid sha256:e9bf8330025037b42d854a928006d6f6f6e6f07b712e41b90aaf441e4ca29cb5
 size 5304
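model.safetensors, tokenizer.json, and training_args.bin are stored via Git LFS, so their diffs touch only the pointer file: three `key value` lines in which the `oid` (and sometimes `size`) changes between commits. A minimal parser sketch, fed the new model.safetensors pointer from the diff above:

```python
def parse_lfs_pointer(text):
    # Each pointer line is "key value"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"],
        "size": int(fields["size"]),
    }

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:57ebd62773f7092cfe0f5bb70bd3d2e849fab7643ab713b10a74dd2799e37f1b
size 849445112
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 849445112
```

An unchanged `size` with a new `oid` (as in model.safetensors and training_args.bin here) means the file content changed byte-for-byte in place, which is what retraining produces.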