DanielDDDS committed on
Commit
b352aaa
·
verified ·
1 Parent(s): 1098908

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +54 -1
README.md CHANGED
@@ -1,4 +1,57 @@
  ---
  language: he
  license: mit
- ... (the full content as before) ...
+ tags:
+ - token-classification
+ - recipe-modification
+ - hebrew
+ - dictabert
+ - crf
+ datasets:
+ - DanielDDDS/recipe-modifications-v2
+ metrics:
+ - f1
+ pipeline_tag: token-classification
+ ---
+
+ # Hebrew Recipe Modification Extraction – DictaBERT + CRF (P1)
+
+ This model identifies **recipe modifications** (ingredient substitutions, quantity changes, technique adjustments, additions) in Hebrew YouTube cooking comments.
+ It uses a **DictaBERT** encoder followed by a **linear-chain CRF** (Conditional Random Field) for sequence labeling, and was trained on silver-labeled data with class weights.
+
+ ## Model Details
+
+ | Info | Value |
+ |--------------------|-------------------------------------------------|
+ | **Architecture** | DictaBERT + CRF |
+ | **Tokenizer** | DictaBERT WordPiece (Hebrew vocabulary) |
+ | **Labels (BIO)** | `O`, `B-`/`I-SUBSTITUTION`, `B-`/`I-QUANTITY`, `B-`/`I-TECHNIQUE`, `B-`/`I-ADDITION` |
+ | **Training data** | [DanielDDDS/recipe-modifications-v2](https://huggingface.co/datasets/DanielDDDS/recipe-modifications-v2) (processed_v2) |
+ | **Class weights** | Yes (computed from training set) |
+ | **Focal loss** | No (γ=0) |
+ | **Dropout** | 0.1 |
+ | **Learning rate** | 2e-5 |
+ | **Epochs** | 10 (best model at epoch 9) |
+
+ ## Performance
+
+ | Split | Exact Entity F1 | Relaxed F1 | Token F1 |
+ |---------|----------------|------------|----------|
+ | Gold | 29.2% | 65.6% | 41.8% |
+ | Silver | 30.1% | 55.2% | 46.2% |
+
+ Full evaluation files: `evaluation/gold_results.json` and `evaluation/silver_results.json`.
+
+ ## How to Use
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer
+ from src.models.joint_model import BertCRFModel  # custom class from the project repository
+
+ tokenizer = AutoTokenizer.from_pretrained("DanielDDDS/hebrew-recipe-modification-ner")
+ model = BertCRFModel.from_pretrained("DanielDDDS/hebrew-recipe-modification-ner")
+ ```
+
+ > **Note:** The model class `BertCRFModel` is not part of `transformers`; it is defined in the project source code (`src/models/joint_model.py`).
+ > To load the model, that file must be importable from your Python path.
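
The Performance table reports exact and relaxed entity F1, but the README does not define either metric. The sketch below is one plausible reading, not the project's evaluation code. It assumes that spans are decoded from the BIO tags listed under Model Details, that an exact match requires the same label and the same boundaries, and that a relaxed match requires the same label plus any token overlap. The names `bio_to_spans`, `span_f1`, `exact`, and `relaxed` are hypothetical.

```python
from typing import Callable, List, Tuple

# (label, start_token, end_token_exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into labeled spans.

    A stray I- tag (no preceding B- of the same label) opens a new span,
    which is a lenient convention assumed here.
    """
    spans: List[Span] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = name, i
    return spans


def exact(gold: Span, pred: Span) -> bool:
    """Same label and identical boundaries."""
    return gold == pred


def relaxed(gold: Span, pred: Span) -> bool:
    """Same label and at least one shared token (assumed definition)."""
    return gold[0] == pred[0] and gold[1] < pred[2] and pred[1] < gold[2]


def span_f1(gold: List[Span], pred: List[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """F1 where precision counts matched predictions and recall counts matched gold spans."""
    matched_pred = sum(any(match(g, p) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold_spans = bio_to_spans(["O", "B-SUBSTITUTION", "I-SUBSTITUTION", "O", "B-QUANTITY"])
pred_spans = bio_to_spans(["O", "B-SUBSTITUTION", "O", "O", "B-QUANTITY"])
print(span_f1(gold_spans, pred_spans, exact))    # the truncated SUBSTITUTION span is not counted
print(span_f1(gold_spans, pred_spans, relaxed))  # the truncated span is counted via overlap
```

Under these assumptions, a prediction that finds the right modification type but clips a boundary fails the exact criterion and passes the relaxed one, which is one way a gap such as 29.2% exact versus 65.6% relaxed on the gold split could arise.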