 ---
 language: he
 license: mit
+tags:
+- token-classification
+- recipe-modification
+- hebrew
+- dictabert
+- crf
+datasets:
+- DanielDDDS/recipe-modifications-v2
+metrics:
+- f1
+pipeline_tag: token-classification
+---
+# Hebrew Recipe Modification Extraction – DictaBERT + CRF (P1)
+This model identifies **recipe modifications** (ingredient substitutions, quantity changes, technique adjustments, additions) in Hebrew YouTube cooking comments.
+It uses a **DictaBERT** encoder followed by a **linear‑chain CRF** (Conditional Random Field) for sequence labeling, and was trained on silver‑labeled data with class weights.
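To make the role of the linear-chain CRF concrete, here is a minimal pure-Python sketch of Viterbi decoding over per-token emission scores and tag-to-tag transition scores. It is an illustration only, not the project's implementation: the three-tag subset, the toy scores, and the omission of start/end transitions are all assumptions made for brevity.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag index path.

    emissions[t][j]   : score of tag j at token t (from the encoder)
    transitions[i][j] : score of moving from tag i to tag j (learned by the CRF)
    Start/end transition scores are left out in this simplified sketch.
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for j in range(n_tags):
            # best previous tag for arriving at tag j
            i_best = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            prev_best.append(i_best)
            new_score.append(score[i_best] + transitions[i_best][j] + emit[j])
        backpointers.append(prev_best)
        score = new_score
    # follow the backpointers from the best final tag
    path = [max(range(n_tags), key=lambda j: score[j])]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    return path[::-1]


# Toy example (assumed values): a large penalty on the invalid move O -> I-ADDITION.
tags = ["O", "B-ADDITION", "I-ADDITION"]
transitions = [
    [0.0, 0.0, -100.0],  # from O
    [0.0, 0.0, 0.0],     # from B-ADDITION
    [0.0, 0.0, 0.0],     # from I-ADDITION
]
emissions = [
    [1.0, 0.8, 0.0],     # token 0 slightly prefers O
    [0.0, 0.0, 2.0],     # token 1 strongly prefers I-ADDITION
]
decoded = [tags[i] for i in viterbi(emissions, transitions)]
```

Picking the best tag per token independently would give `O`, `I-ADDITION` here, which is not a valid BIO sequence. Scoring whole sequences lets the transition penalty steer the first token to `B-ADDITION` instead.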
+## Model Details
+| Info               | Value                                           |
+|--------------------|-------------------------------------------------|
+| **Architecture**   | DictaBERT + CRF                                 |
+| **Tokenizer**      | DictaBERT (BERT WordPiece, Hebrew vocabulary)   |
+| **Labels (BIO)**   | `O`, `B-`/`I-SUBSTITUTION`, `B-`/`I-QUANTITY`, `B-`/`I-TECHNIQUE`, `B-`/`I-ADDITION` |
+| **Training data**  | [DanielDDDS/recipe-modifications-v2](https://huggingface.co/datasets/DanielDDDS/recipe-modifications-v2) (processed_v2) |
+| **Class weights**  | Yes (computed from training set)                |
+| **Focal loss**     | No (γ=0)                                        |
+| **Dropout**        | 0.1                                             |
+| **Learning rate**  | 2e-5                                            |
+| **Epochs**         | 10 (best model at epoch 9)                      |
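The table says class weights were computed from the training set but does not give the formula. The sketch below shows one common choice, inverse-frequency ("balanced") weights, applied to the nine BIO labels listed above. The weighting formula, the function name `balanced_class_weights`, and the toy sequences are assumptions for illustration, not the project's actual procedure.

```python
from collections import Counter

# The nine labels from the BIO scheme in the table above.
LABELS = [
    "O",
    "B-SUBSTITUTION", "I-SUBSTITUTION",
    "B-QUANTITY", "I-QUANTITY",
    "B-TECHNIQUE", "I-TECHNIQUE",
    "B-ADDITION", "I-ADDITION",
]


def balanced_class_weights(tag_sequences, labels=LABELS):
    """Assumed scheme: weight = total_tokens / (n_seen_labels * label_count).

    Labels that never occur in the data get weight 0.0.
    """
    counts = Counter(tag for seq in tag_sequences for tag in seq)
    total = sum(counts[label] for label in labels)
    n_seen = sum(1 for label in labels if counts[label] > 0)
    return {
        label: (total / (n_seen * counts[label]) if counts[label] else 0.0)
        for label in labels
    }


# Toy training set (hypothetical): "O" dominates, as is typical for NER data.
toy_sequences = [
    ["O", "O", "B-QUANTITY", "I-QUANTITY"],
    ["O", "B-ADDITION", "O", "O"],
]
weights = balanced_class_weights(toy_sequences)
```

With this scheme, frequent labels such as `O` receive a weight below 1 and rare entity labels receive a weight above 1, so errors on rare labels count more during training.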
+## Performance
+| Split   | Exact Entity F1 | Relaxed F1 | Token F1 |
+|---------|----------------|------------|----------|
+| Gold    | 29.2%          | 65.6%      | 41.8%    |
+| Silver  | 30.1%          | 55.2%      | 46.2%    |
+Full evaluation files: `evaluation/gold_results.json` and `evaluation/silver_results.json`.
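The gap between exact and relaxed entity F1 is easier to read with a concrete definition in hand. The sketch below assumes that "exact" means identical type and boundaries and that "relaxed" means same type with at least one overlapping token. The model card does not state the precise matching rules, so treat these definitions and the helper names as assumptions.

```python
def bio_to_spans(tags):
    """Convert a BIO tag list into (type, start, end_exclusive) spans."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.append((kind, start, i))
            start, kind = None, None
        # a B- tag, or an I- tag with no open span, starts a new span
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return spans


def exact_match(a, b):
    return a == b


def overlap_match(a, b):
    # same entity type and at least one shared token position
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def span_f1(pred, gold, match):
    """F1 where a span counts as correct if it matches any span on the other side."""
    hit_pred = sum(any(match(p, g) for g in gold) for p in pred)
    hit_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy example: the prediction finds the entity but misses its last token.
gold_tags = ["O", "B-QUANTITY", "I-QUANTITY", "I-QUANTITY", "O"]
pred_tags = ["O", "B-QUANTITY", "I-QUANTITY", "O", "O"]
gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
exact_f1 = span_f1(pred_spans, gold_spans, exact_match)
relaxed_f1 = span_f1(pred_spans, gold_spans, overlap_match)
```

In this toy case the prediction scores zero under exact matching and full credit under overlap matching. Under these assumed definitions, a large exact/relaxed gap like the one in the table would point to boundary errors more than to missed entities.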
+## How to Use
+```python
+import torch
+from transformers import AutoTokenizer
+# BertCRFModel is defined in the project repository, not in transformers
+from src.models.joint_model import BertCRFModel
+# Load the tokenizer and the fine-tuned weights from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained("DanielDDDS/hebrew-recipe-modification-ner")
+model = BertCRFModel.from_pretrained("DanielDDDS/hebrew-recipe-modification-ner")
+```
+> **Note:** The model class `BertCRFModel` is defined in the project source code (`src/models/joint_model.py`), not in `transformers`.
+> To load this model, the project repository root must be on your Python path so that `src.models.joint_model` can be imported.
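One way to satisfy the import requirement is to add the repository root to `sys.path` before importing the model class. The sketch below is only a convenience helper under that assumption. The function name `add_repo_to_path` and the example location are hypothetical, and setting `PYTHONPATH` in the shell is another common way to get the same effect.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Put a local checkout of the project on sys.path.

    Raises FileNotFoundError if src/models/joint_model.py is not found under
    repo_root, so a wrong path fails early instead of at import time.
    """
    root = Path(repo_root).expanduser().resolve()
    module_file = root / "src" / "models" / "joint_model.py"
    if not module_file.is_file():
        raise FileNotFoundError(f"expected model source at {module_file}")
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# Usage (hypothetical checkout location):
# add_repo_to_path("path/to/project-checkout")
# from src.models.joint_model import BertCRFModel
```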