sarel committed on
Commit 31fbe01 · verified · 1 Parent(s): ff44642

Update README.md

Files changed (1)
  1. README.md +29 -25
README.md CHANGED
@@ -18,10 +18,10 @@ pipeline_tag: text-generation
 
  # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
 
- HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language[cite: 1]. Developed through a collaboration between **PwC Israel** and **MAFAT**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**[cite: 1].
 
  ## 🚀 Model Summary
- HEBATRON is designed to handle the structural and morphological complexities of Hebrew while providing linear scaling for long-context tasks[cite: 1]. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for native-level reasoning in Hebrew and English[cite: 1].
 
  ---
 
@@ -29,51 +29,55 @@ HEBATRON is designed to handle the structural and morphological complexities of
 
  | Feature | Specification |
  | :--- | :--- |
- | **Model Name** | HEBATRON[cite: 1] |
- | **Architecture** | Hybrid **Mamba2** (SSM) + **Sparse MoE**[cite: 1] |
- | **Total Parameters** | 30B[cite: 1] |
- | **Active Parameters** | ~3B per token[cite: 1] |
- | **Context Window** | 65,536 (64k) tokens[cite: 1] |
- | **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs[cite: 1] |
- | **Precision** | FP8 Mixed-Precision[cite: 1] |
 
  ---
 
  ## 🧬 Training Curriculum
- The model was trained using a three-phase **Curriculum Learning** strategy[cite: 1]:
 
  1. **Phase 1: Formal Foundation (75.5B tokens)**
- Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules[cite: 1].
  2. **Phase 2: Colloquial Expansion (3.36B tokens)**
- Integration of social media, forums, and informal web data to handle slang and modern registers[cite: 1].
  3. **Phase 3: Long-Context Extension (20.4B tokens)**
- Fine-tuning on dense, long-form documents to stabilize the 64k context window[cite: 1].
 
- > **Alignment:** Supervised Fine-Tuning (SFT) was performed on **2 million samples**, including localized knowledge distillation and the **"Hebrew IFEval"** dataset[cite: 1].
 
  ---
 
  ## 📊 Performance Evaluation
 
  ### Hebrew Reasoning Benchmarks
- * **SNLI (Semantic Reasoning):** 91.2% accuracy[cite: 1]
- * **Israeli Trivia:** 72.1% (+14pt vs base)[cite: 1]
- * **Hebrew Average Reasoning:** 73.8% (Surpassing DictaLM-3.0-Thinking)[cite: 1]
- * **GSM8K (Math):** 83.3% accuracy in native Hebrew[cite: 1]
 
  ### English Reasoning Benchmarks
- * **Psychometric Psi (EN):** 91.6%[cite: 1]
- * **English Reasoning Average:** 86.0%[cite: 1]
 
  ---
 
  ## 🎯 Intended Use & Limitations
- * **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning[cite: 1].
- * **Limitations:** Users should verify outputs for factual accuracy as with any Large Language Model[cite: 1].
 
  ---
 
  ## 🤝 Credits
- * **Developed by:** PwC Israel & MAFAT[cite: 1]
- * **Technical Lead:** Sarel Weinberger (Co-founder, Binatna)[cite: 1]
- * **Research Collaborators:** Shaltiel Shmidman (Dicta), Dan Revital (PwC Next)[cite: 1]
 
 
 
 
 
 
  # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
 
+ HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language. Developed through a collaboration between **PwC Israel** and **MAFAT**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**.
 
  ## 🚀 Model Summary
+ HEBATRON is designed to handle the structural and morphological complexities of Hebrew while providing linear scaling for long-context tasks. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for native-level reasoning in Hebrew and English.
 
  ---
 
 
 
  | Feature | Specification |
  | :--- | :--- |
+ | **Model Name** | HEBATRON |
+ | **Architecture** | Hybrid **Mamba2** (SSM) + **Sparse MoE** |
+ | **Total Parameters** | 30B |
+ | **Active Parameters** | ~3B per token |
+ | **Context Window** | 65,536 (64k) tokens |
+ | **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs |
+ | **Precision** | FP8 Mixed-Precision |
 
  ---
 
  ## 🧬 Training Curriculum
+ The model was trained using a three-phase **Curriculum Learning** strategy:
 
  1. **Phase 1: Formal Foundation (75.5B tokens)**
+ Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
  2. **Phase 2: Colloquial Expansion (3.36B tokens)**
+ Integration of social media, forums, and informal web data to handle slang and modern registers.
  3. **Phase 3: Long-Context Extension (20.4B tokens)**
+ Fine-tuning on dense, long-form documents to stabilize the 64k context window.
 
+ > **Alignment:** Supervised Fine-Tuning (SFT) was performed on **2 million samples**, including localized knowledge distillation and the **"Hebrew IFEval"** dataset.
 
  ---
 
  ## 📊 Performance Evaluation
 
  ### Hebrew Reasoning Benchmarks
+ * **SNLI (Semantic Reasoning):** 91.2% accuracy
+ * **Israeli Trivia:** 72.1% (+14pt vs base)
+ * **Hebrew Average Reasoning:** 73.8% (Surpassing DictaLM-3.0-Thinking)
+ * **GSM8K (Math):** 83.3% accuracy in native Hebrew
 
  ### English Reasoning Benchmarks
+ * **Psychometric Psi (EN):** 91.6%
+ * **English Reasoning Average:** 86.0%
 
  ---
 
  ## 🎯 Intended Use & Limitations
+ * **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning.
+ * **Limitations:** Users should verify outputs for factual accuracy as with any Large Language Model.
 
  ---
 
  ## 🤝 Credits
+ * **Developed by:** PwC Israel & MAFAT
+ * **MAFAT Lead:** Tal Geva [project Lead], Matan Frank
+ * **Technical Lead:** Sarel Weinberger (PwC Next)
+ * **PwC Israel Team:** Noam Kaiser, Uri Bar Joseph, Smadar Arbatz, Or Levi, Dan Revital, Omer Baruch (PwC Next)
+ * **MAFAT Team:** Noam Ordan
+ * **Partners:** Amir Nissan Hacohen (Origin.ai)
+ * **Research Collaborators:** Shaltiel Shmidman (Dicta)
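
Most of the changed lines in this diff differ from their replacements only by the removal of a trailing `[cite: 1]` marker. The commit does not say how the markers were removed; the sketch below is one possible way to do the same cleanup with a regular expression. The names `CITE_MARKER` and `strip_cite_markers` are hypothetical and are not part of this repository.

```python
import re

# Assumption: markers always look like "[cite: N]" with an integer N.
# Any spaces or tabs directly before the marker are removed with it.
CITE_MARKER = re.compile(r"[ \t]*\[cite:\s*\d+\]")


def strip_cite_markers(text: str) -> str:
    """Return `text` with every "[cite: N]" marker removed."""
    return CITE_MARKER.sub("", text)


if __name__ == "__main__":
    before = "| **Total Parameters** | 30B[cite: 1] |"
    print(strip_cite_markers(before))  # prints: | **Total Parameters** | 30B |
```

Applying a function like this to the old README text would account for the marker removals only; the expanded Credits list in this commit is a separate, manual content change.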