Commit 5d9abcc (verified), committed by sarel
1 Parent(s): 6bf4d69

Update README.md

Files changed (1)
  1. README.md +76 -1
README.md CHANGED
@@ -1,4 +1,79 @@
  ---
  license: apache-2.0
  ---
- Model Card: HEBATRON. Model Summary: HEBATRON is a high-performance, Hebrew-specialized language model developed by PwC Israel in collaboration with MAFAT. It features a cutting-edge Mamba2 + Mixture-of-Experts (MoE) hybrid architecture, designed to provide superior reasoning capabilities and linear scaling for long-context processing. The model is a localized and enhanced version of the Nemotron-3-Nano-30B framework, optimized specifically for the linguistic complexities of the Hebrew language. Technical Specifications: Model Name: HEBATRON. Architecture: Hybrid Mamba2 (State Space Model) + Sparse Mixture-of-Experts (MoE). Parameters: 30B total parameters (~3B active parameters per token). Context Window: 65,536 (64k) tokens. Primary Languages: Hebrew and English. Training Infrastructure: NVIDIA Blackwell (B300) and H200 GPUs on AWS EC2 P6/P5 instances, utilizing FP8 mixed-precision. Training Curriculum & Strategy: HEBATRON was trained using a three-phase Curriculum Learning strategy to master Hebrew morphology while retaining global reasoning skills: Phase 1: Formal Foundation (~75.5B tokens): Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical and syntactical rules. Phase 2: Colloquial Expansion (~3.36B tokens): Integration of social media, forums, and informal web data to handle slang and modern Hebrew registers. Phase 3: Long-Context Extension (~20.4B tokens): Fine-tuning on dense, long-form documents to stabilize the Mamba2 and MoE routing mechanisms across its 64k context window. Supervised Fine-Tuning (SFT): Alignment on 2 million samples, including localized knowledge distillation and a specialized "Hebrew IFEval" dataset for strict instructional adherence. Performance Evaluation: HEBATRON sets a new benchmark for sovereign Hebrew language models, particularly in logical reasoning and cultural knowledge. Hebrew Benchmarks: SNLI (Semantic Reasoning): 91.2% accuracy. Israeli Trivia: 72.1% (a 14-point increase over the base model). Hebrew Average Reasoning: 73.8%, surpassing other major local models like DictaLM-3.0-Thinking. GSM8K (Mathematical Reasoning): 83.3% accuracy in native Hebrew. English Benchmarks: The model successfully avoids catastrophic forgetting, maintaining high proficiency in English-centric tasks: Psychometric Psi (EN): 91.6%. English Reasoning Average: 86.0%. Intended Use & Limitations: Intended Use: Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual (HE/EN) reasoning tasks. Limitations: While the Mamba2+MoE architecture excels at long-context processing, users should verify outputs for factual accuracy as with any LLM. Credits: Developed by: PwC Israel & MAFAT.
  ---
+ language:
+ - he
+ - en
  license: apache-2.0
+ library_name: mamba
+ tags:
+ - mamba2
+ - moe
+ - hebrew
+ - finance
+ - legal
+ - ssm
+ model_name: HEBATRON
+ base_model: nvidia/nemotron-3-nano-30b-base
+ pipeline_tag: text-generation
  ---
+
+ # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
+
+ HEBATRON is a language model specialized for Hebrew. Developed through a collaboration between **PwC Israel** and **MAFAT**, it uses a hybrid architecture that combines **Mamba2** state space layers with a sparse **Mixture-of-Experts (MoE)**.
+
+ ## 🚀 Model Summary
+ HEBATRON is designed to handle the structural and morphological complexity of Hebrew while scaling linearly with sequence length on long-context tasks. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for reasoning in both Hebrew and English.
+
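To make the linear-scaling claim concrete, here is a minimal sketch that compares how an abstract "work" count grows with sequence length for a recurrent state-space scan versus full self-attention. The functions `ssm_cost` and `attention_cost` are hypothetical helpers written for this illustration. They ignore constant factors, hidden sizes, and any attention layers a hybrid model may still contain, so they say nothing about HEBATRON's measured throughput.

```python
# Illustrative cost model only: counts abstract work units, not FLOPs.
# Both helpers are hypothetical and not part of any HEBATRON tooling.

def ssm_cost(n_tokens: int) -> int:
    """A recurrent state-space scan visits each token once: O(n)."""
    return n_tokens


def attention_cost(n_tokens: int) -> int:
    """Full self-attention scores every token pair: O(n^2)."""
    return n_tokens * n_tokens


for n in (4_096, 16_384, 65_536):
    # Doubling the sequence doubles the scan cost but quadruples attention.
    scan_growth = ssm_cost(2 * n) / ssm_cost(n)
    attn_growth = attention_cost(2 * n) / attention_cost(n)
    print(f"{n:>6} tokens: scan x{scan_growth:.0f}, attention x{attn_growth:.0f} when doubled")
```

Under this simplified model the gap between the two widens in proportion to the sequence length, which is why the difference matters most near the 64k end of the context window.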
+ ---
+
+ ## 📂 Technical Specifications
+
+ | Feature | Specification |
+ | :--- | :--- |
+ | **Model Name** | HEBATRON |
+ | **Architecture** | Hybrid **Mamba2** (SSM) + **Sparse MoE** |
+ | **Total Parameters** | 30B |
+ | **Active Parameters** | ~3B per token |
+ | **Context Window** | 65,536 (64k) tokens |
+ | **Training Hardware** | NVIDIA Blackwell (B300) and H200 GPUs |
+ | **Training Precision** | FP8 mixed precision |
+
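The parameter counts in the table above imply that only about a tenth of the weights are used for any single token. The sketch below works out that share and shows a toy top-k router of the kind sparse MoE layers commonly use. The expert count (4), `k=2`, and the logits are made-up values for illustration, and `top_k_route` is a hypothetical helper; none of this is taken from HEBATRON's actual routing configuration.

```python
import math

TOTAL_PARAMS_B = 30.0   # total parameters, from the specification table
ACTIVE_PARAMS_B = 3.0   # approximate active parameters per token, same table


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for one token."""
    return active_b / total_b


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Keep the k highest-scoring experts and softmax-normalize their weights.

    Hypothetical helper: real routers add load balancing and other details.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


print(f"active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.0%}")

# Assumed example: 4 experts, 2 selected per token.
weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(weights)
```

Experts that are not selected contribute no compute for that token, which is how a 30B-parameter model can run with roughly 3B active parameters.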
+ ---
+
+ ## 🧬 Training Curriculum
+ The model was trained using a three-phase **Curriculum Learning** strategy:
+
+ 1. **Phase 1: Formal Foundation (~75.5B tokens)**
+    Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical and syntactic rules.
+ 2. **Phase 2: Colloquial Expansion (~3.36B tokens)**
+    Integration of social media, forums, and informal web data to handle slang and modern Hebrew registers.
+ 3. **Phase 3: Long-Context Extension (~20.4B tokens)**
+    Fine-tuning on dense, long-form documents to stabilize the Mamba2 and MoE routing mechanisms across the 64k context window.
+
+ > **Alignment:** Supervised Fine-Tuning (SFT) was performed on **2 million samples**, including localized knowledge distillation and a specialized **"Hebrew IFEval"** dataset for strict instruction adherence.
+
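The per-phase token counts can be summed to see how the curriculum budget is split. This is a small arithmetic sketch using only the figures quoted above, which are themselves approximate; the dictionary name `PHASE_TOKENS_B` is introduced here for illustration, and the SFT stage is left out because its size is given in samples rather than tokens.

```python
# Token counts in billions, copied from the three curriculum phases above.
PHASE_TOKENS_B = {
    "Phase 1: Formal Foundation": 75.5,
    "Phase 2: Colloquial Expansion": 3.36,
    "Phase 3: Long-Context Extension": 20.4,
}

total_b = sum(PHASE_TOKENS_B.values())
shares = {name: tokens / total_b for name, tokens in PHASE_TOKENS_B.items()}

print(f"total: {total_b:.2f}B tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The phases add up to roughly 99.3B tokens, with the formal phase taking about three quarters of the budget and the colloquial phase only a few percent.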
+ ---
+
+ ## 📊 Performance Evaluation
+
+ ### Hebrew Reasoning Benchmarks
+ * **SNLI (Semantic Reasoning):** 91.2% accuracy
+ * **Israeli Trivia:** 72.1% (+14 points over the base model)
+ * **Hebrew Average Reasoning:** 73.8%, surpassing DictaLM-3.0-Thinking
+ * **GSM8K (Math):** 83.3% accuracy in native Hebrew
+
+ ### English Reasoning Benchmarks
+ * **Psychometric Psi (EN):** 91.6%
+ * **English Reasoning Average:** 86.0%
+
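The Israeli Trivia line gives the tuned score and the gain but not the base model's score. Assuming the gain is meant in percentage points (an interpretation, since the card does not say), the base score can be back-computed as a rough figure. The helper names below are hypothetical, and because the gain is quoted as a rounded number the result is only approximate.

```python
TUNED_ISRAELI_TRIVIA = 72.1  # percent, from the benchmark list above
GAIN_POINTS = 14.0           # quoted gain, read here as percentage points


def implied_base(tuned: float, gain_points: float) -> float:
    """Base-model score implied by an absolute gain in percentage points."""
    return tuned - gain_points


def relative_gain(tuned: float, gain_points: float) -> float:
    """Gain expressed relative to the implied base score."""
    return gain_points / implied_base(tuned, gain_points)


base = implied_base(TUNED_ISRAELI_TRIVIA, GAIN_POINTS)
print(f"implied base score: {base:.1f}%")
print(f"relative improvement: {relative_gain(TUNED_ISRAELI_TRIVIA, GAIN_POINTS):.1%}")
```

Keeping the absolute and relative readings separate avoids confusing a 14-point gain with a 14% relative improvement, which would be a much smaller change.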
+ ---
+
+ ## 🎯 Intended Use & Limitations
+ * **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning.
+ * **Limitations:** Users should verify outputs for factual accuracy, as with any large language model.
+
+ ---
+
+ ## 🤝 Credits
+ * **Developed by:** PwC Israel & MAFAT
+ * **Technical Lead:** Sarel Weinberger (Co-founder, Binatna)
+ * **Research Collaborators:** Shaltiel Shmidman (Dicta), Dan Revital (PwC Next)