Update README.md

README.md +29 -25

@@ -18,10 +18,10 @@ pipeline_tag: text-generation
 # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
-HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language[cite: 1]. Developed through a collaboration between **PwC Israel** and **MAFAT**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**[cite: 1].
 ## 🚀 Model Summary
-HEBATRON is designed to handle the structural and morphological complexities of Hebrew while providing linear scaling for long-context tasks[cite: 1]. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for native-level reasoning in Hebrew and English[cite: 1].
 ---
@@ -29,51 +29,55 @@ HEBATRON is designed to handle the structural and morphological complexities of
 | Feature | Specification |
 | :--- | :--- |
-| **Model Name** | HEBATRON[cite: 1] |
-| **Architecture** | Hybrid **Mamba2** (SSM) + **Sparse MoE**[cite: 1] |
-| **Total Parameters** | 30B[cite: 1] |
-| **Active Parameters** | ~3B per token[cite: 1] |
-| **Context Window** | 65,536 (64k) tokens[cite: 1] |
-| **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs[cite: 1] |
-| **Precision** | FP8 Mixed-Precision[cite: 1] |
 ---
 ## 🧬 Training Curriculum
-The model was trained using a three-phase **Curriculum Learning** strategy[cite: 1]:
 1. **Phase 1: Formal Foundation (75.5B tokens)**
-   Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules[cite: 1].
 2. **Phase 2: Colloquial Expansion (3.36B tokens)**
-   Integration of social media, forums, and informal web data to handle slang and modern registers[cite: 1].
 3. **Phase 3: Long-Context Extension (20.4B tokens)**
-   Fine-tuning on dense, long-form documents to stabilize the 64k context window[cite: 1].
-> **Alignment:** Supervised Fine-Tuning (SFT) was performed on **2 million samples**, including localized knowledge distillation and the **"Hebrew IFEval"** dataset[cite: 1].
 ---
 ## 📊 Performance Evaluation
 ### Hebrew Reasoning Benchmarks
-* **SNLI (Semantic Reasoning):** 91.2% accuracy[cite: 1]
-* **Israeli Trivia:** 72.1% (+14pt vs base)[cite: 1]
-* **Hebrew Average Reasoning:** 73.8% (Surpassing DictaLM-3.0-Thinking)[cite: 1]
-* **GSM8K (Math):** 83.3% accuracy in native Hebrew[cite: 1]
 ### English Reasoning Benchmarks
-* **Psychometric Psi (EN):** 91.6%[cite: 1]
-* **English Reasoning Average:** 86.0%[cite: 1]
 ---
 ## 🎯 Intended Use & Limitations
-* **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning[cite: 1].
-* **Limitations:** Users should verify outputs for factual accuracy as with any Large Language Model[cite: 1].
 ---
 ## 🤝 Credits
-* **Developed by:** PwC Israel & MAFAT[cite: 1]
-* **Technical Lead:** Sarel Weinberger (Co-founder, Binatna)[cite: 1]
-* **Research Collaborators:** Shaltiel Shmidman (Dicta), Dan Revital (PwC Next)[cite: 1]

 # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
+HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language. Developed through a collaboration between **PwC Israel** and **MAFAT**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**.
 ## 🚀 Model Summary
+HEBATRON is designed to handle the structural and morphological complexities of Hebrew while providing linear scaling for long-context tasks. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for native-level reasoning in Hebrew and English.
 ---
 | Feature | Specification |
 | :--- | :--- |
+| **Model Name** | HEBATRON |
+| **Architecture** | Hybrid **Mamba2** (SSM) + **Sparse MoE** |
+| **Total Parameters** | 30B |
+| **Active Parameters** | ~3B per token |
+| **Context Window** | 65,536 (64k) tokens |
+| **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs |
+| **Precision** | FP8 Mixed-Precision |
 ---
 ## 🧬 Training Curriculum
+The model was trained using a three-phase **Curriculum Learning** strategy:
 1. **Phase 1: Formal Foundation (75.5B tokens)**
+   Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
 2. **Phase 2: Colloquial Expansion (3.36B tokens)**
+   Integration of social media, forums, and informal web data to handle slang and modern registers.
 3. **Phase 3: Long-Context Extension (20.4B tokens)**
+   Fine-tuning on dense, long-form documents to stabilize the 64k context window.
+> **Alignment:** Supervised Fine-Tuning (SFT) was performed on **2 million samples**, including localized knowledge distillation and the **"Hebrew IFEval"** dataset.
 ---
 ## 📊 Performance Evaluation
 ### Hebrew Reasoning Benchmarks
+* **SNLI (Semantic Reasoning):** 91.2% accuracy
+* **Israeli Trivia:** 72.1% (+14 pts vs. base model)
+* **Hebrew Average Reasoning:** 73.8% (surpassing DictaLM-3.0-Thinking)
+* **GSM8K (Math):** 83.3% accuracy in native Hebrew
 ### English Reasoning Benchmarks
+* **Psychometric Psi (EN):** 91.6%
+* **English Reasoning Average:** 86.0%
 ---
 ## 🎯 Intended Use & Limitations
+* **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning.
+* **Limitations:** As with any large language model, users should verify outputs for factual accuracy.
 ---
 ## 🤝 Credits
+* **Developed by:** PwC Israel & MAFAT
+* **MAFAT Leads:** Tal Geva (Project Lead), Matan Frank
+* **Technical Lead:** Sarel Weinberger (PwC Next)
+* **PwC Israel Team:** Noam Kaiser, Uri Bar Joseph, Smadar Arbatz, Or Levi, Dan Revital, Omer Baruch (PwC Next)
+* **MAFAT Team:** Noam Ordan
+* **Partners:** Amir Nissan Hacohen (Origin.ai)
+* **Research Collaborators:** Shaltiel Shmidman (Dicta)