---
language:
- vi
- en
license: apache-2.0
library_name: transformers
tags:
- moe
- mixture-of-experts
- text-generation
- decode-series
- llm
- vietnamese-llm
datasets:
- markov-ai/computer-use-large
metrics:
- loss
- perplexity
model-index:
- name: Decode-12B-MoE
  results: []
---

# 🚀 Decode-12B-MoE: High-Performance Mixture of Experts Model

**Decode-12B-MoE** is a Large Language Model (LLM) built on a **Sparse Mixture of Experts (MoE)** architecture with **12.5 billion** total parameters. The model is engineered to bridge the gap between massive parameter counts and computational efficiency, activating only a fraction of its weights (~2.5B) during inference.

**Untrained model!**

## 📌 Technical Specifications

| Attribute | Value |
| :--- | :--- |
| **Total Parameters** | 12,500,340,736 (12.5B) |
| **Active Parameters** | ~2.5B per token |
| **Architecture** | Sparse MoE (Decoder-only) |
| **Context Window** | 4096 tokens |
| **Format** | Bfloat16 / Float16 |
| **Training Hardware** | NVIDIA Tesla T4 (Prototyping) / [Your_Main_GPU] |

## 🛠 Training Methodology

The model was trained with memory-optimization techniques that keep training stable on both consumer and enterprise-grade hardware:

- **8-bit Optimizer:** `bitsandbytes` AdamW, which reduces the optimizer-state memory footprint by roughly 75%.
- **Gradient Checkpointing:** Enabled to manage activation memory across the deep MoE layers.
- **Dataset:** Fine-tuned on a diverse corpus of Vietnamese and English text, focusing on reasoning, logic, and natural conversation.

## 💻 Quick Start (Usage)

To use this model, ensure you have `transformers` and `accelerate` installed.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Replace with your actual Hugging Face repo ID
model_id = "your-username/decode-12b-moe"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True  # Required for custom MoE architectures
)

# Test prompt
prompt = "Explain the concept of Quantum Computing in simple terms."
# Use model.device so the snippet also works when weights land on CPU
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
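To illustrate the gradient checkpointing mentioned in the training section, here is a minimal, self-contained PyTorch sketch. The toy `Linear` layer is purely illustrative (it is not this model's code); the point is that the layer's activations are recomputed during the backward pass instead of being cached in memory:

```python
import torch
from torch.utils.checkpoint import checkpoint

# A stand-in for one expensive layer of a deep network
layer = torch.nn.Linear(16, 16)
x = torch.randn(2, 16, requires_grad=True)

# Recompute this layer's forward pass during backward
# instead of storing its activations
y = checkpoint(layer, x, use_reentrant=False)
y.sum().backward()

print(x.grad.shape)  # torch.Size([2, 16])
```

In `transformers`, the equivalent switch for a whole model is `model.gradient_checkpointing_enable()`, which trades extra compute for a lower peak activation memory.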