---
license: mit
language:
- en
- ja
---

# Model Card

## Overview

Rize is a causal language model for pretraining research and general text generation. It uses a Transformer decoder architecture with Mixture-of-Experts (MoE) layers. The model is designed for research and experimental development.

## Model Size and Architecture

The model has about **4 billion total parameters** and about **1 billion active parameters per token**. Main architecture points:

- decoder-only Transformer
- 19 hidden layers
- hidden size of 1,536
- 12 attention heads
- 64 routed experts
- top-4 expert routing per token
- 1 shared expert
- vocabulary size of 163,840
- maximum context length of 8,192 tokens

## Intended Use

This model is intended for:

- language modeling research
- evaluation of training settings and architectures
- general text generation benchmarks

It is not intended to serve as a source of factual truth or professional advice.

## Training

The model is trained with autoregressive next-token prediction on text data. It is developed as a research model and may change across checkpoints, runs, and configurations.

## Capabilities

- text continuation
- general question answering
- instruction-style response generation
- multilingual text handling, depending on training data

## Limitations

- may generate incorrect or misleading information
- may reflect biases in the training data
- may produce unsafe, harmful, or inappropriate text
- performance may vary across languages and domains
- not optimized for high-stakes decisions

## Safety and Responsible Use

Users should review outputs before any real-world use. The model should not be used on its own for:

- medical advice
- legal advice
- financial advice
- safety-critical decisions
- sensitive personal decisions

Human oversight is required.

## Disclaimer

This model is provided for research and experimental purposes only.
The FA Research Team makes no guarantees regarding accuracy, completeness, reliability, safety, or fitness for a particular purpose. Use of this model and its outputs is at the user's own risk.

## Contact

FA Research Team
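
## Appendix: Routing Sketch

As an illustrative sketch of the routing scheme described in the architecture section (64 routed experts, top-4 selection per token, plus 1 always-active shared expert), the NumPy snippet below shows how such a layer can dispatch tokens. All names, weight shapes, and the gate-renormalization choice are hypothetical, and the hidden size is shrunk from the model's 1,536 to 16 to keep the demo small; this is not Rize's actual implementation.

```python
import numpy as np

HIDDEN = 16      # kept small for the demo; the model card lists 1,536
N_EXPERTS = 64   # routed experts, per the architecture section
TOP_K = 4        # experts selected per token, per the architecture section

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens, router_w, expert_ws, shared_w):
    """Route each token to its top-k experts; the shared expert sees every token."""
    logits = tokens @ router_w                     # (T, N_EXPERTS) router scores
    probs = softmax(logits)
    topk = np.argsort(probs, axis=-1)[:, -TOP_K:]  # indices of the top-4 experts
    out = tokens @ shared_w                        # shared expert: always active
    for t in range(tokens.shape[0]):
        sel = topk[t]
        gates = probs[t, sel] / probs[t, sel].sum()  # renormalize over selected experts
        for g, e in zip(gates, sel):
            out[t] += g * (tokens[t] @ expert_ws[e])
    return out

# Hypothetical random weights, purely for illustration.
tokens = rng.standard_normal((3, HIDDEN))
router_w = rng.standard_normal((HIDDEN, N_EXPERTS))
expert_ws = rng.standard_normal((N_EXPERTS, HIDDEN, HIDDEN)) * 0.1
shared_w = np.eye(HIDDEN)

y = moe_layer(tokens, router_w, expert_ws, shared_w)
print(y.shape)  # (3, 16)
```

Because only the top-4 of 64 expert projections run per token, compute per token stays near the "active parameters" count even though total parameters are much larger, which is the usual motivation for this kind of MoE design.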