---
license: mit
language:
- en
- ja
---

# Model Card

## Overview

Rize is a causal language model for pretraining research and general text generation. It uses a Transformer decoder architecture with Mixture-of-Experts (MoE) layers. The model is designed for research and experimental development.

## Model Size and Architecture

The model has about **4 billion total parameters** and about **1 billion active parameters per token**. Main architecture points:

- decoder-only Transformer
- 19 hidden layers
- hidden size of 1,536
- 12 attention heads
- 64 routed experts
- top-4 expert routing per token
- 1 shared expert
- vocabulary size of 163,840
- maximum context length of 8,192 tokens

## Intended Use

This model is intended for:

- language modeling research
- evaluation of training settings and architectures
- general text generation benchmarks

It is not intended to serve as a source of factual truth or professional advice.

## Training

The model is trained with autoregressive next-token prediction on text data. It is developed as a research model and may change across checkpoints, runs, and configurations.

## Capabilities

- text continuation
- general question answering
- instruction-style response generation
- multilingual text handling, depending on training data

## Limitations

- may generate incorrect or misleading information
- may reflect biases in the training data
- may produce unsafe, harmful, or inappropriate text
- performance may vary across languages and domains
- not optimized for high-stakes decisions

## Safety and Responsible Use

Users should review outputs before any real-world use. The model should not be used on its own for:

- medical advice
- legal advice
- financial advice
- safety-critical decisions
- sensitive personal decisions

Human oversight is required.

## Disclaimer

This model is provided for research and experimental purposes only.
The FA Research Team makes no guarantees regarding accuracy, completeness, reliability, safety, or fitness for a particular purpose. Use of this model and its outputs is at the user's own risk.

## Contact

FA Research Team
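
## Appendix: Routing Sketch

As an illustrative sketch of the routing scheme described in the architecture section (64 routed experts, top-4 selection per token, plus 1 always-active shared expert), the NumPy snippet below shows how such a layer can dispatch tokens. All names, weight shapes, and the gate-renormalization choice are hypothetical, and the hidden size is shrunk from the model's 1,536 to 16 to keep the demo small; this is not Rize's actual implementation.

```python
import numpy as np

HIDDEN = 16      # kept small for the demo; the model card lists 1,536
N_EXPERTS = 64   # routed experts, per the architecture section
TOP_K = 4        # experts selected per token, per the architecture section

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens, router_w, expert_ws, shared_w):
    """Route each token to its top-k experts; the shared expert sees every token."""
    logits = tokens @ router_w                     # (T, N_EXPERTS) router scores
    probs = softmax(logits)
    topk = np.argsort(probs, axis=-1)[:, -TOP_K:]  # indices of the top-4 experts
    out = tokens @ shared_w                        # shared expert: always active
    for t in range(tokens.shape[0]):
        sel = topk[t]
        gates = probs[t, sel] / probs[t, sel].sum()  # renormalize over selected experts
        for g, e in zip(gates, sel):
            out[t] += g * (tokens[t] @ expert_ws[e])
    return out

# Hypothetical random weights, purely for illustration.
tokens = rng.standard_normal((3, HIDDEN))
router_w = rng.standard_normal((HIDDEN, N_EXPERTS))
expert_ws = rng.standard_normal((N_EXPERTS, HIDDEN, HIDDEN)) * 0.1
shared_w = np.eye(HIDDEN)

y = moe_layer(tokens, router_w, expert_ws, shared_w)
print(y.shape)  # (3, 16)
```

Because only the top-4 of 64 expert projections run per token, compute per token stays near the "active parameters" count even though total parameters are much larger, which is the usual motivation for this kind of MoE design.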