---
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - rubirlm
  - causal-lm
  - base-model
  - text-generation
  - 1b
  - moe
datasets:
  - HuggingFaceFW/fineweb
  - HuggingFaceH4/ultrachat_200k
pipeline_tag: text-generation
---

# RubiRLM-1B-Base

RubiRLM-1B-Base is a 1B-parameter base language model released by DevHunterAI.

- **Model size:** 1B parameters
- **Training datasets:** FineWeb, UltraChat-200k
- **Model type:** Base / pretrained language model

> **Important:** This release is a base model. It can be used for prompt-based generation and experimental chat-style interaction, but it is not an instruction-tuned chat assistant.

## Architecture

*RubiRLM 1B Architecture*

RubiRLM 1B uses a recursive language modeling architecture with recurrent state flow, Mixture-of-Experts routing, and conditional block execution.
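The recurrent state flow and conditional block execution can be pictured with a minimal PyTorch sketch. Everything here is an illustrative assumption, not the repository's actual implementation: the class name, the pooling-based skip gate, and the use of a stock Transformer layer as the block.

```python
import torch
import torch.nn as nn

class RecursiveStack(nn.Module):
    """Toy sketch: the same blocks are re-applied for several recursive
    steps, and a skip router gates each block on or off per step.
    Names and details are illustrative assumptions."""

    def __init__(self, d_model=64, nhead=4, num_blocks=3, num_steps=2):
        # The released model uses d_model=1024, 16 heads, 10 blocks,
        # and 6 recursive steps; tiny values keep this demo fast.
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_blocks)
        )
        self.skip_router = nn.Linear(d_model, num_blocks)  # one logit per block
        self.num_steps = num_steps

    def forward(self, h):                                    # h: (batch, seq, d_model)
        for _ in range(self.num_steps):                      # recurrent state flow
            gates = self.skip_router(h.mean(dim=1)).mean(0)  # pooled gate logits
            for i, block in enumerate(self.blocks):
                if gates[i] > 0:   # hard skip for clarity; a real model would
                    h = block(h)   # use a soft or straight-through gate
        return h

stack = RecursiveStack()
h = stack(torch.randn(2, 8, 64))
print(h.shape)  # torch.Size([2, 8, 64])
```

Because the blocks are reused across steps, parameter count stays fixed while effective depth grows with the number of recursive steps.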

## Key Features

- 1B parameters
- Recursive Language Model (RLM) architecture
- 10 recursive blocks
- `d_model` = 1024
- 16 attention heads
- max sequence length = 2048
- 6 recursive reasoning steps
- Mixture-of-Experts: 32 experts, top-1 routing
- Layer-skip router for conditional execution
- Packed execution support
- Tied token embedding and LM head
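The top-1 MoE routing listed above can be sketched as follows. This is a generic illustration, not the code shipped in this repository: the expert MLP shape, the router, and the dimensions are assumptions (the released model uses 32 experts at `d_model` = 1024).

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Minimal top-1 Mixture-of-Experts layer: each token is sent to the
    single expert with the highest router score. Illustrative only."""

    def __init__(self, d_model=64, num_experts=4, d_ff=128):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)  # (tokens, num_experts)
        gate, idx = probs.max(dim=-1)           # top-1 weight and expert index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # tokens assigned to expert e
            if mask.any():
                out[mask] = gate[mask, None] * expert(x[mask])
        return out

moe = Top1MoE()
y = moe(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])
```

With top-1 routing, only one expert's parameters are active per token, so compute per token stays close to a dense layer while total capacity scales with the expert count.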

## Training Data

This model was trained using a mixture of:

- FineWeb
- UltraChat-200k

## Intended Usage

This model is intended for:

- base language modeling research
- continued pretraining
- experimental prompt-based generation
- architecture experimentation around recursive and MoE-based language models

## Not Intended As

This release should not be treated as:

- a fully aligned assistant
- a safety-tuned production chatbot
- an instruction-following model with guaranteed conversational quality

## Loading

Because this repository includes custom model code, loading may require `trust_remote_code=True` depending on your workflow.
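A typical loading path looks like the sketch below. The repo id is assumed, and whether the custom classes register with `AutoModelForCausalLM` (and whether a tokenizer is bundled) depends on this repository's remote code; adjust as needed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with the actual Hub path.
model_id = "DevHunterAI/RubiRLM-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Base models continue text rather than follow instructions,
# so prompt with a prefix to be completed.
inputs = tokenizer("Recursive language models work by", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

`trust_remote_code=True` executes the Python files shipped in the repository, so review `RubiRLM.py` and the supporting modules before loading.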

## Files

- `pytorch_model.bin`: exported RubiRLM weights
- `training_checkpoint.pt`: original training checkpoint
- `config.json`: Hugging Face-facing config
- `rubirlm_config.json`: full RubiRLM architecture config
- `RubiRLM.py`: model implementation
- `xqs_moe.py`, `xqs_stack.py`, `x_quantum_sparse_ops.py`, `rubi_train_stack.py`: supporting code

## Notes

The exported weights were produced from the final training checkpoint and packaged for Hugging Face publication.