---
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - rubirlm
  - causal-lm
  - base-model
  - text-generation
  - 1b
  - moe
datasets:
  - HuggingFaceFW/fineweb
  - HuggingFaceH4/ultrachat_200k
pipeline_tag: text-generation
---

# RubiRLM-1B-Base

RubiRLM-1B-Base is a 1B-parameter base language model released by DevHunterAI.

- **Model size:** 1B parameters
- **Training datasets:** FineWeb, UltraChat-200k
- **Model type:** Base / pretrained language model

> **Important:** This release is a base model. It can be used for prompt-based generation and experimental chat-style interaction, but it is not an instruction-tuned chat assistant.

## Architecture

*RubiRLM 1B Architecture*

RubiRLM 1B uses a recursive language modeling architecture with recurrent state flow, Mixture-of-Experts routing, and conditional block execution.
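The recurrent state flow and conditional block execution can be pictured with a minimal PyTorch sketch. Everything here is an illustrative assumption, not the repository's actual implementation: the class name, the pooling-based skip gate, and the use of a stock Transformer layer as the block.

```python
import torch
import torch.nn as nn

class RecursiveStack(nn.Module):
    """Toy sketch: the same blocks are re-applied for several recursive
    steps, and a skip router gates each block on or off per step.
    Names and details are illustrative assumptions."""

    def __init__(self, d_model=64, nhead=4, num_blocks=3, num_steps=2):
        # The released model uses d_model=1024, 16 heads, 10 blocks,
        # and 6 recursive steps; tiny values keep this demo fast.
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_blocks)
        )
        self.skip_router = nn.Linear(d_model, num_blocks)  # one logit per block
        self.num_steps = num_steps

    def forward(self, h):                                    # h: (batch, seq, d_model)
        for _ in range(self.num_steps):                      # recurrent state flow
            gates = self.skip_router(h.mean(dim=1)).mean(0)  # pooled gate logits
            for i, block in enumerate(self.blocks):
                if gates[i] > 0:   # hard skip for clarity; a real model would
                    h = block(h)   # use a soft or straight-through gate
        return h

stack = RecursiveStack()
h = stack(torch.randn(2, 8, 64))
print(h.shape)  # torch.Size([2, 8, 64])
```

Because the blocks are reused across steps, parameter count stays fixed while effective depth grows with the number of recursive steps.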

## Key Features

- 1B parameters
- Recursive Language Model (RLM) architecture
- 10 recursive blocks
- `d_model` = 1024
- 16 attention heads
- max sequence length = 2048
- 6 recursive reasoning steps
- Mixture-of-Experts: 32 experts, top-1 routing
- Layer-skip router for conditional execution
- Packed execution support
- Tied token embedding and LM head
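The top-1 MoE routing listed above can be sketched as follows. This is a generic illustration, not the code shipped in this repository: the expert MLP shape, the router, and the dimensions are assumptions (the released model uses 32 experts at `d_model` = 1024).

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Minimal top-1 Mixture-of-Experts layer: each token is sent to the
    single expert with the highest router score. Illustrative only."""

    def __init__(self, d_model=64, num_experts=4, d_ff=128):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)  # (tokens, num_experts)
        gate, idx = probs.max(dim=-1)           # top-1 weight and expert index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # tokens assigned to expert e
            if mask.any():
                out[mask] = gate[mask, None] * expert(x[mask])
        return out

moe = Top1MoE()
y = moe(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])
```

With top-1 routing, only one expert's parameters are active per token, so compute per token stays close to a dense layer while total capacity scales with the expert count.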

## Training Data

This model was trained using a mixture of:

- FineWeb
- UltraChat-200k

## Intended Usage

This model is intended for:

- base language modeling research
- continued pretraining
- experimental prompt-based generation
- architecture experimentation around recursive and MoE-based language models

## Not Intended As

This release should not be treated as:

- a fully aligned assistant
- a safety-tuned production chatbot
- an instruction-following model with guaranteed conversational quality

## Loading

Because this repository includes custom model code, loading may require `trust_remote_code=True` depending on your workflow.
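A typical loading path looks like the sketch below. The repo id is assumed, and whether the custom classes register with `AutoModelForCausalLM` (and whether a tokenizer is bundled) depends on this repository's remote code; adjust as needed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with the actual Hub path.
model_id = "DevHunterAI/RubiRLM-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Base models continue text rather than follow instructions,
# so prompt with a prefix to be completed.
inputs = tokenizer("Recursive language models work by", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

`trust_remote_code=True` executes the Python files shipped in the repository, so review `RubiRLM.py` and the supporting modules before loading.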

## Files

- `pytorch_model.bin`: exported RubiRLM weights
- `training_checkpoint.pt`: original training checkpoint
- `config.json`: Hugging Face-facing config
- `rubirlm_config.json`: full RubiRLM architecture config
- `RubiRLM.py`: model implementation
- `xqs_moe.py`, `xqs_stack.py`, `x_quantum_sparse_ops.py`, `rubi_train_stack.py`: supporting code

## Notes

The exported weights were produced from the final training checkpoint and packaged for Hugging Face publication.