SAGE-OSS-40B

SAGE-OSS-40B is an open-source research release from SAGEA: a 40B-parameter Mixture-of-Experts language model built on the LoopCoder architecture, SAGEA's early research direction in iterative loop-based reasoning. This model is part of the research lineage that informed the development of the SAGE Actus family.

This is not a production model. It is released for the research community to explore loop reasoning architectures and MoE scaling in open settings.


Model Details

Property           Value
Architecture       SAGELoopCoder (MoE)
Parameters         ~40B
Tensor Type        BF16
Context Length     131,072 tokens
Vocab Size         76,800
Hidden Size        5,120
Layers             80
Attention Heads    40 (GQA: 8 KV heads)
Loop Iterations    2
Loop Window Size   64
RoPE Theta         500,000
License            Apache 2.0
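As a rough guide to hardware needs, the weight memory alone can be estimated from the figures above. This is a back-of-the-envelope sketch, not an official requirement: it counts only the BF16 parameters and ignores activations and the KV cache.

```python
# Rough weight-memory estimate: ~40B parameters at 2 bytes each (BF16).
# Illustrative arithmetic only; activations and KV cache add more on top.
params = 40e9
bytes_per_param = 2  # BF16
weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.0f} GB for weights alone")
```

In practice this means the full-precision checkpoint will not fit on a single consumer GPU, which is why the usage snippet below relies on device_map="auto" to shard it.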

Architecture: LoopCoder

The LoopCoder architecture introduces iterative reasoning loops at the model level. Rather than a single linear forward pass, the model performs loop_num iterative passes over a sliding window of loop_window_size tokens, allowing it to refine representations before producing output.

This was SAGEA's earlier approach to building reasoning capability directly into the model architecture, distinct from chain-of-thought prompting or post-training reasoning techniques.

Key architectural properties:

  • loop_num: 2 (two iterative reasoning passes)
  • loop_window_size: 64 (token window over which looping occurs)
  • GQA: 40 attention heads, 8 KV heads for efficiency
  • SiLU activations, RMS norm, no attention or MLP bias
  • RoPE with theta 500,000 for long-context stability
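The release does not include a standalone reference implementation of the loop mechanism, but the idea described above can be sketched in a few lines. Everything in this sketch is an assumption for illustration only: the function name looped_forward, the way the window slides, and the toy "layer" are not the actual SAGELoopCoder code.

```python
def looped_forward(tokens, layer_fn, loop_num=2, loop_window_size=64):
    """Illustrative sketch (not SAGEA's implementation): run the layer
    stack once over the full sequence, then perform loop_num - 1 extra
    passes over a trailing window of loop_window_size tokens, letting
    the model refine those representations before producing output."""
    states = layer_fn(tokens)                # standard forward pass
    for _ in range(loop_num - 1):            # extra refinement passes
        start = max(0, len(states) - loop_window_size)
        refined = layer_fn(states[start:])   # re-process the window only
        states = states[:start] + refined
    return states

# Toy demo: "hidden states" are floats, the "layer" adds 0.5 to each.
toy_layer = lambda xs: [x + 0.5 for x in xs]
seq = [0.0] * 100
out = looped_forward(seq, toy_layer, loop_num=2, loop_window_size=64)
print(out[0], out[-1])  # tokens outside the window saw 1 pass; inside saw 2
```

With loop_num = 2 and loop_window_size = 64, only the most recent 64 positions receive the second pass, which keeps the extra compute bounded regardless of sequence length.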

Usage

This model uses a custom architecture, so it must be loaded with trust_remote_code=True.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "sagea-ai/sage-oss-40b"

# trust_remote_code=True is required: the SAGELoopCoder architecture
# ships as custom code in the model repository.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the BF16 checkpoint
    device_map="auto"            # shard across available devices
)

prompt = "Explain the concept of recursion in programming."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        eos_token_id=[2, 75864, 75869]  # model-specific end-of-sequence ids
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: Due to the custom SAGELoopCoderForCausalLM architecture, standard pipeline inference may not work out of the box. Use the snippet above directly.


Limitations

  • Research release: not instruction-tuned or RLHF-aligned
  • Not recommended for production use
  • Evaluated internally; no public benchmark results at release
  • Requires trust_remote_code=True due to custom architecture

Relation to SAGEA Model Families

SAGE-OSS-40B sits outside the named SAGEA product families (VORA, Celer, Actus). It represents an earlier experimental direction and is released as-is for transparency and community research.

Current SAGEA model families:

  • SAGE Celer: general-purpose models (low/mid/high tiers)
  • SAGE Actus: agentic and domain-specialized models

Citation

@misc{sagea2025sageoss,
  title={SAGE-OSS-40B: Open-Source LoopCoder Reasoning Research Model},
  author={SAGEA},
  year={2025},
  url={https://huggingface.co/sagea-ai/sage-oss-40b}
}

About SAGEA

SAGEA is an AI research company based in Nepal, building foundation models and AI infrastructure for South Asia and beyond.
