SAGE-OSS-40B

SAGE-OSS-40B is an open-source research release from SAGEA: a 40B-parameter Mixture-of-Experts language model built on the LoopCoder architecture, SAGEA's early research direction in iterative loop-based reasoning. This model is part of the research lineage that informed the development of the SAGE Actus family.

This is not a production model. It is released for the research community to explore loop reasoning architectures and MoE scaling in open settings.


Model Details

Property           Value
Architecture       SAGELoopCoder (MoE)
Parameters         ~40B
Tensor Type        BF16
Context Length     131,072 tokens
Vocab Size         76,800
Hidden Size        5,120
Layers             80
Attention Heads    40 (GQA: 8 KV heads)
Loop Iterations    2
Loop Window Size   64
RoPE Theta         500,000
License            Apache 2.0
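As a rough guide to hardware needs, the weight memory alone can be estimated from the figures above. This is a back-of-the-envelope sketch, not an official requirement: it counts only the BF16 parameters and ignores activations and the KV cache.

```python
# Rough weight-memory estimate: ~40B parameters at 2 bytes each (BF16).
# Illustrative arithmetic only; activations and KV cache add more on top.
params = 40e9
bytes_per_param = 2  # BF16
weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.0f} GB for weights alone")
```

In practice this means the full-precision checkpoint will not fit on a single consumer GPU, which is why the usage snippet below relies on device_map="auto" to shard it.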

Architecture: LoopCoder

The LoopCoder architecture introduces iterative reasoning loops at the model level. Rather than a single linear forward pass, the model performs loop_num iterative passes over a sliding window of loop_window_size tokens, allowing it to refine representations before producing output.

This was SAGEA's earlier approach to building reasoning capability directly into the model architecture, distinct from chain-of-thought prompting or post-training reasoning techniques.

Key architectural properties:

  • loop_num: 2 (two iterative reasoning passes)
  • loop_window_size: 64 (token window over which looping occurs)
  • GQA: 40 attention heads, 8 KV heads for efficiency
  • SiLU activations, RMS norm, no attention or MLP bias
  • RoPE with theta 500,000 for long-context stability
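The release does not include a standalone reference implementation of the loop mechanism, but the idea described above can be sketched in a few lines. Everything in this sketch is an assumption for illustration only: the function name looped_forward, the way the window slides, and the toy "layer" are not the actual SAGELoopCoder code.

```python
def looped_forward(tokens, layer_fn, loop_num=2, loop_window_size=64):
    """Illustrative sketch (not SAGEA's implementation): run the layer
    stack once over the full sequence, then perform loop_num - 1 extra
    passes over a trailing window of loop_window_size tokens, letting
    the model refine those representations before producing output."""
    states = layer_fn(tokens)                # standard forward pass
    for _ in range(loop_num - 1):            # extra refinement passes
        start = max(0, len(states) - loop_window_size)
        refined = layer_fn(states[start:])   # re-process the window only
        states = states[:start] + refined
    return states

# Toy demo: "hidden states" are floats, the "layer" adds 0.5 to each.
toy_layer = lambda xs: [x + 0.5 for x in xs]
seq = [0.0] * 100
out = looped_forward(seq, toy_layer, loop_num=2, loop_window_size=64)
print(out[0], out[-1])  # tokens outside the window saw 1 pass; inside saw 2
```

With loop_num = 2 and loop_window_size = 64, only the most recent 64 positions receive the second pass, which keeps the extra compute bounded regardless of sequence length.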

Usage

This model uses a custom architecture, so it must be loaded with trust_remote_code=True.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "sagea-ai/sage-oss-40b"

# trust_remote_code=True is required: the SAGELoopCoder architecture
# ships as custom code in the model repository.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the BF16 checkpoint
    device_map="auto"            # shard across available devices
)

prompt = "Explain the concept of recursion in programming."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        eos_token_id=[2, 75864, 75869]  # model-specific end-of-sequence ids
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: Due to the custom SAGELoopCoderForCausalLM architecture, standard pipeline inference may not work out of the box. Use the snippet above directly.


Limitations

  • Research release: not instruction-tuned or RLHF-aligned
  • Not recommended for production use
  • Evaluated internally; no public benchmark results at release
  • Requires trust_remote_code=True due to custom architecture

Relation to SAGEA Model Families

SAGE-OSS-40B sits outside the named SAGEA product families (VORA, Celer, Actus). It represents an earlier experimental direction and is released as-is for transparency and community research.

Current SAGEA model families:

  • SAGE Celer: general-purpose models (low/mid/high tiers)
  • SAGE Actus: agentic and domain-specialized models

Citation

@misc{sagea2025sageoss,
  title={SAGE-OSS-40B: Open-Source LoopCoder Reasoning Research Model},
  author={SAGEA},
  year={2025},
  url={https://huggingface.co/sagea-ai/sage-oss-40b}
}

About SAGEA

SAGEA is an AI research company based in Nepal, building foundation models and AI infrastructure for South Asia and beyond.
