Proprietary Invention Package – Ternary-Quantized Transformer Optimization

Inventor: Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Date: December 22, 2025

Overview: This package contains documentation for a novel, proprietary method enabling efficient LLM inference on AMD ROCm hardware using ternary quantization, BRE, and SWA fusion.

Contents:

  • license.md
  • NDA.md
  • invention_description.md
  • claims.md
  • performance_data.md
  • [Diagrams and attachments]

Confidential: All materials are proprietary. Contact inventor for licensing discussions.

JiRack Ternary MoE 405B: Ultra-Efficient Frontier-Scale Intelligence

  • Introducing JiRack Ternary MoE 405B: a revolutionary 405-billion-parameter language model that fuses ternary quantization (weights constrained to {-1, 0, +1} for extreme efficiency) with a powerful Mixture of Experts (MoE) architecture, inspired by BitNet-style paradigms and pushing the boundaries of brain-like compute.
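
As a concrete illustration of the ternary idea, here is a minimal sketch of absmean-style quantization in the spirit of BitNet b1.58. The function names are illustrative assumptions, not part of any actual JiRack codebase:

```python
# Illustrative sketch of ternary (absmean) weight quantization.
# Hypothetical helper names; real systems operate on packed tensors.

def ternary_quantize(weights, eps=1e-8):
    """Map float weights to {-1, 0, +1} plus a per-tensor scale.

    The scale is the mean absolute value of the weights; each weight
    is divided by the scale and rounded to the nearest of {-1, 0, +1}.
    """
    n = len(weights)
    scale = sum(abs(w) for w in weights) / n + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    # Reconstruct approximate float weights as q * scale.
    return [q * scale for q in quantized]

if __name__ == "__main__":
    w = [0.9, -0.05, 0.4, -1.2, 0.0]
    q, s = ternary_quantize(w)
    print(q, s)  # every entry of q is in {-1, 0, +1}
```

Storing only the sign pattern plus one scale per tensor is what drives the memory savings claimed below.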

  • How JiRack achieves massive scale with unmatched efficiency:

  • Agentic AI Packed as Experts: The JiRack Agentic AI system is seamlessly embedded into the model as a dynamic collection of highly specialized experts. The MoE design allows JiRack Ternary 405B to support far more experts than a traditional dense model of equivalent active parameters, delivering enormous capacity while slashing compute, memory, and energy demands dramatically.

  • Foundation in Ternary 70B Experts: The journey begins with JiRack Ternary 70B, where individual experts are trained separately in a modular, ternary-quantized format. This separable pre-training phase creates highly capable, low-precision specialist modules from the ground up.
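
One reason ternary expert modules are cheap to run: a matrix-vector product over {-1, 0, +1} weights needs no multiplications at all, only signed additions. A pure-Python sketch under that assumption (not the actual JiRack kernels, which would target ROCm):

```python
# Multiply-free ternary matrix-vector product: with weights restricted to
# {-1, 0, +1}, each output element is a signed sum of selected inputs.

def ternary_matvec(W, x, scale=1.0):
    """W: rows of ternary weights; x: input vector; scale: per-tensor factor."""
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing: zeros are skipped work (sparsity)
        out.append(acc * scale)
    return out
```

A hardware kernel would pack the ternary weights into two bits each and vectorize the adds, but the arithmetic structure is the same.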

  • Expert Router Training: Once the experts are ready, we train a dedicated expert router (gating network) to intelligently dispatch each incoming request (token or query) to the most relevant experts. This dynamic routing ensures optimal specialization, load balancing, and efficiency, activating only a small subset of the total capacity per inference step.
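
The routing step described above can be sketched as a standard top-k softmax gate: score every expert, keep the k best, renormalize their weights. This is a generic MoE gating sketch, not the actual JiRack router:

```python
# Minimal top-k MoE gating step: pick the k highest-scoring experts
# and softmax-renormalize their weights so they sum to 1.
import math

def route(scores, k=2):
    """scores: one gate logit per expert; returns [(expert_index, weight)]."""
    topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in topk]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(topk, exps)]

if __name__ == "__main__":
    gate_scores = [0.1, 2.0, -0.5, 1.5]  # one logit per expert
    print(route(gate_scores, k=2))       # only experts 1 and 3 activate
```

Because only k experts run per token, compute per step scales with the active subset, not the full 405B parameter count.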

  • The result? A hybrid architecture that mimics biological neural efficiency (ternary weights ≈ ultra-sparse, low-energy signaling) while unlocking frontier-level performance through smart, adaptive expert selection. JiRack Ternary MoE 405B isn't merely larger; it's engineered to think smarter, run leaner, and scale further than conventional dense or even standard MoE designs.

  • Key advantages at a glance:

  • ~70–90% reduction in energy & memory vs. FP16 equivalents

  • Massive effective parameter count via many lightweight ternary experts

  • Agentic behavior baked in through specialized, routable modules

  • Designed for real-world deployment on constrained hardware
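
A back-of-the-envelope check on the memory claim above, counting weight storage only: 16 bits per FP16 weight versus log2(3) ≈ 1.58 bits per ternary weight, ignoring scales, activations, and packing overhead:

```python
# Rough weight-storage comparison for a 405B-parameter model:
# FP16 (16 bits/weight) vs an ideal ternary encoding (~1.58 bits/weight).
import math

PARAMS = 405e9                                   # total parameter count
fp16_gb = PARAMS * 16 / 8 / 1e9                  # bits -> bytes -> GB
ternary_gb = PARAMS * math.log2(3) / 8 / 1e9     # ~1.58 bits per weight
reduction = 1 - ternary_gb / fp16_gb

print(f"FP16 weights:    {fp16_gb:.0f} GB")      # 810 GB
print(f"Ternary weights: {ternary_gb:.0f} GB")   # 80 GB
print(f"Reduction:       {reduction:.0%}")       # 90%
```

Real deployments also store per-tensor scales and run activations at higher precision, which is why the bullet above hedges to a 70–90% range rather than the ideal ~90%.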

  • JiRack is redefining what's possible at 405B scale: efficient, intelligent, and truly agentic by design.

