Support this work: donate.sybilsolutions.ai
REAP surfaces: GLM | MiniMax | Qwen | Gemma | Paper | Code | PR17 | Cerebras Collection
NVIDIA-Nemotron-3-Super-120B-A12B-BF16-AutoRound-W4A16-draft
Draft AutoRound quantization of a Nemotron Super checkpoint.
Base model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
Draft status
This is a draft research release. It is published for inspection, reproducibility, and early runtime validation. It should not be treated as a final benchmarked production checkpoint.
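For the kind of early runtime validation this draft is meant for, a minimal smoke test might look like the sketch below. It assumes auto-round is installed so transformers can load the auto_round checkpoint format; the repo id is a hypothetical placeholder, not a confirmed Hub path.

```python
# Hypothetical smoke test for early runtime validation. Assumes auto-round is
# installed so transformers can load the auto_round checkpoint format; the
# repo id below is a placeholder, not a confirmed Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "ORG/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-AutoRound-W4A16-draft"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(REPO, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Briefly explain W4A16 quantization.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```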
How this was produced
We quantized the checkpoint with Intel AutoRound using the W4A16 scheme on a remote 8x RTX 3090 host. This lane is optimized for overnight completion and resumability rather than final accuracy tuning; a sketch of the invocation follows the settings list below.
Settings used
- source checkpoint: /mnt/llm_models/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
- source type: original base model
- quantizer: intel/auto-round 0.10.2
- scheme: W4A16
- format: auto_round
- calibration dataset: NeelNanda/pile-10k
- device_map: auto
- nsamples: 128
- iters: 50
- seqlen: 1024
- batch_size: 2
- nblocks: 1
- low_gpu_mem_usage: True
- output dir: /home/ser/nemotron-super/autoround_w4a16/original
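For reproducibility, here is a minimal sketch of how these settings map onto an AutoRound run. This is an assumed shape of the auto_round Python API, not the exact script used; argument names mirror the settings list above, and older releases take bits=4, group_size=128, sym=True in place of scheme.

```python
# Minimal sketch of the quantization lane above. Assumes the auto_round Python
# API at roughly the pinned version; older releases spell the scheme as
# bits=4, group_size=128, sym=True instead of scheme="W4A16".
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

SRC = "/mnt/llm_models/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
OUT = "/home/ser/nemotron-super/autoround_w4a16/original"

# Load the BF16 base checkpoint; weights stay in their native dtype.
model = AutoModelForCausalLM.from_pretrained(SRC, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(SRC)

autoround = AutoRound(
    model,
    tokenizer,
    scheme="W4A16",                # 4-bit weights, 16-bit activations
    dataset="NeelNanda/pile-10k",  # calibration set from the settings list
    nsamples=128,
    iters=50,                      # short tuning run: overnight lane, not final accuracy
    seqlen=1024,
    batch_size=2,
    nblocks=1,
    low_gpu_mem_usage=True,
    device_map="auto",             # per the settings above; placement may vary by version
)

# Quantize and export in the auto_round format.
autoround.quantize_and_save(OUT, format="auto_round")
```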
Notes
- upstream provenance is preserved through the base model link above
- this repo is intentionally marked draft while quantization/runtime validation is still in progress
- donation link added per maintainer request
Support and links
- Donate: https://donate.sybilsolutions.ai
- X: https://x.com/0xsero
- GitHub: https://github.com/0xsero
Sponsors
Thanks to our kind sponsors; this work wouldn't be possible without them:
- Nvidia
- TNG Technology
- Lambda
- Prime Intellect
- HotAisle