Support this work: donate.sybilsolutions.ai

REAP surfaces: GLM | MiniMax | Qwen | Gemma | Paper | Code | PR17 | Cerebras Collection

NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-AutoRound-W4A16-draft

Draft AutoRound W4A16 quantization of a REAP 50% pruned Nemotron Super checkpoint.

Base model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

Draft status

This is a draft research release. It is published for inspection, reproducibility, and early runtime validation. It should not be treated as a final benchmarked production checkpoint.
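For early runtime validation, a checkpoint exported in `auto_round` format can typically be loaded through Transformers with the auto-round package installed. The sketch below is an assumption, not a validated recipe for this draft; whether `trust_remote_code` is required depends on the Nemotron architecture support in your Transformers version.

```python
# Sketch: load the auto_round-format checkpoint via Transformers.
# Requires `pip install auto-round transformers` and enough GPU memory
# for the 4-bit weights; not yet validated for this draft release.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-AutoRound-W4A16-draft"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",        # shard across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,   # may be needed for Nemotron model code
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0]))
```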

How this was produced

We quantized the checkpoint with Intel AutoRound using the W4A16 scheme on a remote 8x RTX 3090 host. This lane is optimized for overnight completion and resumability rather than final accuracy tuning.

Settings used

  • source checkpoint: /mnt/llm_models/nemotron-super-compressions/nemotron_super_merged_long50_short15120_v2/reap_50pct
  • source type: REAP 50% pruned checkpoint
  • quantizer: intel/auto-round 0.10.2
  • scheme: W4A16
  • format: auto_round
  • calibration dataset: NeelNanda/pile-10k
  • device_map: auto
  • nsamples: 128
  • iters: 50
  • seqlen: 1024
  • batch_size: 2
  • nblocks: 1
  • low_gpu_mem_usage: True
  • output dir: /home/ser/nemotron-super/autoround_w4a16/reap_50pct
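The settings above correspond roughly to an auto-round CLI invocation like the following. This is a sketch reconstructed from the list, not the exact command that was run; flag names and defaults vary between auto-round releases, so verify them against `auto-round --help` for your installed version (0.10.2 here).

```shell
# Sketch of the quantization run; paths and values are the ones listed above.
# Flag names are assumptions -- check `auto-round --help` for your version.
auto-round \
  --model /mnt/llm_models/nemotron-super-compressions/nemotron_super_merged_long50_short15120_v2/reap_50pct \
  --scheme W4A16 \
  --format auto_round \
  --dataset NeelNanda/pile-10k \
  --nsamples 128 \
  --iters 50 \
  --seqlen 1024 \
  --batch_size 2 \
  --nblocks 1 \
  --low_gpu_mem_usage \
  --device_map auto \
  --output_dir /home/ser/nemotron-super/autoround_w4a16/reap_50pct
```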

Notes

  • upstream provenance is preserved through the base model link above
  • this repo is intentionally marked draft while quantization/runtime validation is still in progress
  • donation link added per maintainer request

Support

If this work is useful, support Sybil Solutions here: https://donate.sybilsolutions.ai


Sponsors

Thanks to our kind sponsors; this work wouldn't be possible without them:

  • Nvidia
  • TNG Technology
  • Lambda
  • Prime Intellect
  • HotAisle