Nanbeige4.1-3B-heretic (Quantized)
Description
This model is a 4-bit (NF4) quantized version of the original heretic-org/Nanbeige4.1-3B-heretic model, produced with bitsandbytes to reduce memory usage while preserving most of the original model's output quality.
Quantization Details
- Quantization Type: 4-bit
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
- bnb_4bit_quant_storage: uint8
- Original Footprint: 7867.27 MB (BFLOAT16)
- Quantized Footprint: 3243.05 MB (UINT8)
- Memory Reduction: 58.8%
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "manu02/Nanbeige4.1-3B-heretic-bnb-4bit-nf4"

# The quantization config is stored in the repo, so the model loads in 4-bit automatically.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note: bitsandbytes 4-bit inference typically requires a CUDA-capable GPU.
Model tree for manu02/Nanbeige4.1-3B-heretic-bnb-4bit-nf4-dq
- Base model: Nanbeige/Nanbeige4-3B-Base
- Finetuned from base: Nanbeige/Nanbeige4.1-3B
- Finetuned (abliterated): heretic-org/Nanbeige4.1-3B-heretic