LLaMA 3.7B - Bfloat16

📚 Paper • 🏠 GitHub

This is one of the checkpoints supplementing the paper 1-Bit-Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization. Instructions on how to use the model for inference can be found in the corresponding repository.

⚠️ IMPORTANT: This model is intended for research purposes only. It is provided as-is, without warranties, and is not intended for production use.

Model Details

  • Architecture: LLaMA
  • Size: 3.7B (3,747,523,584 parameters)
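As a quick sanity check on these numbers, the expected on-disk size of the bfloat16 weights follows directly from the parameter count (2 bytes per bf16 parameter). The snippet below is a minimal sketch using the parameter count listed above:

```python
# Estimate the checkpoint size from the parameter count reported in the model card.
NUM_PARAMS = 3_747_523_584   # from "Model Details" above
BYTES_PER_PARAM = 2          # bfloat16 stores each weight in 2 bytes

size_bytes = NUM_PARAMS * BYTES_PER_PARAM
size_gib = size_bytes / 2**30
print(f"{size_bytes} bytes ≈ {size_gib:.2f} GiB")
# → 7495047168 bytes ≈ 6.98 GiB
```

The actual model.safetensors file will be slightly larger than this estimate, since safetensors adds a small header with tensor names and shapes.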

Directory Structure

.
├── config.json                  # HuggingFace model config
├── generation_config.json       # Default generation settings
├── tokenizer.json               # Tokenizer files
└── model.safetensors            # Weights (in bfloat16)

License

See LICENSE file in the repository.
