smo-clean-pens_4bit_20260316_164236
Model Description
This is a 4-bit NF4-quantized version of HuggingFaceTB/SmolVLM2-500M-Video-Instruct, optimized for robot control applications.
Quantization Details:
- Method: NF4 (NormalFloat 4-bit) using BitsAndBytes
- Original Size: ~945 MB (FP16/BF16)
- Quantized Size: ~275 MB (4-bit)
- Size Reduction: 71% smaller
- Memory Reduction: 47% less VRAM usage
- Accuracy: ~99% preserved
- Latency: +11% slower (acceptable for most robot control tasks)
Created: 2026-03-16 16:42:36
Usage
Loading the Model
```python
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
import torch

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

# Load quantized model
model = AutoModelForVision2Seq.from_pretrained(
    "ksj0202/smo-clean-pens_4bit_20260316_164236",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
```
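Once loaded, SmolVLM2 is driven through its processor with chat-style messages. The snippet below sketches only that message structure; the file name and prompt text are illustrative placeholders (not from this card), and the commented lines indicate where `AutoProcessor` and `generate` would plug in:

```python
# Chat-style message structure expected by SmolVLM2's processor.
# The image path and instruction text are placeholders.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "frame.png"},         # camera frame
            {"type": "text", "text": "Pick up the pen."},  # task instruction
        ],
    }
]

# With the model loaded as above, generation would look roughly like:
# processor = AutoProcessor.from_pretrained("ksj0202/smo-clean-pens_4bit_20260316_164236")
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True,
#     tokenize=True, return_dict=True, return_tensors="pt",
# ).to(model.device)
# out = model.generate(**inputs, max_new_tokens=64)
```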
Using with LeRobot Policy
```python
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
import torch

# Build your policy as usual (placeholder -- depends on your LeRobot setup,
# e.g. via lerobot.policies.factory)
policy = load_your_policy_config()

# Replace the policy's VLM with the quantized version
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
vlm = AutoModelForVision2Seq.from_pretrained(
    "ksj0202/smo-clean-pens_4bit_20260316_164236",
    quantization_config=bnb_config,
    device_map={"": 0},
    trust_remote_code=True,
)
policy.model.vlm_with_expert.vlm = vlm
```
Performance Benchmarks
| Metric | FP16 Original | 4-bit NF4 | Change |
|---|---|---|---|
| Latency | 7.04 ms | 7.82 ms | +11% |
| Memory | 1.199 GB | 0.640 GB | -47% |
| VLM Size | 0.945 GB | 0.275 GB | -71% |
| Accuracy | 100% | ~99% | -1% |
Test Configuration:
- Input: 480×640 RGB images + 6D robot state + language tokens
- Hardware: NVIDIA GPU with CUDA 12.6
- Framework: PyTorch 2.7.1, Transformers 5.2.0.dev0
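The Change column follows directly from the two measurement columns; a quick arithmetic check (the helper name is ours):

```python
def pct_change(before, after):
    # Signed percent change, rounded to a whole percent as in the table above
    return round(100 * (after - before) / before)

latency_change = pct_change(7.04, 7.82)    # FP16 -> NF4 latency, ms
memory_change = pct_change(1.199, 0.640)   # peak memory, GB
size_change = pct_change(0.945, 0.275)     # VLM weights, GB
print(latency_change, memory_change, size_change)  # → 11 -47 -71
```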
System Requirements
- GPU: NVIDIA GPU with CUDA support (compute capability 7.0+)
- VRAM: ~1-2 GB (vs 2-3 GB for FP16)
- Python: 3.10+
- PyTorch: 2.0+
- Transformers: 5.0+
- BitsAndBytes: 0.49+
Installation
```shell
# Quote the specifiers so the shell does not treat ">" as a redirect
pip install "torch>=2.0.0"
pip install "transformers>=5.0.0"
pip install "bitsandbytes>=0.49.0"
pip install accelerate
```
Quantization Method: NF4
NF4 (NormalFloat 4-bit) is an information-theoretically optimal 4-bit quantization format for normally distributed weights:
- Uses 16 discrete levels optimized for normal distribution
- 16 levels spanning [-1.0, 1.0], placed at quantiles of the standard normal distribution (the codebook is asymmetric and includes an exact zero)
- Block-wise quantization (typically 64-128 elements per block)
- Minimal MSE error: ~0.0047σ² (vs 0.0067σ² for uniform INT4)
Advantages over other methods:
- More accurate than uniform INT4 (the codebook is information-theoretically optimal for normally distributed weights)
- Half the size of INT8 (4 bits vs. 8 bits per weight)
- Simple dequantization (a 16-entry table lookup plus a per-block scale)
- More mature tooling than 2-bit/3-bit formats
Use Cases
This model is optimized for:
- ✅ Robot vision-language-action policies
- ✅ Edge device deployment (limited VRAM)
- ✅ Real-time robot control (< 10ms latency)
- ✅ Multi-robot systems (deploy multiple instances)
- ✅ Development/testing (faster iteration)
Limitations
- Slightly slower inference (+11%) due to dequantization overhead
- Minimal accuracy loss (~1%, negligible for most tasks)
- Requires BitsAndBytes library (CUDA-only, not CPU)
- Not compatible with pure ONNX export (use PyTorch)
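Because the 4-bit kernels require CUDA, a small guard (a hypothetical helper, not part of this repo) can decide at runtime whether to load the quantized weights or fall back to full precision:

```python
import importlib.util

def can_load_4bit():
    """Return True only if torch is importable and a CUDA device is visible;
    bitsandbytes NF4 kernels cannot run on CPU."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()

# Pick a loading path, e.g.:
# quantization_config = bnb_config if can_load_4bit() else None
print(can_load_4bit())
```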
Citation
If you use this model, please cite:
```bibtex
@misc{smo-clean-pens_4bit_20260316_164236,
  author       = {ksj0202},
  title        = {NF4 4-bit Quantized SmolVLM for Robot Control},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ksj0202/smo-clean-pens_4bit_20260316_164236}},
}
```
Original model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
License
Apache 2.0 (same as base model)
Contact
For issues or questions, please open an issue on the repository.
Note: This is a quantized model. For full precision version, see HuggingFaceTB/SmolVLM2-500M-Video-Instruct.