# llama-3.2-3b-awq-4bit

This repository contains a quantized model artifact produced as part of a graduation project.
## Model Details

- Technique: AWQ (Activation-aware Weight Quantization)
- Quantization: INT4
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Export date: 2026-03-23
## Benchmark Summary

| Metric | Original | Quantized |
|---|---|---|
| Disk size (GB) | 5.98 | 2.85 |
| Avg inference time (s) | 29.24 | 3.19 |
| Throughput (tokens/sec) | 3.42 | 31.31 |
| GPU memory (MB) | 4513.00 | 3055.00 |
## Comparison Highlights

- Speedup: 9.17x (29.24 s → 3.19 s average inference time)
- GPU memory reduction: 32.30% (4513 MB → 3055 MB)
- Disk size reduction: 52.30% (5.98 GB → 2.85 GB)
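The highlights follow directly from the benchmark table; a quick arithmetic sanity check (values copied from the Benchmark Summary):

```python
# Re-derive the comparison highlights from the Benchmark Summary values.
orig_time, quant_time = 29.24, 3.19      # avg inference time (s)
orig_mem, quant_mem = 4513.00, 3055.00   # GPU memory (MB)
orig_disk, quant_disk = 5.98, 2.85       # disk size (GB)

speedup = orig_time / quant_time
mem_reduction = (1 - quant_mem / orig_mem) * 100
disk_reduction = (1 - quant_disk / orig_disk) * 100

print(f"Speedup: {speedup:.2f}x")                 # → Speedup: 9.17x
print(f"Memory reduction: {mem_reduction:.1f}%")  # → Memory reduction: 32.3%
print(f"Disk reduction: {disk_reduction:.1f}%")   # → Disk reduction: 52.3%
```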
## Benchmark Notes

- The numbers above are copied from the local benchmark_results JSON in this project.
## Local Source
- Quantized folder: Advanced-Techniques/AWQ/quantized/llama-awq-4bit
- Benchmark JSON: Advanced-Techniques/AWQ/benchmark_results/awq_benchmark_results.json
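As a sketch of how the benchmark values can be read back from that file (the JSON's field names are not documented here, so this snippet simply dumps whatever top-level keys it finds):

```python
import json
from pathlib import Path

# Path taken from this model card; run from the project root.
path = Path("Advanced-Techniques/AWQ/benchmark_results/awq_benchmark_results.json")

if path.exists():
    results = json.loads(path.read_text())
    # Field names depend on the benchmark script, so print everything found.
    for key, value in results.items():
        print(f"{key}: {value}")
else:
    print(f"Benchmark file not found: {path}")
```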
## Usage

Load the model with an AWQ-compatible runtime, e.g. Hugging Face Transformers (with the `autoawq` package installed) or an inference engine with AWQ support such as vLLM.
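A minimal loading sketch with Transformers, assuming a CUDA GPU and the `autoawq` package are available (not verified against this exact checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "emreyigitozturk/llama-3.2-3b-awq-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The AWQ quantization config ships with the checkpoint, so no extra
# arguments are needed; device_map="auto" places layers on the GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain AWQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```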
## Limitations
- This model card is auto-generated from project files.
- You should validate quality, safety, and license compatibility before public release.
## Model Tree

- Repository: emreyigitozturk/llama-3.2-3b-awq-4bit
- Base model: meta-llama/Llama-3.2-3B-Instruct