# llama2-7b-owq-int4-fp16

This repository contains a quantized model artifact produced as part of a graduation project.

## Model Details
- Technique: OWQ
- Quantization: Mixed INT4/FP16
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Export date: 2026-03-24
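Conceptually, OWQ (Outlier-aware Weight Quantization) keeps a small number of sensitivity-critical "outlier" weight columns in FP16 and quantizes the remaining columns to INT4, which is where the mixed INT4/FP16 layout above comes from. The following NumPy sketch illustrates the idea only; the real OWQ method scores columns with a Hessian-based sensitivity metric, whereas this sketch uses a simple column-norm proxy, and all function names are illustrative:

```python
import numpy as np

def owq_quantize(W, n_outlier_cols=2, bits=4):
    """Split W into FP16 outlier columns and symmetric INT4 columns.

    Column selection uses the column norm as a stand-in for OWQ's
    Hessian-based sensitivity score (a simplification).
    """
    col_scores = np.linalg.norm(W, axis=0)
    outlier_idx = np.argsort(col_scores)[-n_outlier_cols:]
    mask = np.zeros(W.shape[1], dtype=bool)
    mask[outlier_idx] = True

    qmax = 2 ** (bits - 1) - 1          # 7 for symmetric INT4
    W_low = W[:, ~mask]
    scale = np.abs(W_low).max(axis=0) / qmax
    scale[scale == 0] = 1.0             # avoid division by zero
    q = np.clip(np.round(W_low / scale), -qmax - 1, qmax).astype(np.int8)

    return q, scale, W[:, mask].astype(np.float16), mask

def dequantize(q, scale, fp16_cols, mask):
    """Reassemble an FP32 approximation of the original matrix."""
    W = np.empty((q.shape[0], mask.size), dtype=np.float32)
    W[:, ~mask] = q.astype(np.float32) * scale
    W[:, mask] = fp16_cols.astype(np.float32)
    return W
```

The outlier columns round-trip at FP16 precision, while the INT4 columns incur at most half a quantization step of error per weight; that asymmetry is the core trade-off behind the perplexity numbers reported below.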
## Benchmark Summary
| Metric | Original | Quantized |
|---|---|---|
| Disk size (GB) | 5.98 | 2.41 |
| Avg inference time (s) | N/A | N/A |
| Tokens/sec | N/A | N/A |
| GPU memory (GB) | N/A | N/A |
| Perplexity | 4.3407 | 4.5979 |
## Comparison Highlights
- Speedup: 1.03x
- Memory reduction: 0.00%
- Disk/model size reduction: 59.78%
## Benchmark Notes

- The numbers above are copied from the local benchmark_results JSON in this project.
## Local Source

- Quantized folder: `Advanced-Techniques/OWQ/quantized/llama2-7b-owq`
- Benchmark JSON: `Advanced-Techniques/OWQ/benchmark_results/owq_benchmark_results.json`
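Since the card is generated from that benchmark JSON, a small script can recompute the derived figures from it. This is a sketch: the JSON key names are assumptions, as the actual schema is not shown here, so check the file before relying on them:

```python
import json

def size_reduction_pct(original_gb: float, quantized_gb: float) -> float:
    """Percent disk-size reduction; with the table's 5.98 -> 2.41 GB this
    gives ~59.7%, close to the 59.78% above (which was presumably
    computed from exact byte counts rather than rounded GB)."""
    return (1 - quantized_gb / original_gb) * 100

def load_results(path: str) -> dict:
    # Schema is an assumption; inspect the real JSON for its key names.
    with open(path) as f:
        return json.load(f)

if __name__ == "__main__":
    results = load_results(
        "Advanced-Techniques/OWQ/benchmark_results/owq_benchmark_results.json"
    )
    print(f"{size_reduction_pct(5.98, 2.41):.2f}% smaller on disk")
```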
## Usage

Load the model with a library and runtime that support the OWQ mixed INT4/FP16 format used in this repository.
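As a starting point, a hypothetical loading sketch is shown below. It assumes the quantized weights were exported in a transformers-compatible format and that the repo id matches this card's title; the prompt template is also an assumption — adapt all of this to the OWQ runtime actually used in the project:

```python
# Hypothetical usage sketch; assumes a transformers-loadable export.

def build_prompt(user_message: str) -> str:
    """Wrap a user message in a simple instruct-style template
    (assumption: the exported model accepts plain-text prompts)."""
    return f"### Instruction:\n{user_message}\n\n### Response:\n"

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Repo id assumed from this card's title; adjust if it differs.
    model_id = "emreyigitozturk/llama2-7b-owq-int4-fp16"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(
        build_prompt("Summarize OWQ in one sentence."),
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If the base model's chat template survived export, `tokenizer.apply_chat_template` is likely preferable to the hand-written template above.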
## Limitations
- This model card is auto-generated from project files.
- You should validate quality, safety, and license compatibility before public release.