---
license: gemma
base_model: google/gemma-2-2b
tags:
- fine-tuned
- trading
- financial
- summarization
- quantized
- 8bit
pipeline_tag: text-generation
---
# Gemma-2-2B Trading Summarizer (8-bit Quantized)
## Model Description
This is an 8-bit quantized version of the fine-tuned Gemma-2-2B trading journal summarizer.
It is roughly half the size of the fp16 version, both on disk and in GPU memory, with minimal quality loss.
## Quantization Details
- **Method**: bitsandbytes 8-bit quantization (see the sketch after this list)
- **Original Precision**: fp16
- **Quantized Precision**: int8
- **Size Reduction**: ~50%
- **Quality Impact**: Typically <2% degradation
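For reference, a checkpoint like this one can be produced by loading the fp16 model with bitsandbytes 8-bit quantization and re-saving it. The sketch below is illustrative; the source path and output directory are placeholders, not the exact commands used for this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder path to the fine-tuned fp16 checkpoint (assumption, not the actual source path).
fp16_checkpoint = "./gemma-2b-trader-fp16"

# Load the fp16 weights and quantize to int8 with bitsandbytes at load time.
model = AutoModelForCausalLM.from_pretrained(
    fp16_checkpoint,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Saving an 8-bit model requires a recent transformers/bitsandbytes with 8-bit serialization support.
model.save_pretrained("./gemma-2b-trader-8bit")
AutoTokenizer.from_pretrained(fp16_checkpoint).save_pretrained("./gemma-2b-trader-8bit")
```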
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the checkpoint in 8-bit via bitsandbytes (the `bitsandbytes` package must be installed).
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader-8bit")

# Prompting and generation work the same as with the fp16 version.
```
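A minimal generation example follows. The prompt is illustrative only; use the prompt format described in the fp16 model card.

```python
# Illustrative journal entry; follow the fp16 model card's prompt format in practice.
prompt = (
    "Summarize this trading journal entry:\n"
    "Entered AAPL long at 172.10, exited at 175.40 on earnings momentum. Sized at 2% risk."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```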
## When to Use This Version
- Limited GPU memory (<8 GB VRAM; see the check after this list)
- Faster loading times needed
- Deployment on edge devices
- When the memory and load-time savings matter more than the marginal quality difference
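A quick way to check whether your GPU falls under the ~8 GB threshold above (sketch; assumes PyTorch is installed and reports the first CUDA device):

```python
import torch

# Report the total VRAM of the first CUDA device, in GiB.
if torch.cuda.is_available():
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0 VRAM: {total_gib:.1f} GiB")
else:
    print("No CUDA GPU detected.")
```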
## When to Use FP16 Version
- Maximum quality required
- Sufficient GPU memory available
- Fine-tuning or further training needed