---
license: gemma
base_model: google/gemma-2-2b
tags:
- fine-tuned
- trading
- financial
- summarization
- quantized
- 8bit
pipeline_tag: text-generation
---

# Gemma-2-2B Trading Summarizer (8-bit Quantized)

## Model Description
This is an 8-bit quantized version of the fine-tuned Gemma-2-2B trading journal summarizer.
It reduces model size and memory footprint by roughly 50% relative to the fp16 checkpoint, with minimal quality loss.

## Quantization Details
- **Method**: bitsandbytes 8-bit quantization
- **Original Precision**: fp16
- **Quantized Precision**: int8
- **Size Reduction**: ~50%
- **Quality Impact**: Typically <2% degradation
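The ~50% figure follows directly from the precision change: fp16 stores each weight in 2 bytes, int8 in 1 byte. A back-of-the-envelope sketch (the ~2.6B parameter count is an assumption for illustration; real checkpoints keep some layers in higher precision, so actual savings land slightly below 50%):

```python
# Rough size estimate for a ~2.6B-parameter model (assumed count).
params = 2_600_000_000

fp16_bytes = params * 2  # 2 bytes per weight in fp16
int8_bytes = params * 1  # 1 byte per weight in int8

reduction = 1 - int8_bytes / fp16_bytes
print(f"fp16: {fp16_bytes / 1e9:.1f} GB, int8: {int8_bytes / 1e9:.1f} GB")
print(f"reduction: {reduction:.0%}")  # → reduction: 50%
```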

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# The bare load_in_8bit=True argument is deprecated in recent transformers
# releases; pass a BitsAndBytesConfig instead.
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader-8bit")

# Same usage as the fp16 version, e.g.:
inputs = tokenizer("Summarize today's trades:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## When to Use This Version
- Limited GPU memory (<8GB VRAM)
- Faster loading times needed
- Deployment on edge devices
- When inference speed is more important than marginal quality
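One way to apply the guidelines above programmatically is to pick a checkpoint based on available VRAM. The helper, paths, and 8 GB threshold below are illustrative, not part of this repository:

```python
def pick_checkpoint(vram_gb: float, min_fp16_gb: float = 8.0) -> str:
    """Return the 8-bit checkpoint when VRAM is tight (illustrative helper).

    In practice vram_gb could come from
    torch.cuda.get_device_properties(0).total_memory / 1e9.
    """
    if vram_gb >= min_fp16_gb:
        return "./gemma-2b-trader-fp16"  # hypothetical path to the fp16 checkpoint
    return "./gemma-2b-trader-8bit"

print(pick_checkpoint(4.0))   # → ./gemma-2b-trader-8bit
print(pick_checkpoint(16.0))  # → ./gemma-2b-trader-fp16
```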

## When to Use FP16 Version
- Maximum quality required
- Sufficient GPU memory available
- Fine-tuning or further training needed