---
license: gemma
base_model: google/gemma-2-2b
tags:
- fine-tuned
- trading
- financial
- summarization
- quantized
- 8bit
pipeline_tag: text-generation
---

# Gemma-2-2B Trading Summarizer (8-bit Quantized)

## Model Description
This is an 8-bit quantized version of the fine-tuned Gemma-2-2B trading journal summarizer.
It offers a ~50% reduction in model size and memory usage with minimal quality loss; for a ~2.6B-parameter model, that is roughly 5.2 GB of fp16 weights down to about 2.6 GB in int8.

## Quantization Details
- **Method**: bitsandbytes 8-bit quantization (see the sketch below)
- **Original Precision**: fp16
- **Quantized Precision**: int8
- **Size Reduction**: ~50%
- **Quality Impact**: typically <2% degradation

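The exact quantization script isn't published here, but a checkpoint like this is typically produced by loading the fp16 fine-tune with an 8-bit bitsandbytes config and re-serializing it. A minimal sketch, assuming the fp16 model lives at a hypothetical `./gemma-2b-trader` path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

src = "./gemma-2b-trader"  # hypothetical path to the fp16 fine-tune

# Quantize to int8 on load via bitsandbytes
model = AutoModelForCausalLM.from_pretrained(
    src,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(src)

# Serialize the int8 weights; later loads then skip re-quantization
model.save_pretrained("./gemma-2b-trader-8bit")
tokenizer.save_pretrained("./gemma-2b-trader-8bit")
```

Saving the already-quantized model this way (supported in recent transformers/bitsandbytes releases) is what makes the fast, low-memory load in the Usage section possible.
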
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the serialized int8 checkpoint. Recent transformers versions expect the
# 8-bit flag via BitsAndBytesConfig rather than a bare load_in_8bit kwarg.
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader-8bit")

# Prompting and generation work exactly as with the fp16 version
```
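Continuing from the snippet above, a quick end-to-end generation example. The prompt template here is illustrative only; use the format the fp16 model card specifies for best results:

```python
entry = "Bought 100 AAPL at 182.50, sold at 185.10 after the earnings beat. Held 2 days."
prompt = f"Summarize this trading journal entry:\n{entry}\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```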
## When to Use This Version
- Limited GPU memory (<8 GB VRAM; see the quick check below)
- Faster loading times needed
- Deployment on edge devices
- Inference speed matters more than marginal quality
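
Not sure which bucket you fall into? A quick free-memory check helps; the ~2.6 GB figure is a rough weights-only estimate for this model in int8, not a measured requirement:

```python
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes on the current CUDA device
    print(f"Free VRAM: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
    # Rule of thumb: ~2.6 GB for the int8 weights, plus headroom for activations/KV cache
```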
## When to Use FP16 Version
- Maximum quality required
- Sufficient GPU memory available
- Fine-tuning or further training needed (int8 weights can't be trained directly; adapter methods such as PEFT/LoRA are the usual workaround)
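
For comparison, the fp16 variant loads with a plain `from_pretrained` call; the path below is a placeholder for wherever that checkpoint is published:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path; point this at the actual fp16 checkpoint
model_fp16 = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader")
```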