---
license: gemma
base_model: google/gemma-2-2b
tags:
- fine-tuned
- trading
- financial
- summarization
- quantized
- 8bit
pipeline_tag: text-generation
---

# Gemma-2-2B Trading Summarizer (8-bit Quantized)

## Model Description

This is an 8-bit quantized version of the fine-tuned Gemma-2-2B trading journal summarizer. It reduces model size and memory usage by roughly 50% with minimal quality loss.

## Quantization Details

- **Method**: bitsandbytes 8-bit quantization
- **Original Precision**: fp16
- **Quantized Precision**: int8
- **Size Reduction**: ~50%
- **Quality Impact**: typically <2% degradation

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Passing `load_in_8bit=True` directly is deprecated in recent transformers
# releases; the supported path is a BitsAndBytesConfig quantization config.
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader-8bit")

# Prompting and generation are identical to the fp16 version.
```

## When to Use This Version

- Limited GPU memory (<8 GB VRAM)
- Faster loading times are needed
- Deployment on edge devices
- Inference speed matters more than marginal quality

## When to Use the FP16 Version

- Maximum quality is required
- Sufficient GPU memory is available
- Fine-tuning or further training is needed
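## Memory Footprint Estimate

The ~50% size-reduction figure follows directly from per-parameter storage: fp16 uses 2 bytes per weight and int8 uses 1. The sketch below is a back-of-the-envelope check only; it assumes roughly 2.6 billion parameters for Gemma-2-2B (an approximate figure) and ignores activation, optimizer, and KV-cache memory, as well as bitsandbytes' small quantization overhead.

```python
def weight_footprint_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight-storage size in GiB (weights only)."""
    return num_params * bytes_per_param / 1024**3

# Approximate parameter count for Gemma-2-2B (assumption, not an exact figure).
PARAMS = 2.6e9

fp16_gib = weight_footprint_gib(PARAMS, 2)  # fp16: 2 bytes per parameter
int8_gib = weight_footprint_gib(PARAMS, 1)  # int8: 1 byte per parameter

print(f"fp16 ≈ {fp16_gib:.1f} GiB, int8 ≈ {int8_gib:.1f} GiB, "
      f"reduction = {1 - int8_gib / fp16_gib:.0%}")
```

Actual on-disk and in-VRAM numbers will vary slightly with the checkpoint format and runtime overhead, but the halving of weight storage is what makes the model fit comfortably under 8 GB of VRAM.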