---
license: gemma
base_model: google/gemma-2-2b
tags:
- fine-tuned
- trading
- financial
- summarization
- quantized
- 8bit
pipeline_tag: text-generation
---

# Gemma-2-2B Trading Summarizer (8-bit Quantized)

## Model Description
This is an 8-bit quantized version of the fine-tuned Gemma-2-2B trading journal summarizer.
It offers a ~50% reduction in model size and memory usage with minimal quality loss; for a ~2.6B-parameter model, that is roughly 5.2 GB of fp16 weights down to about 2.6 GB in int8.

## Quantization Details
- **Method**: bitsandbytes 8-bit quantization (see the sketch below)
- **Original Precision**: fp16
- **Quantized Precision**: int8
- **Size Reduction**: ~50%
- **Quality Impact**: typically <2% degradation

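The exact quantization script isn't published here, but a checkpoint like this is typically produced by loading the fp16 fine-tune with an 8-bit bitsandbytes config and re-serializing it. A minimal sketch, assuming the fp16 model lives at a hypothetical `./gemma-2b-trader` path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

src = "./gemma-2b-trader"  # hypothetical path to the fp16 fine-tune

# Quantize to int8 on load via bitsandbytes
model = AutoModelForCausalLM.from_pretrained(
    src,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(src)

# Serialize the int8 weights; later loads then skip re-quantization
model.save_pretrained("./gemma-2b-trader-8bit")
tokenizer.save_pretrained("./gemma-2b-trader-8bit")
```

Saving the already-quantized model this way (supported in recent transformers/bitsandbytes releases) is what makes the fast, low-memory load in the Usage section possible.
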
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the serialized int8 checkpoint. Recent transformers versions expect the
# 8-bit flag via BitsAndBytesConfig rather than a bare load_in_8bit kwarg.
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader-8bit")

# Prompting and generation work exactly as with the fp16 version
```
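Continuing from the snippet above, a quick end-to-end generation example. The prompt template here is illustrative only; use the format the fp16 model card specifies for best results:

```python
entry = "Bought 100 AAPL at 182.50, sold at 185.10 after the earnings beat. Held 2 days."
prompt = f"Summarize this trading journal entry:\n{entry}\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```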
## When to Use This Version
- Limited GPU memory (<8 GB VRAM; see the quick check below)
- Faster loading times needed
- Deployment on edge devices
- Inference speed matters more than marginal quality
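
Not sure which bucket you fall into? A quick free-memory check helps; the ~2.6 GB figure is a rough weights-only estimate for this model in int8, not a measured requirement:

```python
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes on the current CUDA device
    print(f"Free VRAM: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
    # Rule of thumb: ~2.6 GB for the int8 weights, plus headroom for activations/KV cache
```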
## When to Use FP16 Version
- Maximum quality required
- Sufficient GPU memory available
- Fine-tuning or further training needed (int8 weights can't be trained directly; adapter methods such as PEFT/LoRA are the usual workaround)
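
For comparison, the fp16 variant loads with a plain `from_pretrained` call; the path below is a placeholder for wherever that checkpoint is published:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path; point this at the actual fp16 checkpoint
model_fp16 = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader")
```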