# Qwen3-1.5B-OldGRPO-2086
This is a Qwen3-1.5B model fine-tuned using Group Relative Policy Optimization (GRPO) for context compression tasks.
## Model Details
- Base Model: Qwen3-1.5B-Base
- Training Method: GRPO (Group Relative Policy Optimization)
- Checkpoint: 2086 (from bertscore-geometric-mean variant)
- Training Configuration:
  - Beta: 0.001
  - Learning rate: 5e-06
  - Batch size: 2x4
  - Samples: 200,000
  - Epochs: 2
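The defining step of GRPO is replacing a learned value baseline with a group-relative one: several completions are sampled per prompt, and each completion's advantage is its reward standardized against its own group. A minimal sketch of that advantage computation (illustrative only; the actual training used a full GRPO trainer with the configuration above):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its group's mean and std.

    In GRPO, the group of sampled completions for one prompt serves
    as the baseline, so no separate value model is needed.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: rewards for four sampled completions of one prompt
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Completions rewarded above the group mean get positive advantages and are reinforced; those below get negative ones.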
## Performance
This checkpoint achieved the best accuracy among all evaluated models:
- Accuracy: 50.4% (504/1000 exact matches)
- Baseline (no compression): 49.7% (497/1000 exact matches)
- Accuracy improvement: +0.7 percentage points
- Average compression ratio: 0.1015 (10.15% of tokens removed on average)
- Compression rate: 70.5% of samples compressed
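The two compression figures above can be derived from per-sample token counts. A minimal sketch, assuming the ratio is the fraction of prompt tokens removed (an interpretation inferred from the numbers, not the actual evaluation code):

```python
def compression_stats(original_lens, compressed_lens):
    """Average fraction of tokens removed per sample, plus the
    share of samples that were compressed at all."""
    ratios = [1.0 - c / o for o, c in zip(original_lens, compressed_lens)]
    avg_ratio = sum(ratios) / len(ratios)
    compressed_rate = sum(r > 0 for r in ratios) / len(ratios)
    return avg_ratio, compressed_rate

# Example: one sample compressed from 100 to 90 tokens, one untouched
avg, rate = compression_stats([100, 200], [90, 200])
```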
## Evaluation Details
- Dataset: triviaqa-llama-memorization (validation split)
- Evaluation metric: BERTScore
- Max samples: 1000
- Temperature: 0.0
- Prompt length range: 1000-8192 tokens
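At temperature 0.0 decoding is greedy and deterministic, so exact-match accuracy reduces to a per-sample string comparison. A minimal sketch of that scoring (the normalization here is an assumption, not the actual harness):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference
    after light normalization (strip whitespace, lowercase)."""
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return matches / len(references)

# e.g. 504 matching samples out of 1000 yields 0.504 (50.4%)
```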
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint from the Hugging Face Hub
model_name = "maanas-writer/Qwen3-1.5B-OldGRPO-2086"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
## Citation

If you use this model, please cite:

```bibtex
@misc{qwen3-1.5b-oldgrpo-2086,
  author       = {Maanas Writer},
  title        = {Qwen3-1.5B-OldGRPO-2086: A GRPO-trained model for context compression},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/maanas-writer/Qwen3-1.5B-OldGRPO-2086}}
}
```