Qwen3-1.5B-OldGRPO-2086

This is a Qwen3-1.5B model fine-tuned using Group Relative Policy Optimization (GRPO) for context compression tasks.

Model Details

  • Base Model: Qwen3-1.5B-Base
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Checkpoint: 2086 (from bertscore-geometric-mean variant)
  • Training Configuration:
    • Beta: 0.001
    • Learning Rate: 5e-06
    • Batch Size: 2x4
    • Samples: 200,000
    • Epochs: 2

Performance

This checkpoint achieved the best accuracy among all evaluated models:

  • Accuracy: 50.4% (504/1000 exact matches)
  • Baseline (no compression): 49.7% (497/1000 exact matches)
  • Accuracy improvement: +0.7 percentage points
  • Average compression ratio: 0.1015 (10.15% compression)
  • Compression rate: 70.5% of samples compressed
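The accuracy figures above follow directly from the exact-match counts; a quick arithmetic check (illustrative only, using the counts reported in this card):

```python
# Sanity check of the reported metrics; the counts are copied from above.
compressed_correct = 504  # exact matches with compression, out of 1000
baseline_correct = 497    # exact matches without compression, out of 1000
total = 1000

accuracy = compressed_correct / total * 100   # 50.4
baseline = baseline_correct / total * 100     # 49.7
improvement = accuracy - baseline             # +0.7 percentage points

print(f"{accuracy:.1f}% vs {baseline:.1f}% baseline ({improvement:+.1f} pp)")
```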

Evaluation Details

  • Dataset: triviaqa-llama-memorization (validation split)
  • Evaluation metric: BERTScore
  • Max samples: 1000
  • Temperature: 0.0
  • Prompt length range: 1000-8192 tokens

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "maanas-writer/Qwen3-1.5B-OldGRPO-2086"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example generation (the prompt below is illustrative; the exact
# compression prompt format is not specified in this card)
inputs = tokenizer("Compress the following context: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Citation

If you use this model, please cite:

@misc{qwen3-1.5b-oldgrpo-2086,
  author = {Maanas Writer},
  title = {Qwen3-1.5B-OldGRPO-2086: A GRPO-trained model for context compression},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/maanas-writer/Qwen3-1.5B-OldGRPO-2086}}
}