Qwen2.5-14B Headline Generator (Distributional RL)
A LoRA adapter for Qwen2.5-14B-Instruct fine-tuned using Distributional Reinforcement Learning for viral headline generation.
Model Description
This model was trained using GRPO with a distributional reward model that predicts the full quantile distribution of engagement scores rather than just the mean. This enables risk-seeking optimization that produces more creative, attention-grabbing headlines.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = PeftModel.from_pretrained(base_model, "vizopsai/qwen2.5-14b-headline-gen-dist-rl-lora")
Sample Outputs
| Topic | Headline |
|---|---|
| Geek Girls | "Lady Geeks Strike Back with Nerd-Tastic Song: You Don't Define My Geekery" |
| Disability PSA | "Stop the Awkward! People With Disabilities Reveal Hilarious Yet Heartfelt Tips" |
Related
License
Apache 2.0
- Downloads last month
- 3