Qwen2.5-14B Headline Generator (Distributional RL)

A LoRA adapter for Qwen2.5-14B-Instruct fine-tuned using Distributional Reinforcement Learning for viral headline generation.

Model Description

This model was trained using GRPO with a distributional reward model that predicts the full quantile distribution of engagement scores rather than just the mean. This enables risk-seeking optimization that produces more creative, attention-grabbing headlines.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = PeftModel.from_pretrained(base_model, "vizopsai/qwen2.5-14b-headline-gen-dist-rl-lora")

Sample Outputs

Topic Headline
Geek Girls "Lady Geeks Strike Back with Nerd-Tastic Song: You Don't Define My Geekery"
Disability PSA "Stop the Awkward! People With Disabilities Reveal Hilarious Yet Heartfelt Tips"

Related

License

Apache 2.0

Downloads last month
3
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for vizopsai/qwen2.5-14b-headline-gen-dist-rl-lora

Base model

Qwen/Qwen2.5-14B
Adapter
(299)
this model