Qwen2.5-14B Headline Generator (Distributional RL)

A LoRA adapter for Qwen2.5-14B-Instruct fine-tuned using Distributional Reinforcement Learning for viral headline generation.

Model Description

This model was trained using GRPO with a distributional reward model that predicts the full quantile distribution of engagement scores rather than just the mean. This enables risk-seeking optimization that produces more creative, attention-grabbing headlines.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = PeftModel.from_pretrained(base_model, "vizopsai/qwen2.5-14b-headline-gen-dist-rl-lora")

Sample Outputs

Topic	Headline
Geek Girls	"Lady Geeks Strike Back with Nerd-Tastic Song: You Don't Define My Geekery"
Disability PSA	"Stop the Awkward! People With Disabilities Reveal Hilarious Yet Heartfelt Tips"

License

Apache 2.0

Downloads last month: 3

Model tree for vizopsai/qwen2.5-14b-headline-gen-dist-rl-lora

Base model

Qwen/Qwen2.5-14B

Finetuned

Qwen/Qwen2.5-14B-Instruct

Adapter

(299)

this model