Prompt Enhancer LLM for Video Generation
Final project for MIT 6.7920: Reinforcement Learning
Model Info
This model translates simple video generation prompts into detailed, professional-grade prompts, eliminating the need for prompt engineering and bringing high-quality AI-generated videos to ordinary users.
Example:
- Original prompt:
"A golden retriever playing in a park, chasing butterflies on a sunny afternoon"
- Enhanced prompt:
"# SUNSHINE AND PLEASURE - [Scene: A golden retriever runs through a sunlit park, bounding after fluttering butterflies. Grass sways in gentle breeze.] - Camera: Tracking shot at ground level, capturing playful motion and joyful expression. Cut to wide view showing expanse of green. - Lighting: Bright sunlight with long shadows. Golden hour glow enhances fur texture. - Motion: Dog leaps gracefully, tail wagging wildly. Butterflies dance around nose. - Details: Ribbon on collar; muddy paws. Birds chirp in trees. - Atmosphere: Innocence, freedom, pure happiness. - Style: Whimsical nature documentary. Inspired by Up and Finding Nemo. - Color Grading: Warm tones with high saturation. Add soft focus and subtle vignette. - Sound Design: Puppy barks, laughter, and rustling leaves."
The model was trained specifically to improve Sora videos, but you can also try it with other video generation services such as Veo.
Usage
Requirements:
pip install torch==2.1.0 transformers==4.42.4 peft==0.11.1
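Because the versions above are pinned, it can help to confirm what is actually installed before loading a 14B model. The following is a minimal sketch using only the standard library; `PINNED` and `find_mismatches` are hypothetical names introduced here for illustration and are not part of the project.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Pins copied from the pip command above.
PINNED = {"torch": "2.1.0", "transformers": "4.42.4", "peft": "0.11.1"}


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version or None} for every pin that is not met."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        # Ignore local build tags such as "+cu121" that torch wheels often carry.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches(PINNED).items():
        print(f"{name}: wanted {PINNED[name]}, found {installed}")
```

An empty result means every pinned package is present at the requested version. Other versions may well work; the check only reports where the environment differs from the pins.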
Sample code to load and use the model:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
base_model_id = "Qwen/Qwen2.5-14B-Instruct"
adapter_id = "dariakryvosheieva/video-prompt-enhancer"
# Load the tokenizer from the adapter repo and make sure a pad token is set.
tokenizer = AutoTokenizer.from_pretrained(adapter_id, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Load the base model, then attach the PEFT adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, device_map="auto", torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, adapter_id).eval()
def format_query(simple_prompt: str) -> str:
    # Python joins these adjacent literals with no separator between sentences.
    instruction_text = (
        "Convert the following video generation prompt into a professional-grade prompt that will produce a high quality, aesthetic, and impressive video."
        "If the original prompt includes a style specification (such as 'anime', 'pixel', or 'cartoon'), keep it in the converted prompt."
        "Output only the converted prompt."
    )
    return f"{instruction_text}\n\nInput:\n{simple_prompt.strip()}\n\nOutput:\n"
prompt = "a cat riding a skateboard in a park at sunset"
text = format_query(prompt)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
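with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,  # temperature and top_p take effect only when sampling
        temperature=0.7,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
    )
# Decode only the newly generated tokens, skipping the echoed input.
print(
    tokenizer.decode(out[0][inputs["input_ids"].shape[-1] :], skip_special_tokens=True)
)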
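The enhanced prompt in the example above is organized into labeled sections (Scene, Camera, Lighting, and so on), which makes it possible to inspect or edit one section before sending the prompt to a video service. The sketch below assumes that layout, inferred from that single example: an optional `# TITLE` followed by `Label: text` fields separated by ` - ` or newlines. `split_enhanced_prompt` is a hypothetical helper, not part of the model's repo, and real outputs may not always follow this structure.

```python
import re
from typing import Dict, Optional

# Assumed layout (inferred from one example output, not a documented format):
# fields are separated by " - " or by newlines, and each reads "Label: text".
_SEPARATOR = re.compile(r"\s+-\s+|\n+")
_FIELD = re.compile(r"^([A-Z][A-Za-z ]{0,30}):\s*(.+)$", re.DOTALL)


def split_enhanced_prompt(text: str) -> Dict[str, str]:
    """Split an enhanced prompt into an ordered {label: text} mapping."""
    fields: Dict[str, str] = {}
    last_label: Optional[str] = None
    for piece in _SEPARATOR.split(text.strip()):
        piece = piece.strip().lstrip("-").strip()
        if not piece:
            continue
        if piece.startswith("#"):
            # A leading "# ..." line is treated as the title.
            last_label = "Title"
            fields[last_label] = piece.lstrip("#").strip()
            continue
        # The scene description is wrapped in square brackets in the example.
        if piece.startswith("[") and piece.endswith("]"):
            piece = piece[1:-1].strip()
        match = _FIELD.match(piece)
        if match:
            last_label = match.group(1).strip()
            fields[last_label] = match.group(2).strip()
        elif last_label is not None:
            # No label found: treat the piece as a continuation of the previous field.
            fields[last_label] += " " + piece
    return fields


if __name__ == "__main__":
    sample = (
        "# SUNSHINE AND PLEASURE - [Scene: A golden retriever runs through a sunlit park.] "
        "- Camera: Tracking shot at ground level. - Color Grading: Warm tones with high saturation."
    )
    for label, body in split_enhanced_prompt(sample).items():
        print(f"{label}: {body}")
```

Text that carries no recognizable label is appended to the preceding field, so an output that deviates from the assumed layout degrades to fewer, longer fields rather than raising an error.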
Credits & Inspiration
- Sora users @keigo_matsumaru and @kejia for prompt styles
- Jina AI's PromptPerfect - an analogous tool for the text and image modalities
Training Procedure
See the GitHub repo.