qwen3-4b-deforum-prompt-lora-v7

QLoRA fine-tune of Qwen/Qwen3-4B-Instruct-2507 for generating cinematic video diffusion prompts in the De Forum Art Film aesthetic: chiaroscuro lighting, visible film grain, slow camera movements, atmospheric tension.


Recommended Usage: Ollama

The simplest way to run this model is via Ollama, after merging the adapter into the base model and converting the result to GGUF.
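A typical path from this adapter to the GGUF file referenced below is sketched here; the merged/ directory, the llama.cpp checkout location, and the q8_0 quantization type are assumptions for illustration, not files shipped with this repo:

```shell
# 1. Merge the LoRA adapter into the base model in Python
#    (load base + adapter as in the Python section, then):
#      merged = model.merge_and_unload()
#      merged.save_pretrained("merged/"); tokenizer.save_pretrained("merged/")
# 2. Convert the merged checkpoint to GGUF with llama.cpp's converter:
python llama.cpp/convert_hf_to_gguf.py merged/ \
    --outfile qwen3-4b-deforum-v7-q8.gguf --outtype q8_0
# 3. Build the Ollama model from the Modelfile below:
ollama create qwen3-4b-deforum-prompt:v7 -f Modelfile
```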

Modelfile:

FROM ./qwen3-4b-deforum-v7-q8.gguf

SYSTEM """You are a cinematic video prompt generator specializing in the De Forum Art Film aesthetic. When given a scene description, output only the cinematic video prompt — no labels, no preamble."""

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ range .Messages }}{{ if eq .Role "user" }}<|im_start|>user
Generate a cinematic video prompt for: {{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}<|im_end|>
{{ end }}{{ end }}<|im_start|>assistant
"""

PARAMETER temperature 0.8
PARAMETER top_p 0.9
PARAMETER num_ctx 512
PARAMETER repeat_penalty 1.25
PARAMETER num_predict 120
PARAMETER stop <|im_end|>
PARAMETER stop <|im_start|>
PARAMETER stop <think>

Note: the TEMPLATE is important: it automatically prepends "Generate a cinematic video prompt for: " to every user message, matching the training format.
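To make the formatting concrete, here is a minimal standalone sketch (not part of the released code) of the ChatML string the TEMPLATE produces for a single user turn:

```python
def build_chatml(system: str, user: str) -> str:
    # Mirror the Ollama TEMPLATE above: the training-format instruction
    # is prepended to every user message before the assistant turn opens.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\nGenerate a cinematic video prompt for: {user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml(
    "You are a cinematic video prompt generator specializing in the "
    "De Forum Art Film aesthetic.",
    "abandoned train station at dusk",
)
print(prompt)
```

If you call the model through a raw completion endpoint instead of Ollama's chat API, you need to apply this prefixing yourself.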

ollama run qwen3-4b-deforum-prompt:v7 "Sarah at her studio workstation late at night, surrounded by her subversive artwork"
# → Slow dolly in on Sarah's studio at night, chiaroscuro lighting etching her silhouette against a
#   backdrop of subversive artwork. Heavy film grain, the air thick with unspoken rebellion. Her eyes
#   hold a quiet intensity as she works.

Python Usage (PEFT adapter)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Limbicnation/qwen3-4b-deforum-prompt-lora-v7"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a cinematic video prompt generator specializing in the De Forum Art Film aesthetic."},
    {"role": "user", "content": "Generate a cinematic video prompt for: abandoned train station at dusk, pigeons in iron rafters"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=120, temperature=0.8, do_sample=True, repetition_penalty=1.25)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Pass enable_thinking=False to disable Qwen3's chain-of-thought mode for direct output.

Example Outputs

| Scene | Output |
|---|---|
| De Forum's boardroom, city far below, storm gathering | Slow dolly through De Forum's glass walls, city lights bleeding into storm clouds. Heavy film grain, chiaroscuro lighting, the tension of the boardroom pressing down as clouds gather overhead. |
| Abandoned train station at dusk, pigeons in iron rafters | Slow descent through decaying train station at dusk, dust motes dancing in slanting light. Pigeons gather in iron rafters, feathers catching amber. Film grain thick with decay, shadows pool like liquid memory. |
| Sarah confronting De Forum across a conference table, neither speaking | Slow push from Sarah's perspective across the long conference table, film grain thick with tension. Her gaze locked on De Forum's shadowed face, the space between them a silent battlefield of unspoken consequences. |
| Wet cobblestones at 3am, single streetlamp, footsteps fading | Slow descent through rain-slicked cobblestones at 3am, a single streetlamp bleeds into the wet stone, footsteps dissolving into the dark, film grain catching the last light as the path vanishes. |

Training Details

Dataset

  • Limbicnation/deforum-prompt-lora-dataset-v7: 1,547 train / 172 validation rows
  • Mix of general atmospheric scenes, De Forum Art Film narrative seeds, and cinematic scene descriptions
  • All responses synthesized as cinematic video prompts (no story text)

Configuration

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Target modules | q/k/v/o_proj + gate/up/down_proj |
| Learning rate | 1e-4 |
| Scheduler | cosine_with_min_lr (min 1e-6) |
| Batch size | 4 × grad_accum 2 = 8 effective |
| Epochs | 5 (best at epoch 2) |
| Quantization | NF4 + double quant, bf16 compute |
| Packing | false |

Training Results

| Epoch | eval_loss | eval_token_acc |
|---|---|---|
| 1 | 1.3099 | 70.1% |
| 2 (best) | 1.2113 | 71.96% |
| 3 | 1.2231 | 71.92% |
| 4 | 1.2732 | 72.02% |
| 5 | 1.2931 | 72.01% |
  • train_loss (epoch 5): 1.1323; the modest train/eval gap suggests limited overfitting, though eval_loss drifts upward after epoch 2
  • Best checkpoint (epoch 2) saved via load_best_model_at_end=True
  • Runtime: ~10 min on RTX 4090, 12 samples/sec
  • Trainable params: 66M / 4.09B (1.62%)
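The ~66M trainable-parameter figure can be reproduced from the LoRA rank and the base model's layer shapes. The dimensions below are assumptions taken from the published Qwen3-4B config, not from this card; a rank-r LoRA pair on a (d_in, d_out) projection adds r * (d_in + d_out) parameters.

```python
# Estimate trainable LoRA parameters for r=32 on all attention and MLP
# projections. Dimensions are the published Qwen3-4B shapes (assumed):
# hidden 2560, 32 query / 8 KV heads of dim 128, MLP 9728, 36 layers.
r = 32
hidden, head_dim, n_q, n_kv, mlp, layers = 2560, 128, 32, 8, 9728, 36

shapes = {
    "q_proj": (hidden, n_q * head_dim),
    "k_proj": (hidden, n_kv * head_dim),
    "v_proj": (hidden, n_kv * head_dim),
    "o_proj": (n_q * head_dim, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}
# Each rank-r LoRA pair adds r * (d_in + d_out) parameters per module.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * layers
print(f"{total:,}")  # 66,060,288 -> the ~66M reported above
```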

Framework Versions

  • TRL: 0.27.1
  • Transformers: 4.57.6
  • PyTorch: 2.6.0+cu124
  • PEFT: 0.15.2
  • Datasets: 4.5.0