# qwen3-4b-deforum-prompt-lora-v7
QLoRA fine-tune of Qwen/Qwen3-4B-Instruct-2507 for generating cinematic video diffusion prompts in the De Forum Art Film aesthetic: chiaroscuro lighting, visible film grain, slow camera movements, atmospheric tension.
## Recommended Usage: Ollama
The simplest way to run this model is via Ollama after merging and converting to GGUF.
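Merging the adapter into the base weights might look like the sketch below (a reconstruction, not the author's script; it assumes enough memory to load the base model in bf16). The resulting folder can then be converted to GGUF and quantized with llama.cpp's conversion tools.

```python
# Fold the LoRA adapter into the base model and save a plain HF checkpoint.
# The merged folder can afterwards be converted to GGUF with llama.cpp
# (e.g. its convert_hf_to_gguf.py script) and quantized to q8_0.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Limbicnation/qwen3-4b-deforum-prompt-lora-v7"

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter)
merged = model.merge_and_unload()  # applies the LoRA deltas to the base weights

merged.save_pretrained("qwen3-4b-deforum-v7-merged")
AutoTokenizer.from_pretrained(base_model).save_pretrained("qwen3-4b-deforum-v7-merged")
```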
**Modelfile:**

```
FROM ./qwen3-4b-deforum-v7-q8.gguf

SYSTEM """You are a cinematic video prompt generator specializing in the De Forum Art Film aesthetic. When given a scene description, output only the cinematic video prompt, no labels, no preamble."""

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ range .Messages }}{{ if eq .Role "user" }}<|im_start|>user
Generate a cinematic video prompt for: {{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}<|im_end|>
{{ end }}{{ end }}<|im_start|>assistant
"""

PARAMETER temperature 0.8
PARAMETER top_p 0.9
PARAMETER num_ctx 512
PARAMETER repeat_penalty 1.25
PARAMETER num_predict 120
PARAMETER stop <|im_end|>
PARAMETER stop <|im_start|>
PARAMETER stop <think>
```
Note: the `TEMPLATE` is important. It automatically prepends `"Generate a cinematic video prompt for: "` to every user message, matching the training format.
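For reference, the prompt string that this template renders for one system message and one user turn can be sketched in plain Python (independent of Ollama; the helper name is illustrative):

```python
# Reproduce the ChatML prompt the Ollama TEMPLATE above renders
# for a single system message and a single user turn.
def render_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n"
        f"Generate a cinematic video prompt for: {user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = render_prompt(
    "You are a cinematic video prompt generator specializing in the "
    "De Forum Art Film aesthetic. When given a scene description, output "
    "only the cinematic video prompt, no labels, no preamble.",
    "abandoned train station at dusk, pigeons in iron rafters",
)
print(prompt)
```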
```bash
ollama run qwen3-4b-deforum-prompt:v7 "Sarah at her studio workstation late at night, surrounded by her subversive artwork"
# Output:
# Slow dolly in on Sarah's studio at night, chiaroscuro lighting etching her silhouette against a
# backdrop of subversive artwork. Heavy film grain, the air thick with unspoken rebellion. Her eyes
# hold a quiet intensity as she works.
```
## Python Usage (PEFT adapter)
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Limbicnation/qwen3-4b-deforum-prompt-lora-v7"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a cinematic video prompt generator specializing in the De Forum Art Film aesthetic."},
    {"role": "user", "content": "Generate a cinematic video prompt for: abandoned train station at dusk, pigeons in iron rafters"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=120, temperature=0.8, do_sample=True, repetition_penalty=1.25)

print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
Pass `enable_thinking=False` to disable Qwen3's chain-of-thought mode for direct output.
## Example Outputs
| Scene | Output |
|---|---|
| De Forum's boardroom, city far below, storm gathering | Slow dolly through De Forum's glass walls, city lights bleeding into storm clouds. Heavy film grain, chiaroscuro lighting, the tension of the boardroom pressing down as clouds gather overhead. |
| Abandoned train station at dusk, pigeons in iron rafters | Slow descent through decaying train station at dusk, dust motes dancing in slanting light. Pigeons gather in iron rafters, feathers catching amber. Film grain thick with decay, shadows pool like liquid memory. |
| Sarah confronting De Forum across a conference table, neither speaking | Slow push from Sarah's perspective across the long conference table, film grain thick with tension. Her gaze locked on De Forum's shadowed face, the space between them a silent battlefield of unspoken consequences. |
| Wet cobblestones at 3am, single streetlamp, footsteps fading | Slow descent through rain-slicked cobblestones at 3am, a single streetlamp bleeds into the wet stone, footsteps dissolving into the dark, film grain catching the last light as the path vanishes. |
## Training Details

### Dataset
- Limbicnation/deforum-prompt-lora-dataset-v7 (1,547 train / 172 validation rows)
- Mix of general atmospheric scenes, De Forum Art Film narrative seeds, and cinematic scene descriptions
- All responses synthesized as cinematic video prompts (no story text)
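The dataset can be pulled straight from the Hub for inspection (a sketch; assumes network access and the `datasets` library):

```python
from datasets import load_dataset

# Load the training dataset referenced above.
ds = load_dataset("Limbicnation/deforum-prompt-lora-dataset-v7")
print(ds)              # should show train/validation splits per the card above
print(ds["train"][0])  # one scene/prompt pair
```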
### Configuration
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Target modules | q/k/v/o_proj + gate/up/down_proj |
| Learning rate | 1e-4 |
| Scheduler | cosine_with_min_lr (min 1e-6) |
| Batch size | 4 Γ grad_accum 2 = 8 effective |
| Epochs | 5 (best at epoch 2) |
| Quantization | NF4 + double quant, bf16 compute |
| Packing | false |
### Training Results
| Epoch | eval_loss | eval_token_acc |
|---|---|---|
| 1 | 1.3099 | 70.1% |
| 2 (best) | 1.2113 | 71.96% |
| 3 | 1.2231 | 71.92% |
| 4 | 1.2732 | 72.02% |
| 5 | 1.2931 | 72.01% |
- train_loss (epoch 5): 1.1323; the small train/eval gap suggests limited overfitting
- Best checkpoint (epoch 2) saved via `load_best_model_at_end=True`
- Runtime: ~10 min on an RTX 4090, 12 samples/sec
- Trainable params: 66M / 4.09B (1.62%)
## Framework Versions
- TRL: 0.27.1
- Transformers: 4.57.6
- PyTorch: 2.6.0+cu124
- PEFT: 0.15.2
- Datasets: 4.5.0