# qwen3-4b-deforum-prompt-lora-v7
QLoRA fine-tune of Qwen/Qwen3-4B-Instruct-2507 for generating cinematic video diffusion prompts in the De Forum Art Film aesthetic: chiaroscuro lighting, visible film grain, slow camera movements, atmospheric tension.
## Recommended Usage: Ollama
The simplest way to run this model is via Ollama after merging and converting to GGUF.
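Merging the adapter into the base weights might look like the sketch below (a reconstruction, not the author's script; it assumes enough memory to load the base model in bf16). The resulting folder can then be converted to GGUF and quantized with llama.cpp's conversion tools.

```python
# Fold the LoRA adapter into the base model and save a plain HF checkpoint.
# The merged folder can afterwards be converted to GGUF with llama.cpp
# (e.g. its convert_hf_to_gguf.py script) and quantized to q8_0.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Limbicnation/qwen3-4b-deforum-prompt-lora-v7"

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter)
merged = model.merge_and_unload()  # applies the LoRA deltas to the base weights

merged.save_pretrained("qwen3-4b-deforum-v7-merged")
AutoTokenizer.from_pretrained(base_model).save_pretrained("qwen3-4b-deforum-v7-merged")
```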
**Modelfile:**

```
FROM ./qwen3-4b-deforum-v7-q8.gguf

SYSTEM """You are a cinematic video prompt generator specializing in the De Forum Art Film aesthetic. When given a scene description, output only the cinematic video prompt, no labels, no preamble."""

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ range .Messages }}{{ if eq .Role "user" }}<|im_start|>user
Generate a cinematic video prompt for: {{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}<|im_end|>
{{ end }}{{ end }}<|im_start|>assistant
"""

PARAMETER temperature 0.8
PARAMETER top_p 0.9
PARAMETER num_ctx 512
PARAMETER repeat_penalty 1.25
PARAMETER num_predict 120
PARAMETER stop <|im_end|>
PARAMETER stop <|im_start|>
PARAMETER stop <think>
```
Note: the `TEMPLATE` is important. It automatically prepends `"Generate a cinematic video prompt for: "` to every user message, matching the training format.
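For reference, the prompt string that this template renders for one system message and one user turn can be sketched in plain Python (independent of Ollama; the helper name is illustrative):

```python
# Reproduce the ChatML prompt the Ollama TEMPLATE above renders
# for a single system message and a single user turn.
def render_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n"
        f"Generate a cinematic video prompt for: {user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = render_prompt(
    "You are a cinematic video prompt generator specializing in the "
    "De Forum Art Film aesthetic. When given a scene description, output "
    "only the cinematic video prompt, no labels, no preamble.",
    "abandoned train station at dusk, pigeons in iron rafters",
)
print(prompt)
```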
```bash
ollama run qwen3-4b-deforum-prompt:v7 "Sarah at her studio workstation late at night, surrounded by her subversive artwork"
# Output:
# Slow dolly in on Sarah's studio at night, chiaroscuro lighting etching her silhouette against a
# backdrop of subversive artwork. Heavy film grain, the air thick with unspoken rebellion. Her eyes
# hold a quiet intensity as she works.
```
## Python Usage (PEFT adapter)
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Limbicnation/qwen3-4b-deforum-prompt-lora-v7"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a cinematic video prompt generator specializing in the De Forum Art Film aesthetic."},
    {"role": "user", "content": "Generate a cinematic video prompt for: abandoned train station at dusk, pigeons in iron rafters"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=120, temperature=0.8, do_sample=True, repetition_penalty=1.25)

print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
Pass `enable_thinking=False` to disable Qwen3's chain-of-thought mode for direct output.
## Example Outputs
| Scene | Output |
|---|---|
| De Forum's boardroom, city far below, storm gathering | Slow dolly through De Forum's glass walls, city lights bleeding into storm clouds. Heavy film grain, chiaroscuro lighting, the tension of the boardroom pressing down as clouds gather overhead. |
| Abandoned train station at dusk, pigeons in iron rafters | Slow descent through decaying train station at dusk, dust motes dancing in slanting light. Pigeons gather in iron rafters, feathers catching amber. Film grain thick with decay, shadows pool like liquid memory. |
| Sarah confronting De Forum across a conference table, neither speaking | Slow push from Sarah's perspective across the long conference table, film grain thick with tension. Her gaze locked on De Forum's shadowed face, the space between them a silent battlefield of unspoken consequences. |
| Wet cobblestones at 3am, single streetlamp, footsteps fading | Slow descent through rain-slicked cobblestones at 3am, a single streetlamp bleeds into the wet stone, footsteps dissolving into the dark, film grain catching the last light as the path vanishes. |
## Training Details

### Dataset
- Limbicnation/deforum-prompt-lora-dataset-v7 (1,547 train / 172 validation rows)
- Mix of general atmospheric scenes, De Forum Art Film narrative seeds, and cinematic scene descriptions
- All responses synthesized as cinematic video prompts (no story text)
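The dataset can be pulled straight from the Hub for inspection (a sketch; assumes network access and the `datasets` library):

```python
from datasets import load_dataset

# Load the training dataset referenced above.
ds = load_dataset("Limbicnation/deforum-prompt-lora-dataset-v7")
print(ds)              # should show train/validation splits per the card above
print(ds["train"][0])  # one scene/prompt pair
```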
### Configuration
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Target modules | q/k/v/o_proj + gate/up/down_proj |
| Learning rate | 1e-4 |
| Scheduler | cosine_with_min_lr (min 1e-6) |
| Batch size | 4 Γ grad_accum 2 = 8 effective |
| Epochs | 5 (best at epoch 2) |
| Quantization | NF4 + double quant, bf16 compute |
| Packing | false |
### Training Results
| Epoch | eval_loss | eval_token_acc |
|---|---|---|
| 1 | 1.3099 | 70.1% |
| 2 (best) | 1.2113 | 71.96% |
| 3 | 1.2231 | 71.92% |
| 4 | 1.2732 | 72.02% |
| 5 | 1.2931 | 72.01% |
- train_loss (epoch 5): 1.1323; the small train/eval gap suggests limited overfitting
- Best checkpoint (epoch 2) saved via `load_best_model_at_end=True`
- Runtime: ~10 min on an RTX 4090, 12 samples/sec
- Trainable params: 66M / 4.09B (1.62%)
## Framework Versions
- TRL: 0.27.1
- Transformers: 4.57.6
- PyTorch: 2.6.0+cu124
- PEFT: 0.15.2
- Datasets: 4.5.0