# Light Fantasy – FLUX.2 Klein Base 4B LoRA

A LoRA fine-tune of FLUX.2-klein-base-4B trained on 232 fantasy paintings. Produces luminous, painterly images with vibrant colors: castles, knights, dragons, enchanted landscapes, and magical atmospheres.

Trained on the undistilled base model for best fine-tuning quality, with QLoRA (NF4 4-bit quantization). Runs on consumer GPUs with 16 GB of VRAM.

## Quick Start

```python
import torch
from diffusers import (
    Flux2KleinPipeline,
    Flux2Transformer2DModel,
    BitsAndBytesConfig,
)

# Load transformer with 4-bit quantization (fits 16GB VRAM)
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = Flux2Transformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.2-klein-base-4B",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# Build pipeline
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-base-4B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("giannisan/light-fantasy-flux2-klein-base-lora")
pipe.enable_model_cpu_offload()

# Fix VAE dtype mismatch (needed with CPU offload + bf16)
_orig_decode = pipe.vae._decode
def _patched_decode(z, *args, **kwargs):
    return _orig_decode(z.to(pipe.vae.dtype), *args, **kwargs)
pipe.vae._decode = _patched_decode

# Generate!
image = pipe(
    prompt="light_fantasy, a detailed fantasy painting of a dragon guarding a crystal cavern filled with gold",
    height=512,
    width=768,
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("output.png")
```

## Usage Notes

### Trigger Word

Use `light_fantasy` at the start of your prompt to activate the style:

```
light_fantasy, a detailed fantasy painting of [your scene description]
```

Without the trigger word, the base model's default style is used.
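If you build prompts programmatically, a small helper keeps the trigger word and style prefix consistent. This is a sketch; the helper name is ours, not part of any API:

```python
def build_prompt(scene: str) -> str:
    """Prepend the light_fantasy trigger and the style prefix used in training captions."""
    return f"light_fantasy, a detailed fantasy painting of {scene}"

prompt = build_prompt("a dragon guarding a crystal cavern filled with gold")
# -> "light_fantasy, a detailed fantasy painting of a dragon guarding a crystal cavern filled with gold"
```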

### Inference Settings

| Parameter | Recommended | Notes |
|---|---|---|
| `num_inference_steps` | 50 | The undistilled base model needs more steps: 30 is usable, 50 gives best quality. |
| `guidance_scale` | 3.5 | Standard for FLUX.2 base models. |
| Resolution | 512×512 or 512×768 | Trained at 512×512. Landscape (512×768) works well for scenes; 1024×1024 works but takes ~3.7 min. |

### Generation Speed (RTX 4060 Ti 16GB)

| Resolution | Time |
|---|---|
| 512×512 | ~55 sec |
| 512×768 | ~1.5 min |
| 1024×1024 | ~3.7 min |
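These timings track pixel count almost linearly, so other resolutions can be estimated from the 512×512 baseline. A rough rule of thumb (our extrapolation from the measured rows, not an additional benchmark):

```python
# Estimate generation time from resolution, scaling linearly in pixel count
# from the measured 512x512 baseline (~55 s on an RTX 4060 Ti 16GB).
BASE_PIXELS = 512 * 512
BASE_SECONDS = 55

def estimate_seconds(width: int, height: int) -> float:
    return BASE_SECONDS * (width * height) / BASE_PIXELS

estimate_seconds(512, 768)    # 82.5 s, close to the measured ~1.5 min
estimate_seconds(1024, 1024)  # 220 s, matching the measured ~3.7 min
```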

### VRAM Requirements

With NF4 quantization + CPU offload, the model runs on 16 GB VRAM GPUs. Without quantization, you'll need 24 GB or more.
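Back-of-the-envelope weight-memory arithmetic for the 4B-parameter transformer shows why 4-bit quantization makes the difference. This counts transformer weights only; activations, the text encoder, the VAE, and quantization bookkeeping add more on top:

```python
# Approximate transformer weight memory at different precisions
PARAMS = 4e9  # 4B parameters

bf16_gb = PARAMS * 2 / 1024**3   # 2 bytes/param -> ~7.5 GB
nf4_gb = PARAMS * 0.5 / 1024**3  # 4 bits/param  -> ~1.9 GB
```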

## Example Prompts

```
light_fantasy, a detailed fantasy painting of a massive dragon breathing fire on a castle while knights on horseback charge with lances and banners flying

light_fantasy, a detailed fantasy painting of a cozy wizard's library with floating books and a fireplace

light_fantasy, a detailed fantasy painting of an underwater kingdom with coral towers and merfolk

light_fantasy, a detailed fantasy painting of a knight in golden armor riding a silver dragon through the clouds above a medieval kingdom
```

## Training Details

| Parameter | Value |
|---|---|
| Base model | FLUX.2-klein-base-4B (undistilled) |
| Method | DreamBooth LoRA with QLoRA (NF4 4-bit quantization) |
| Dataset | giannisan/light-fantasy-dataset (232 images with per-image BLIP captions) |
| Resolution | 512×512 |
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Learning rate | 1e-4 (constant scheduler, 100 warmup steps) |
| Training steps | 2500 (~43 passes per image) |
| Batch size | 1 (gradient accumulation 4, effective batch 4) |
| Optimizer | AdamW 8-bit |
| Mixed precision | bf16 |
| Data augmentation | Random horizontal flip |
| Final loss | 0.939 |
| Hardware | NVIDIA RTX 4060 Ti 16GB |
| Training time | ~4 hours |
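The "~43 passes per image" figure follows directly from the hyperparameters above; a quick sanity check:

```python
# Sanity-check passes-per-image from the training hyperparameters
batch_size = 1
grad_accum = 4
steps = 2500
dataset_size = 232

effective_batch = batch_size * grad_accum  # 4 images per optimizer step
images_seen = steps * effective_batch      # 10,000 images over the run
passes = images_seen / dataset_size        # ~43.1 passes per image
```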

## Why the base model?

We chose the undistilled base model over the step-distilled variant because LoRA fine-tuning disrupts step distillation: a LoRA trained on the distilled model produces blurry images at 4 steps and requires 50 steps anyway. Training on the base model gives cleaner results, since it is designed for multi-step inference.

## Dataset

Trained on giannisan/light-fantasy-dataset: 232 fantasy paintings auto-captioned with BLIP (Salesforce/blip-image-captioning-large). Each caption starts with the `light_fantasy` trigger word followed by a scene description.

## License

This LoRA inherits the license of the base model. See the FLUX.2-klein-base-4B license.
