# Light Fantasy – FLUX.2 Klein Base 4B LoRA

A LoRA fine-tune of FLUX.2-klein-base-4B trained on 232 fantasy paintings. Produces luminous, painterly images with vibrant colors: castles, knights, dragons, enchanted landscapes, and magical atmospheres.

Trained on the undistilled base model for best fine-tuning quality, with QLoRA (NF4 4-bit quantization). Runs on consumer GPUs with 16 GB of VRAM.

## Quick Start

```python
import torch
from diffusers import (
    Flux2KleinPipeline,
    Flux2Transformer2DModel,
    BitsAndBytesConfig,
)

# Load transformer with 4-bit quantization (fits 16GB VRAM)
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = Flux2Transformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.2-klein-base-4B",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# Build pipeline
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-base-4B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("giannisan/light-fantasy-flux2-klein-base-lora")
pipe.enable_model_cpu_offload()

# Fix VAE dtype mismatch (needed with CPU offload + bf16)
_orig_decode = pipe.vae._decode
def _patched_decode(z, *args, **kwargs):
    return _orig_decode(z.to(pipe.vae.dtype), *args, **kwargs)
pipe.vae._decode = _patched_decode

# Generate!
image = pipe(
    prompt="light_fantasy, a detailed fantasy painting of a dragon guarding a crystal cavern filled with gold",
    height=512,
    width=768,
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("output.png")
```

## Usage Notes

### Trigger Word

Use `light_fantasy` at the start of your prompt to activate the style:

```
light_fantasy, a detailed fantasy painting of [your scene description]
```

Without the trigger word, the base model's default style is used.
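If you build prompts programmatically, a small helper keeps the trigger word and style prefix consistent. This is a sketch; the helper name is ours, not part of any API:

```python
def build_prompt(scene: str) -> str:
    """Prepend the light_fantasy trigger and the style prefix used in training captions."""
    return f"light_fantasy, a detailed fantasy painting of {scene}"

prompt = build_prompt("a dragon guarding a crystal cavern filled with gold")
# -> "light_fantasy, a detailed fantasy painting of a dragon guarding a crystal cavern filled with gold"
```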

### Inference Settings

| Parameter | Recommended | Notes |
|---|---|---|
| `num_inference_steps` | 50 | The undistilled base model needs more steps: 30 is usable, 50 gives best quality. |
| `guidance_scale` | 3.5 | Standard for FLUX.2 base models. |
| Resolution | 512×512 or 512×768 | Trained at 512×512. Landscape (512×768) works well for scenes; 1024×1024 works but takes ~3.7 min. |

### Generation Speed (RTX 4060 Ti 16GB)

| Resolution | Time |
|---|---|
| 512×512 | ~55 sec |
| 512×768 | ~1.5 min |
| 1024×1024 | ~3.7 min |
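These timings track pixel count almost linearly, so other resolutions can be estimated from the 512×512 baseline. A rough rule of thumb (our extrapolation from the measured rows, not an additional benchmark):

```python
# Estimate generation time from resolution, scaling linearly in pixel count
# from the measured 512x512 baseline (~55 s on an RTX 4060 Ti 16GB).
BASE_PIXELS = 512 * 512
BASE_SECONDS = 55

def estimate_seconds(width: int, height: int) -> float:
    return BASE_SECONDS * (width * height) / BASE_PIXELS

estimate_seconds(512, 768)    # 82.5 s, close to the measured ~1.5 min
estimate_seconds(1024, 1024)  # 220 s, matching the measured ~3.7 min
```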

### VRAM Requirements

With NF4 quantization + CPU offload, the model runs on 16 GB VRAM GPUs. Without quantization, you'll need 24 GB or more.
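Back-of-the-envelope weight-memory arithmetic for the 4B-parameter transformer shows why 4-bit quantization makes the difference. This counts transformer weights only; activations, the text encoder, the VAE, and quantization bookkeeping add more on top:

```python
# Approximate transformer weight memory at different precisions
PARAMS = 4e9  # 4B parameters

bf16_gb = PARAMS * 2 / 1024**3   # 2 bytes/param -> ~7.5 GB
nf4_gb = PARAMS * 0.5 / 1024**3  # 4 bits/param  -> ~1.9 GB
```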

## Example Prompts

```
light_fantasy, a detailed fantasy painting of a massive dragon breathing fire on a castle while knights on horseback charge with lances and banners flying

light_fantasy, a detailed fantasy painting of a cozy wizard's library with floating books and a fireplace

light_fantasy, a detailed fantasy painting of an underwater kingdom with coral towers and merfolk

light_fantasy, a detailed fantasy painting of a knight in golden armor riding a silver dragon through the clouds above a medieval kingdom
```

## Training Details

| Parameter | Value |
|---|---|
| Base model | FLUX.2-klein-base-4B (undistilled) |
| Method | DreamBooth LoRA with QLoRA (NF4 4-bit quantization) |
| Dataset | giannisan/light-fantasy-dataset (232 images with per-image BLIP captions) |
| Resolution | 512×512 |
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Learning rate | 1e-4 (constant scheduler, 100 warmup steps) |
| Training steps | 2500 (~43 passes per image) |
| Batch size | 1 (gradient accumulation 4, effective batch 4) |
| Optimizer | AdamW 8-bit |
| Mixed precision | bf16 |
| Data augmentation | Random horizontal flip |
| Final loss | 0.939 |
| Hardware | NVIDIA RTX 4060 Ti 16GB |
| Training time | ~4 hours |
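The "~43 passes per image" figure follows directly from the hyperparameters above; a quick sanity check:

```python
# Sanity-check passes-per-image from the training hyperparameters
batch_size = 1
grad_accum = 4
steps = 2500
dataset_size = 232

effective_batch = batch_size * grad_accum  # 4 images per optimizer step
images_seen = steps * effective_batch      # 10,000 images over the run
passes = images_seen / dataset_size        # ~43.1 passes per image
```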

## Why the base model?

We chose the undistilled base model over the step-distilled variant because LoRA fine-tuning disrupts step distillation: a LoRA trained on the distilled model produces blurry images at 4 steps and requires 50 steps anyway. Training on the base model gives cleaner results, since it is designed for multi-step inference.

## Dataset

Trained on giannisan/light-fantasy-dataset: 232 fantasy paintings auto-captioned with BLIP (Salesforce/blip-image-captioning-large). Each caption starts with the `light_fantasy` trigger word followed by a scene description.

## License

This LoRA inherits the license of the base model. See the FLUX.2-klein-base-4B license.
