bchao1 committed on
Commit 3b7d8ae · verified
1 Parent(s): 7517d7b

Update model card

Files changed (1)
  1. README.md +85 -0
README.md ADDED
@@ -0,0 +1,85 @@
+ ---
+ license: apache-2.0
+ library_name: diffusers
+ tags:
+ - lora
+ - diffusion
+ - foveated-rendering
+ - text-to-image
+ - text-to-video
+ ---
+
+ # Foveated Diffusion
+
+ LoRA weights for [**Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation**](https://bchao1.github.io/foveated-diffusion/). Foveated Diffusion is a biologically inspired diffusion framework that uses spatially adaptive tokenization to concentrate compute on selected regions, achieving up to 4× speedups in image and video synthesis.
+
+ - Project page: https://bchao1.github.io/foveated-diffusion/
+ - Paper: https://arxiv.org/abs/2603.23491
+
+ ## Repository structure
+
+ ```
+ foveated_diffusion/
+ ├── image/
+ │   ├── no_fov.safetensors        # finetuned baseline, no foveation conditioning
+ │   ├── fov_random.safetensors    # foveation conditioning at random gaze locations
+ │   ├── fov_saliency.safetensors  # foveation conditioning driven by saliency
+ │   └── fov_bbox.safetensors      # foveation conditioning driven by bounding boxes
+ └── video/                        # (coming soon)
+ ```
+
+ All image checkpoints are rank-32 LoRA adapters saved as `safetensors`.
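
The layout above pairs each conditioning variant with exactly one file under `image/`. A minimal sketch of picking a checkpoint by variant follows; the helper name `image_checkpoint` and the short variant keys are hypothetical and are not part of the project codebase. Only the repository id and the file paths come from this model card.

```python
# Hypothetical helper for illustration; not part of the project codebase.
# The repo id and file paths are taken from the model card's file listing.
REPO_ID = "bchao1/foveated_diffusion"

# Variant key (assumed naming, mirrors the file stems) -> path inside the repo.
IMAGE_CHECKPOINTS = {
    "no_fov": "image/no_fov.safetensors",              # finetuned baseline
    "fov_random": "image/fov_random.safetensors",      # random gaze locations
    "fov_saliency": "image/fov_saliency.safetensors",  # saliency-driven
    "fov_bbox": "image/fov_bbox.safetensors",          # bounding-box-driven
}


def image_checkpoint(variant: str) -> str:
    """Return the in-repo path of the image LoRA for the given variant."""
    try:
        return IMAGE_CHECKPOINTS[variant]
    except KeyError:
        known = ", ".join(sorted(IMAGE_CHECKPOINTS))
        raise ValueError(
            f"unknown variant {variant!r}; expected one of: {known}"
        ) from None
```

The returned path can then be passed as `filename`, together with `repo_id=REPO_ID`, to `huggingface_hub.hf_hub_download`.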
+
+ ## Usage
+
+ The image LoRAs are trained on top of `black-forest-labs/FLUX.2-klein-base-4B` and are loaded into the foveated FLUX.2 pipeline that ships with the [project codebase](https://bchao1.github.io/foveated-diffusion/) (built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)).
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+ from diffsynth.pipelines.flux2_image import ModelConfig
+ from src.diffsynth_fov import Flux2FoveatedImagePipeline
+
+ # Base model the image LoRAs were trained on.
+ MODEL_ID = "black-forest-labs/FLUX.2-klein-base-4B"
+
+ # Build the foveated pipeline from the base transformer, text encoder, and VAE.
+ pipe = Flux2FoveatedImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id=MODEL_ID, origin_file_pattern="transformer/*.safetensors"),
+         ModelConfig(model_id=MODEL_ID, origin_file_pattern="text_encoder/*.safetensors"),
+         ModelConfig(model_id=MODEL_ID, origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+     ],
+     tokenizer_config=ModelConfig(model_id=MODEL_ID, origin_file_pattern="tokenizer/"),
+ )
+
+ # Download the saliency-conditioned LoRA from this repository.
+ lora_path = hf_hub_download(
+     repo_id="bchao1/foveated_diffusion",
+     filename="image/fov_saliency.safetensors",
+ )
+ # Load the LoRA into the pipeline's `dit` module.
+ pipe.load_lora(pipe.dit, lora_path)
+ ```
+
+ Or run the project's `inference.py` directly:
+
+ ```bash
+ python inference.py \
+     --experiment ours \
+     --lora_checkpoint /path/to/fov_saliency.safetensors
+ ```
+
+ See the [project page](https://bchao1.github.io/foveated-diffusion/) for the full inference pipeline (gaze handling, foveation transform, decode modes, etc.).
+
+ ## Citation
+
+ ```bibtex
+ @misc{chao2026foveateddiffusion,
+   title={Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation},
+   author={Brian Chao and Lior Yariv and Howard Xiao and Gordon Wetzstein},
+   year={2026},
+   eprint={2603.23491},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2603.23491},
+ }
+ ```