bchao1/foveated_diffusion

bchao1 committed 8 days ago

Commit 3b7d8ae · verified · 1 parent: 7517d7b

Update model card

Files changed (1)

README.md ADDED (+85 -0)

	@@ -0,0 +1,85 @@

+---
+license: apache-2.0
+library_name: diffusers
+tags:
+  - lora
+  - diffusion
+  - foveated-rendering
+  - text-to-image
+  - text-to-video
+---
+# Foveated Diffusion
+LoRA weights for [**Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation**](https://bchao1.github.io/foveated-diffusion/). Foveated Diffusion is a biologically inspired diffusion framework that uses spatially adaptive tokenization to concentrate compute on selected regions, achieving up to 4× speedups in image and video synthesis.
+- Project page: https://bchao1.github.io/foveated-diffusion/
+- Paper: https://arxiv.org/abs/2603.23491
+## Repository structure
+```
+foveated_diffusion/
+├── image/
+│   ├── no_fov.safetensors        # finetuned baseline, no foveation conditioning
+│   ├── fov_random.safetensors    # foveation conditioning at random gaze locations
+│   ├── fov_saliency.safetensors  # foveation conditioning driven by saliency
+│   └── fov_bbox.safetensors      # foveation conditioning driven by bounding boxes
+└── video/                        # (coming soon)
+```
+All image checkpoints are rank-32 LoRA adapters saved as `safetensors`.
+## Usage
+The image LoRAs are trained on top of `black-forest-labs/FLUX.2-klein-base-4B` and are loaded into the foveated FLUX.2 pipeline that ships with the [project codebase](https://bchao1.github.io/foveated-diffusion/) (built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)).
+```python
+import torch
+from huggingface_hub import hf_hub_download
+from diffsynth.pipelines.flux2_image import ModelConfig
+# Flux2FoveatedImagePipeline ships with the project codebase, not with this repo.
+from src.diffsynth_fov import Flux2FoveatedImagePipeline
+
+# Base model the image LoRAs were trained on.
+MODEL_ID = "black-forest-labs/FLUX.2-klein-base-4B"
+
+# Build the foveated FLUX.2 pipeline from the base model components.
+pipe = Flux2FoveatedImagePipeline.from_pretrained(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_configs=[
+        ModelConfig(model_id=MODEL_ID, origin_file_pattern="transformer/*.safetensors"),
+        ModelConfig(model_id=MODEL_ID, origin_file_pattern="text_encoder/*.safetensors"),
+        ModelConfig(model_id=MODEL_ID, origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+    ],
+    tokenizer_config=ModelConfig(model_id=MODEL_ID, origin_file_pattern="tokenizer/"),
+)
+
+# Download the saliency-driven LoRA checkpoint from this repository.
+lora_path = hf_hub_download(
+    repo_id="bchao1/foveated_diffusion",
+    filename="image/fov_saliency.safetensors",
+)
+
+# Load the LoRA weights into the pipeline's `dit` module.
+pipe.load_lora(pipe.dit, lora_path)
+```
+Or run the project's `inference.py` directly:
+```bash
+python inference.py \
+    --experiment ours \
+    --lora_checkpoint /path/to/fov_saliency.safetensors
+```
+See the [project page](https://bchao1.github.io/foveated-diffusion/) for the full inference pipeline (gaze handling, foveation transform, decode modes, etc.).
+## Citation
+```bibtex
+@misc{chao2026foveateddiffusion,
+      title={Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation},
+      author={Brian Chao and Lior Yariv and Howard Xiao and Gordon Wetzstein},
+      year={2026},
+      eprint={2603.23491},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2603.23491},
+}
+```
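
The repository structure in the README lists one image checkpoint per conditioning mode. A minimal sketch of how a script might pick the right file is shown here. The helper `checkpoint_filename` and the mode names (`"none"`, `"random"`, `"saliency"`, `"bbox"`) are hypothetical and chosen for illustration; only the file paths and the repo ID come from the README.

```python
# Hypothetical helper, not part of the project codebase: maps a conditioning
# mode to the checkpoint path listed under "Repository structure".
REPO_ID = "bchao1/foveated_diffusion"

# Mode names are assumptions; the file paths are taken from the README.
IMAGE_CHECKPOINTS = {
    "none": "image/no_fov.safetensors",           # finetuned baseline
    "random": "image/fov_random.safetensors",     # random gaze locations
    "saliency": "image/fov_saliency.safetensors", # saliency-driven
    "bbox": "image/fov_bbox.safetensors",         # bounding-box-driven
}


def checkpoint_filename(mode: str) -> str:
    """Return the in-repo path of the image LoRA for the given mode."""
    try:
        return IMAGE_CHECKPOINTS[mode]
    except KeyError:
        valid = ", ".join(sorted(IMAGE_CHECKPOINTS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None
```

The returned path can be passed as the `filename` argument of `hf_hub_download`, together with `repo_id=REPO_ID`, in place of the hard-coded `image/fov_saliency.safetensors` in the usage snippet.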