Upload folder using huggingface_hub
Browse files- .gitignore +73 -0
- LICENSE +21 -0
- README.md +182 -3
- SETUP_GUIDE.md +262 -0
- app.py +305 -0
- bytedream/__init__.py +21 -0
- bytedream/generator.py +317 -0
- bytedream/model.py +582 -0
- bytedream/pipeline.py +312 -0
- bytedream/scheduler.py +273 -0
- bytedream/utils.py +398 -0
- config.yaml +81 -0
- environment.yml +25 -0
- examples.py +316 -0
- infer.py +150 -0
- main.py +278 -0
- prepare_dataset.py +287 -0
- publish_to_hf.py +30 -0
- quick_start.py +124 -0
- requirements.txt +16 -0
- train.py +500 -0
- upload_to_hf.py +420 -0
.gitignore
ADDED
|
@@ -0,0 +1,73 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Python
|
| 2 |
+
__pycache__/
|
| 3 |
+
*.py[cod]
|
| 4 |
+
*$py.class
|
| 5 |
+
*.so
|
| 6 |
+
.Python
|
| 7 |
+
build/
|
| 8 |
+
develop-eggs/
|
| 9 |
+
dist/
|
| 10 |
+
downloads/
|
| 11 |
+
eggs/
|
| 12 |
+
.eggs/
|
| 13 |
+
lib/
|
| 14 |
+
lib64/
|
| 15 |
+
parts/
|
| 16 |
+
sdist/
|
| 17 |
+
var/
|
| 18 |
+
wheels/
|
| 19 |
+
*.egg-info/
|
| 20 |
+
.installed.cfg
|
| 21 |
+
*.egg
|
| 22 |
+
|
| 23 |
+
# Virtual environments
|
| 24 |
+
venv/
|
| 25 |
+
env/
|
| 26 |
+
ENV/
|
| 27 |
+
.venv
|
| 28 |
+
|
| 29 |
+
# IDE
|
| 30 |
+
.vscode/
|
| 31 |
+
.idea/
|
| 32 |
+
*.swp
|
| 33 |
+
*.swo
|
| 34 |
+
*~
|
| 35 |
+
|
| 36 |
+
# Jupyter Notebook
|
| 37 |
+
.ipynb_checkpoints
|
| 38 |
+
|
| 39 |
+
# PyTorch
|
| 40 |
+
*.pth
|
| 41 |
+
*.onnx
|
| 42 |
+
|
| 43 |
+
# Model checkpoints
|
| 44 |
+
models/
|
| 45 |
+
checkpoints/
|
| 46 |
+
*.bin
|
| 47 |
+
*.safetensors
|
| 48 |
+
|
| 49 |
+
# Outputs
|
| 50 |
+
outputs/
|
| 51 |
+
demo_outputs/
|
| 52 |
+
*.png
|
| 53 |
+
*.jpg
|
| 54 |
+
*.jpeg
|
| 55 |
+
*.webp
|
| 56 |
+
|
| 57 |
+
# Logs
|
| 58 |
+
logs/
|
| 59 |
+
*.log
|
| 60 |
+
|
| 61 |
+
# OS
|
| 62 |
+
.DS_Store
|
| 63 |
+
Thumbs.db
|
| 64 |
+
desktop.ini
|
| 65 |
+
|
| 66 |
+
# Temporary files
|
| 67 |
+
tmp/
|
| 68 |
+
temp/
|
| 69 |
+
*.tmp
|
| 70 |
+
|
| 71 |
+
# Hugging Face cache
|
| 72 |
+
.huggingface/
|
| 73 |
+
.cache/
|
LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
MIT License
|
| 2 |
+
|
| 3 |
+
Copyright (c) 2024 Byte Dream
|
| 4 |
+
|
| 5 |
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
| 6 |
+
of this software and associated documentation files (the "Software"), to deal
|
| 7 |
+
in the Software without restriction, including without limitation the rights
|
| 8 |
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
| 9 |
+
copies of the Software, and to permit persons to whom the Software is
|
| 10 |
+
furnished to do so, subject to the following conditions:
|
| 11 |
+
|
| 12 |
+
The above copyright notice and this permission notice shall be included in all
|
| 13 |
+
copies or substantial portions of the Software.
|
| 14 |
+
|
| 15 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
| 16 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
| 17 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
| 18 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
| 19 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
| 20 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
| 21 |
+
SOFTWARE.
|
README.md
CHANGED
|
@@ -1,3 +1,182 @@
|
|
| 1 |
-
-
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Byte Dream - AI Image Generation Model
|
| 2 |
+
|
| 3 |
+
## Overview
|
| 4 |
+
Byte Dream is a robust, production-ready text-to-image diffusion model optimized for CPU inference. This model uses advanced latent diffusion architecture to generate high-quality images from text prompts.
|
| 5 |
+
|
| 6 |
+
## Features
|
| 7 |
+
- **CPU Optimized**: Runs efficiently on CPU without GPU requirement
|
| 8 |
+
- **High Quality**: Generates 512x512 and higher resolution images
|
| 9 |
+
- **Fast Inference**: Optimized for speed with quality preservation
|
| 10 |
+
- **Hugging Face Ready**: Easy deployment to Hugging Face Spaces
|
| 11 |
+
- **Flexible**: Supports various sampling methods and customization
|
| 12 |
+
|
| 13 |
+
## Installation
|
| 14 |
+
|
| 15 |
+
### Using pip
|
| 16 |
+
```bash
|
| 17 |
+
pip install -r requirements.txt
|
| 18 |
+
```
|
| 19 |
+
|
| 20 |
+
### Using conda
|
| 21 |
+
```bash
|
| 22 |
+
conda env create -f environment.yml
|
| 23 |
+
conda activate bytedream
|
| 24 |
+
```
|
| 25 |
+
|
| 26 |
+
## Usage
|
| 27 |
+
|
| 28 |
+
### Basic Image Generation
|
| 29 |
+
```python
|
| 30 |
+
from bytedream import ByteDreamGenerator
|
| 31 |
+
|
| 32 |
+
# Initialize generator
|
| 33 |
+
generator = ByteDreamGenerator()
|
| 34 |
+
|
| 35 |
+
# Generate image from prompt
|
| 36 |
+
image = generator.generate(
|
| 37 |
+
prompt="A beautiful sunset over mountains, digital art",
|
| 38 |
+
num_inference_steps=50,
|
| 39 |
+
guidance_scale=7.5
|
| 40 |
+
)
|
| 41 |
+
|
| 42 |
+
# Save image
|
| 43 |
+
image.save("output.png")
|
| 44 |
+
```
|
| 45 |
+
|
| 46 |
+
### Advanced Usage
|
| 47 |
+
```python
|
| 48 |
+
from bytedream import ByteDreamGenerator
|
| 49 |
+
|
| 50 |
+
generator = ByteDreamGenerator(model_path="models/bytedream")
|
| 51 |
+
|
| 52 |
+
# Generate with custom parameters
|
| 53 |
+
image = generator.generate(
|
| 54 |
+
prompt="Cyberpunk city at night, neon lights, futuristic",
|
| 55 |
+
negative_prompt="blurry, low quality, distorted",
|
| 56 |
+
width=768,
|
| 57 |
+
height=768,
|
| 58 |
+
num_inference_steps=100,
|
| 59 |
+
guidance_scale=9.0,
|
| 60 |
+
seed=42
|
| 61 |
+
)
|
| 62 |
+
|
| 63 |
+
image.save("cyberpunk_city.png")
|
| 64 |
+
```
|
| 65 |
+
|
| 66 |
+
### Command Line Interface
|
| 67 |
+
```bash
|
| 68 |
+
# Generate image from command line
|
| 69 |
+
python infer.py --prompt "A dragon flying over castle" --output dragon.png
|
| 70 |
+
|
| 71 |
+
# With advanced options
|
| 72 |
+
python infer.py --prompt "Fantasy landscape" --negative "ugly, blurry" --steps 75 --guidance 8.0
|
| 73 |
+
```
|
| 74 |
+
|
| 75 |
+
### Gradio Web Interface
|
| 76 |
+
```bash
|
| 77 |
+
python app.py
|
| 78 |
+
```
|
| 79 |
+
|
| 80 |
+
## Model Architecture
|
| 81 |
+
|
| 82 |
+
Byte Dream uses a latent diffusion model with:
|
| 83 |
+
- **Text Encoder**: CLIP-based text understanding
|
| 84 |
+
- **UNet**: Noise prediction network with cross-attention
|
| 85 |
+
- **VAE**: Variational Autoencoder for image compression
|
| 86 |
+
- **Scheduler**: Advanced DDIM/PNDM sampling
|
| 87 |
+
|
| 88 |
+
## Training
|
| 89 |
+
|
| 90 |
+
### Prepare Dataset
|
| 91 |
+
```bash
|
| 92 |
+
python prepare_dataset.py --data_dir ./dataset --output_dir ./processed_data
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
### Train Model
|
| 96 |
+
```bash
|
| 97 |
+
python train.py \
|
| 98 |
+
--train_data ./processed_data \
|
| 99 |
+
--output_dir ./models/bytedream \
|
| 100 |
+
--epochs 100 \
|
| 101 |
+
--batch_size 4 \
|
| 102 |
+
--learning_rate 1e-5
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
+
## Hugging Face Deployment
|
| 106 |
+
|
| 107 |
+
### Upload to Hugging Face
|
| 108 |
+
```bash
|
| 109 |
+
python upload_to_hf.py --model_id your_username/bytedream
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
### Deploy to Spaces
|
| 113 |
+
1. Create new Space on Hugging Face
|
| 114 |
+
2. Select Gradio SDK
|
| 115 |
+
3. Upload all files
|
| 116 |
+
4. Configure CPU hardware
|
| 117 |
+
5. Deploy automatically
|
| 118 |
+
|
| 119 |
+
## Configuration
|
| 120 |
+
|
| 121 |
+
Edit `config.yaml` for custom settings:
|
| 122 |
+
- Model dimensions
|
| 123 |
+
- Sampling parameters
|
| 124 |
+
- Training hyperparameters
|
| 125 |
+
- CPU optimization settings
|
| 126 |
+
|
| 127 |
+
## Performance Optimization
|
| 128 |
+
|
| 129 |
+
### CPU Optimization
|
| 130 |
+
- OpenVINO integration available
|
| 131 |
+
- ONNX runtime support
|
| 132 |
+
- Mixed precision (FP16/FP32)
|
| 133 |
+
- Batch processing
|
| 134 |
+
|
| 135 |
+
### Memory Management
|
| 136 |
+
- Gradient checkpointing
|
| 137 |
+
- Model offloading
|
| 138 |
+
- Progressive generation
|
| 139 |
+
|
| 140 |
+
## File Structure
|
| 141 |
+
```
|
| 142 |
+
Byte Dream/
|
| 143 |
+
├── bytedream/ # Core package
|
| 144 |
+
│ ├── __init__.py
|
| 145 |
+
│ ├── model.py # Model architecture
|
| 146 |
+
│ ├── pipeline.py # Generation pipeline
|
| 147 |
+
│ ├── scheduler.py # Diffusion scheduler
|
| 148 |
+
│ └── utils.py # Utilities
|
| 149 |
+
├── train.py # Training script
|
| 150 |
+
├── infer.py # Inference script
|
| 151 |
+
├── app.py # Gradio web interface
|
| 152 |
+
├── config.yaml # Configuration
|
| 153 |
+
├── requirements.txt # Dependencies
|
| 154 |
+
└── README.md # This file
|
| 155 |
+
```
|
| 156 |
+
|
| 157 |
+
## Examples
|
| 158 |
+
|
| 159 |
+
Generate various types of images:
|
| 160 |
+
- Digital art and illustrations
|
| 161 |
+
- Photorealistic scenes
|
| 162 |
+
- Abstract concepts
|
| 163 |
+
- Character designs
|
| 164 |
+
- Landscapes and environments
|
| 165 |
+
|
| 166 |
+
## License
|
| 167 |
+
|
| 168 |
+
MIT License - See LICENSE file for details
|
| 169 |
+
|
| 170 |
+
## Citation
|
| 171 |
+
|
| 172 |
+
If you use Byte Dream in your research:
|
| 173 |
+
```bibtex
|
| 174 |
+
@software{bytedream2024,
  title={Byte Dream: CPU-Optimized Text-to-Image Generation},
  author={Byte Dream Team},
  year={2024}
}
|
| 178 |
+
```
|
| 179 |
+
|
| 180 |
+
## Support
|
| 181 |
+
|
| 182 |
+
For issues and questions, please open a GitHub issue or contact the maintainers.
|
SETUP_GUIDE.md
ADDED
|
@@ -0,0 +1,262 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Byte Dream - Setup Guide
|
| 2 |
+
|
| 3 |
+
## Quick Start (Windows)
|
| 4 |
+
|
| 5 |
+
### 1. Install Dependencies
|
| 6 |
+
|
| 7 |
+
#### Option A: Using pip (Recommended)
|
| 8 |
+
```cmd
|
| 9 |
+
cd "c:\Users\Enzo\Documents\Byte Dream"
|
| 10 |
+
pip install -r requirements.txt
|
| 11 |
+
```
|
| 12 |
+
|
| 13 |
+
#### Option B: Using conda
|
| 14 |
+
```cmd
|
| 15 |
+
cd "c:\Users\Enzo\Documents\Byte Dream"
|
| 16 |
+
conda env create -f environment.yml
|
| 17 |
+
conda activate bytedream
|
| 18 |
+
```
|
| 19 |
+
|
| 20 |
+
### 2. Verify Installation
|
| 21 |
+
```cmd
|
| 22 |
+
python quick_start.py
|
| 23 |
+
```
|
| 24 |
+
|
| 25 |
+
This will check if all dependencies are installed and test the model.
|
| 26 |
+
|
| 27 |
+
### 3. Generate Your First Image
|
| 28 |
+
|
| 29 |
+
#### Command Line
|
| 30 |
+
```cmd
|
| 31 |
+
python infer.py --prompt "A beautiful sunset over mountains, digital art" --output sunset.png
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
#### Web Interface
|
| 35 |
+
```cmd
|
| 36 |
+
python app.py
|
| 37 |
+
```
|
| 38 |
+
Then open http://localhost:7860 in your browser.
|
| 39 |
+
|
| 40 |
+
#### Python Script
|
| 41 |
+
```python
|
| 42 |
+
from bytedream import ByteDreamGenerator
|
| 43 |
+
|
| 44 |
+
generator = ByteDreamGenerator()
|
| 45 |
+
image = generator.generate(
|
| 46 |
+
prompt="A cyberpunk city at night with neon lights",
|
| 47 |
+
num_inference_steps=50,
|
| 48 |
+
guidance_scale=7.5
|
| 49 |
+
)
|
| 50 |
+
image.save("cyberpunk_city.png")
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
## Model Training
|
| 54 |
+
|
| 55 |
+
### Prepare Your Dataset
|
| 56 |
+
|
| 57 |
+
1. Collect images in a folder (JPG, PNG formats)
|
| 58 |
+
2. Optionally add .txt files with captions for each image
|
| 59 |
+
3. Run preparation script:
|
| 60 |
+
|
| 61 |
+
```cmd
|
| 62 |
+
python prepare_dataset.py --input ./my_images --output ./processed_data --size 512
|
| 63 |
+
```
|
| 64 |
+
|
| 65 |
+
### Train the Model
|
| 66 |
+
|
| 67 |
+
```cmd
|
| 68 |
+
python train.py --train_data ./processed_data --output_dir ./models/bytedream --epochs 100 --batch_size 4
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
Training time depends on:
|
| 72 |
+
- Dataset size
|
| 73 |
+
- Number of epochs
|
| 74 |
+
- CPU speed (expect several hours to days for CPU training)
|
| 75 |
+
|
| 76 |
+
## Hugging Face Deployment
|
| 77 |
+
|
| 78 |
+
### Upload to Hugging Face Hub
|
| 79 |
+
|
| 80 |
+
1. Get your Hugging Face token from https://huggingface.co/settings/tokens
|
| 81 |
+
2. Upload model:
|
| 82 |
+
|
| 83 |
+
```cmd
|
| 84 |
+
python upload_to_hf.py --model_path ./models/bytedream --repo_id your_username/bytedream --token YOUR_TOKEN
|
| 85 |
+
```
|
| 86 |
+
|
| 87 |
+
### Deploy to Spaces
|
| 88 |
+
|
| 89 |
+
1. Create Gradio app file (already included as `app.py`)
|
| 90 |
+
2. Go to https://huggingface.co/spaces
|
| 91 |
+
3. Click "Create new Space"
|
| 92 |
+
4. Choose Gradio SDK
|
| 93 |
+
5. Upload all project files
|
| 94 |
+
6. Select CPU hardware (CPU Basic or similar)
|
| 95 |
+
7. Deploy!
|
| 96 |
+
|
| 97 |
+
## File Structure
|
| 98 |
+
|
| 99 |
+
```
|
| 100 |
+
Byte Dream/
|
| 101 |
+
├── bytedream/ # Core package
|
| 102 |
+
│ ├── __init__.py # Package initialization
|
| 103 |
+
│ ├── model.py # Neural network architectures
|
| 104 |
+
│ ├── pipeline.py # Generation pipeline
|
| 105 |
+
│ ├── scheduler.py # Diffusion scheduler
|
| 106 |
+
│ ├── generator.py # Main generator class
|
| 107 |
+
│ └── utils.py # Utility functions
|
| 108 |
+
├── train.py # Training script
|
| 109 |
+
├── infer.py # Command-line inference
|
| 110 |
+
├── app.py # Gradio web interface
|
| 111 |
+
├── main.py # High-level application API
|
| 112 |
+
├── prepare_dataset.py # Dataset preparation
|
| 113 |
+
├── upload_to_hf.py # Hugging Face upload
|
| 114 |
+
├── quick_start.py # Quick start guide
|
| 115 |
+
├── config.yaml # Configuration
|
| 116 |
+
├── requirements.txt # Python dependencies
|
| 117 |
+
├── environment.yml # Conda environment
|
| 118 |
+
├── README.md # Documentation
|
| 119 |
+
└── LICENSE # MIT License
|
| 120 |
+
```
|
| 121 |
+
|
| 122 |
+
## Usage Examples
|
| 123 |
+
|
| 124 |
+
### Basic Generation
|
| 125 |
+
```cmd
|
| 126 |
+
python infer.py -p "A dragon flying over castle" -o dragon.png
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
### Advanced Parameters
|
| 130 |
+
```cmd
|
| 131 |
+
python infer.py -p "Fantasy landscape" -n "ugly, blurry" -W 768 -H 768 -s 75 -g 8.0 --seed 42
|
| 132 |
+
```
|
| 133 |
+
|
| 134 |
+
### Batch Generation (Python)
|
| 135 |
+
```python
|
| 136 |
+
from bytedream import ByteDreamGenerator
|
| 137 |
+
|
| 138 |
+
generator = ByteDreamGenerator()
|
| 139 |
+
|
| 140 |
+
prompts = [
|
| 141 |
+
"Sunset beach, palm trees, tropical paradise",
|
| 142 |
+
"Mountain landscape, snow peaks, alpine lake",
|
| 143 |
+
"Forest path, sunlight filtering through trees"
|
| 144 |
+
]
|
| 145 |
+
|
| 146 |
+
images = generator.generate_batch(
|
| 147 |
+
prompts=prompts,
|
| 148 |
+
width=512,
|
| 149 |
+
height=512,
|
| 150 |
+
num_inference_steps=50
|
| 151 |
+
)
|
| 152 |
+
|
| 153 |
+
for i, img in enumerate(images):
|
| 154 |
+
img.save(f"landscape_{i}.png")
|
| 155 |
+
```
|
| 156 |
+
|
| 157 |
+
## Performance Optimization
|
| 158 |
+
|
| 159 |
+
### CPU Optimization
|
| 160 |
+
The model is already optimized for CPU, but you can:
|
| 161 |
+
|
| 162 |
+
1. Increase threads in `config.yaml`:
|
| 163 |
+
```yaml
|
| 164 |
+
cpu_optimization:
|
| 165 |
+
threads: 8 # Set to number of CPU cores
|
| 166 |
+
precision: fp32
|
| 167 |
+
```
|
| 168 |
+
|
| 169 |
+
2. Use fewer inference steps for faster generation:
|
| 170 |
+
```cmd
|
| 171 |
+
python infer.py -p "Quick preview" -s 20
|
| 172 |
+
```
|
| 173 |
+
|
| 174 |
+
3. Generate smaller images:
|
| 175 |
+
```cmd
|
| 176 |
+
python infer.py -p "Small image" -W 256 -H 256
|
| 177 |
+
```
|
| 178 |
+
|
| 179 |
+
### Memory Management
|
| 180 |
+
For systems with limited RAM:
|
| 181 |
+
|
| 182 |
+
1. Enable memory efficient mode (already default)
|
| 183 |
+
2. Generate one image at a time
|
| 184 |
+
3. Restart Python between batch generations
|
| 185 |
+
|
| 186 |
+
## Troubleshooting
|
| 187 |
+
|
| 188 |
+
### Import Errors
|
| 189 |
+
If you get import errors:
|
| 190 |
+
```cmd
|
| 191 |
+
pip install --upgrade torch transformers diffusers
|
| 192 |
+
```
|
| 193 |
+
|
| 194 |
+
### Memory Errors
|
| 195 |
+
Reduce image size or inference steps:
|
| 196 |
+
```cmd
|
| 197 |
+
python infer.py -p "Test" -W 256 -H 256 -s 20
|
| 198 |
+
```
|
| 199 |
+
|
| 200 |
+
### Slow Generation
|
| 201 |
+
CPU generation is slower than GPU. Expect:
|
| 202 |
+
- 256x256: ~30-60 seconds
|
| 203 |
+
- 512x512: ~2-5 minutes
|
| 204 |
+
- 768x768: ~5-10 minutes
|
| 205 |
+
|
| 206 |
+
Times vary by CPU speed and number of steps.
|
| 207 |
+
|
| 208 |
+
### Model Not Loading
|
| 209 |
+
The model needs trained weights. Either:
|
| 210 |
+
1. Train your own model using `train.py`
|
| 211 |
+
2. Download pretrained weights from Hugging Face
|
| 212 |
+
3. Use Stable Diffusion weights as base
|
| 213 |
+
|
| 214 |
+
## Tips for Better Results
|
| 215 |
+
|
| 216 |
+
### Writing Prompts
|
| 217 |
+
- Be specific and descriptive
|
| 218 |
+
- Include style references ("digital art", "oil painting")
|
| 219 |
+
- Mention lighting ("dramatic lighting", "soft sunlight")
|
| 220 |
+
- Add quality modifiers ("highly detailed", "4K", "masterpiece")
|
| 221 |
+
|
| 222 |
+
### Negative Prompts
|
| 223 |
+
Use to avoid common issues:
|
| 224 |
+
```
|
| 225 |
+
ugly, blurry, low quality, distorted, deformed, bad anatomy, extra limbs
|
| 226 |
+
```
|
| 227 |
+
|
| 228 |
+
### Parameters
|
| 229 |
+
- **Steps**: 20-30 (quick), 50 (good), 75-100 (best)
|
| 230 |
+
- **Guidance**: 5-7 (creative), 7-9 (balanced), 9-12 (strict)
|
| 231 |
+
- **Resolution**: Start with 512x512, increase if needed
|
| 232 |
+
|
| 233 |
+
## Advanced Features
|
| 234 |
+
|
| 235 |
+
### Custom Schedulers
|
| 236 |
+
Edit `config.yaml` to try different schedulers:
|
| 237 |
+
- DDIM (default) - Fast, deterministic
|
| 238 |
+
- EulerDiscrete - Alternative sampling
|
| 239 |
+
|
| 240 |
+
### Fine-tuning
|
| 241 |
+
Fine-tune on specific styles:
|
| 242 |
+
1. Collect 50-100 images in desired style
|
| 243 |
+
2. Prepare dataset
|
| 244 |
+
3. Train for 50-100 epochs with low learning rate (1e-6)
|
| 245 |
+
|
| 246 |
+
## Support
|
| 247 |
+
|
| 248 |
+
For issues and questions:
|
| 249 |
+
1. Check this guide first
|
| 250 |
+
2. Review README.md
|
| 251 |
+
3. Check code comments
|
| 252 |
+
4. Visit Hugging Face documentation
|
| 253 |
+
|
| 254 |
+
## Updates
|
| 255 |
+
|
| 256 |
+
Check for updates and improvements:
|
| 257 |
+
- New model architectures
|
| 258 |
+
- Better CPU optimization
|
| 259 |
+
- Additional features
|
| 260 |
+
- Bug fixes
|
| 261 |
+
|
| 262 |
+
Enjoy creating with Byte Dream! 🎨
|
app.py
ADDED
|
@@ -0,0 +1,305 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream - Gradio Web Interface
|
| 3 |
+
Interactive web UI for image generation
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import gradio as gr
|
| 7 |
+
from bytedream.generator import ByteDreamGenerator
|
| 8 |
+
import torch
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
# Initialize generator
|
| 12 |
+
print("Loading Byte Dream model...")
|
| 13 |
+
try:
|
| 14 |
+
generator = ByteDreamGenerator(
|
| 15 |
+
model_path="./models/bytedream",
|
| 16 |
+
config_path="config.yaml",
|
| 17 |
+
device="cpu",
|
| 18 |
+
)
|
| 19 |
+
print("✓ Model loaded successfully!")
|
| 20 |
+
except Exception as e:
|
| 21 |
+
print(f"⚠ Warning: Could not load model: {e}")
|
| 22 |
+
print(" Please train the model or download pretrained weights.")
|
| 23 |
+
generator = None
|
| 24 |
+
|
| 25 |
+
|
| 26 |
+
def generate_image(
    prompt,
    negative_prompt,
    width,
    height,
    num_steps,
    guidance_scale,
    seed,
):
    """Run one text-to-image generation and report the outcome.

    Returns a ``(image_or_None, status_message)`` pair matching the Gradio
    ``(Image, Textbox)`` output components.
    """
    # Guard clause: the module-level generator is None when model weights
    # could not be loaded at startup.
    if generator is None:
        return None, "Error: Model not loaded. Please train or download model weights."

    # The UI uses -1 as the sentinel for "random seed".
    chosen_seed = seed if seed != -1 else None

    try:
        result = generator.generate(
            prompt=prompt,
            # An empty negative prompt means "none" to the pipeline.
            negative_prompt=negative_prompt or None,
            width=int(width),
            height=int(height),
            num_inference_steps=int(num_steps),
            guidance_scale=float(guidance_scale),
            seed=chosen_seed,
        )
    except Exception as e:
        # Surface the failure in the UI status box, and dump the traceback
        # to the console for debugging.
        print(f"Error generating image: {e}")
        import traceback
        traceback.print_exc()
        return None, f"Error: {str(e)}"

    return result, "Success! ✓"
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
# Create Gradio interface.  Everything declared inside this `with` block is
# registered on `demo`; event wiring happens at the bottom of the block.
with gr.Blocks(
    title="Byte Dream - AI Image Generator",
    theme=gr.themes.Soft(),
    css="""
    .gradio-container {
        max-width: 1400px !important;
    }
    #main-heading {
        text-align: center;
        margin-bottom: 20px;
    }
    .description {
        text-align: center;
        margin-bottom: 30px;
    }
    """
) as demo:

    gr.Markdown("""
    # 🎨 Byte Dream - AI Image Generator

    ### Transform your imagination into reality with advanced AI

    Powered by state-of-the-art latent diffusion models, optimized for CPU inference.
    """)

    with gr.Row():
        # Left column: prompt entry and generation settings.
        with gr.Column(scale=1):
            gr.Markdown("### 📝 Create Your Prompt")

            prompt_input = gr.Textbox(
                label="Positive Prompt",
                placeholder="Describe the image you want to create...",
                lines=3,
                value="A beautiful sunset over mountains, digital art, highly detailed, vibrant colors",
            )

            negative_prompt_input = gr.Textbox(
                label="Negative Prompt (Optional)",
                placeholder="What to avoid in the image...",
                lines=2,
                value="ugly, blurry, low quality, distorted, deformed",
            )

            gr.Markdown("### ⚙️ Settings")

            with gr.Row():
                # Width/height step is 64 because the latent VAE works on
                # 64-px multiples.
                width_slider = gr.Slider(
                    minimum=256,
                    maximum=1024,
                    step=64,
                    value=512,
                    label="Width (px)",
                    info="Image width - multiples of 64"
                )
                height_slider = gr.Slider(
                    minimum=256,
                    maximum=1024,
                    step=64,
                    value=512,
                    label="Height (px)",
                    info="Image height - multiples of 64"
                )

            with gr.Row():
                steps_slider = gr.Slider(
                    minimum=10,
                    maximum=150,
                    step=5,
                    value=50,
                    label="Inference Steps",
                    info="More steps = better quality but slower"
                )
                guidance_slider = gr.Slider(
                    minimum=1.0,
                    maximum=20.0,
                    step=0.5,
                    value=7.5,
                    label="Guidance Scale",
                    info="Higher = closer to prompt, Lower = more creative"
                )

            seed_input = gr.Number(
                label="Seed",
                value=-1,
                precision=0,
                info="-1 for random, any number for reproducibility",
            )

            generate_btn = gr.Button(
                "🎨 Generate Image",
                variant="primary",
                size="lg",
            )

        # Right column: generated image, status line, and download slot.
        with gr.Column(scale=1):
            gr.Markdown("### 🖼️ Result")

            output_image = gr.Image(
                label="Generated Image",
                type="pil",
                height=512,
            )
            status_text = gr.Textbox(
                label="Status",
                interactive=False,
            )

            download_btn = gr.File(
                label="Download",
                visible=True,
            )

    # Tips section
    with gr.Accordion("💡 Tips for Better Results", open=False):
        gr.Markdown("""
        **Writing Effective Prompts:**
        - Be specific and descriptive
        - Include art style references (e.g., "digital art", "oil painting", "watercolor")
        - Mention lighting ("dramatic lighting", "soft sunlight", "neon lights")
        - Add quality modifiers ("highly detailed", "4K", "masterpiece")
        - Specify mood and atmosphere ("peaceful", "dramatic", "mysterious")

        **Using Negative Prompts:**
        - Remove unwanted elements ("no people", "no text")
        - Avoid quality issues ("no blur", "no distortion")
        - Fix common problems ("bad anatomy", "extra limbs")

        **Parameter Guide:**
        - **Steps**: 20-30 for quick previews, 50-75 for final images, 100+ for best quality
        - **Guidance**: 5-7 for creative freedom, 7-9 for balanced, 9-12 for strict prompt following
        - **Resolution**: Higher = more detail but slower. Start with 512x512, increase if needed
        """)

    # Examples section
    gr.Markdown("### 💡 Example Prompts")

    with gr.Row():
        example_btn1 = gr.Button(
            "🌆 Cyberpunk City",
            size="sm",
        )
        example_btn2 = gr.Button(
            "🐉 Fantasy Dragon",
            size="sm",
        )
        example_btn3 = gr.Button(
            "🏔️ Peaceful Landscape",
            size="sm",
        )

    with gr.Row():
        example_btn4 = gr.Button(
            "👤 Character Portrait",
            size="sm",
        )
        example_btn5 = gr.Button(
            "🌊 Underwater Scene",
            size="sm",
        )
        example_btn6 = gr.Button(
            "🎨 Abstract Art",
            size="sm",
        )

    # Example (prompt, negative_prompt) pairs, keyed by the button COMPONENT
    # itself.  BUG FIX: this dict was previously keyed by the local variable
    # NAME ("example_btn1", ...) and resolved via demo.get_component(name),
    # which fails at startup — gr.Blocks has no string lookup for local
    # variables.  Keying by the component lets us wire each click directly.
    example_prompts = {
        example_btn1: (
            "A cyberpunk city at night with neon lights, futuristic architecture, flying cars, rain-slicked streets, highly detailed, digital art, cinematic lighting",
            "ugly, blurry, low quality, distorted, dark, gloomy"
        ),
        example_btn2: (
            "A majestic dragon breathing fire, fantasy art, dramatic lighting, epic scene, scales gleaming, powerful wings, mountain landscape background",
            "ugly, deformed, blurry, low quality, cartoonish"
        ),
        example_btn3: (
            "A peaceful cottage in a meadow, wildflowers, sunny day, blue sky, studio ghibli style, serene atmosphere, pastoral landscape",
            "people, animals, buildings, urban, dark, stormy"
        ),
        example_btn4: (
            "Portrait of a warrior princess, ornate armor, fantasy setting, intricate details, character design, dramatic lighting, confident expression, long flowing hair",
            "ugly, deformed, asymmetrical, blurry, low quality, bad anatomy"
        ),
        example_btn5: (
            "Underwater coral reef, tropical fish, sunlight filtering through water, photorealistic, vibrant colors, marine life, crystal clear water",
            "polluted, murky, dark, blurry, low quality"
        ),
        example_btn6: (
            "Abstract geometric art, colorful shapes, dynamic composition, modern art, bold patterns, artistic expression, vivid colors",
            "representational, realistic, boring, dull colors, simple"
        ),
    }

    # Connect example buttons
    def set_example(prompt, negative):
        """Return values for the (prompt, negative, status) output trio."""
        return prompt, negative, "Click Generate to create!"

    for example_button, (example_prompt, example_negative) in example_prompts.items():
        # Bind the pair as lambda defaults: a plain closure over the loop
        # variables would late-bind and make every button use the last pair.
        example_button.click(
            fn=lambda p=example_prompt, n=example_negative: set_example(p, n),
            inputs=[],
            outputs=[prompt_input, negative_prompt_input, status_text],
        )

    # Connect generate button
    generate_btn.click(
        fn=generate_image,
        inputs=[
            prompt_input,
            negative_prompt_input,
            width_slider,
            height_slider,
            steps_slider,
            guidance_slider,
            seed_input,
        ],
        outputs=[output_image, status_text],
    )

    # Footer
    gr.Markdown("""
    ---
    **Byte Dream** v1.0.0 | Powered by Latent Diffusion Models | Optimized for CPU Inference

    Created with ❤️ using PyTorch and Hugging Face Diffusers
    """)
|
| 291 |
+
|
| 292 |
+
|
| 293 |
+
# Script entry point: start the Gradio web UI built above (bound to `demo`).
if __name__ == "__main__":
    print("\n" + "="*60)
    print("Starting Byte Dream Web Interface")
    print("="*60)
    print("\nOpening browser...")
    print("Press Ctrl+C to close\n")

    # NOTE(review): binds to all network interfaces (0.0.0.0) on port 7860
    # with public sharing disabled — confirm this exposure is intended for
    # the target deployment (it is the usual setting for HF Spaces/Docker).
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=False,
        show_error=True,
    )
|
bytedream/__init__.py
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream - AI Image Generation Model
|
| 3 |
+
Production-ready text-to-image diffusion model optimized for CPU inference
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
__version__ = "1.0.0"
|
| 7 |
+
__author__ = "Byte Dream Team"
|
| 8 |
+
|
| 9 |
+
from .generator import ByteDreamGenerator
|
| 10 |
+
from .model import UNet2DConditionModel, AutoencoderKL, CLIPTextModel
|
| 11 |
+
from .pipeline import ByteDreamPipeline
|
| 12 |
+
from .scheduler import DDIMScheduler
|
| 13 |
+
|
| 14 |
+
__all__ = [
|
| 15 |
+
"ByteDreamGenerator",
|
| 16 |
+
"UNet2DConditionModel",
|
| 17 |
+
"AutoencoderKL",
|
| 18 |
+
"CLIPTextModel",
|
| 19 |
+
"ByteDreamPipeline",
|
| 20 |
+
"DDIMScheduler",
|
| 21 |
+
]
|
bytedream/generator.py
ADDED
|
@@ -0,0 +1,317 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream Generator
|
| 3 |
+
Main inference engine optimized for CPU with advanced features
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import torch
|
| 7 |
+
import yaml
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
from typing import Optional, Union, List
|
| 10 |
+
from PIL import Image
|
| 11 |
+
import numpy as np
|
| 12 |
+
import gc
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
class ByteDreamGenerator:
    """
    Production-ready image generation engine.

    Wraps model construction, optional weight loading, CPU tuning, and the
    sampling loop behind a single object. Defaults come from ``config.yaml``
    (or a built-in fallback) and can be overridden per call.
    """

    def __init__(
        self,
        model_path: Optional[str] = None,
        config_path: str = "config.yaml",
        device: str = "cpu",
        use_safetensors: bool = True,
    ):
        """
        Initialize the Byte Dream generator.

        Args:
            model_path: Directory containing trained model weights (optional).
            config_path: Path to the YAML configuration file.
            device: Device to run on (default: "cpu").
            use_safetensors: Use safetensors format if available.
        """
        self.device = device
        self.config_path = config_path
        self.use_safetensors = use_safetensors

        # Configuration drives both model shape and generation defaults.
        self.config = self._load_config(config_path)

        print("Initializing Byte Dream Generator...")
        self.pipeline = self._initialize_pipeline(model_path)

        # Thread-count / memory tuning for CPU inference.
        self._optimize_for_cpu()

        print("✓ Byte Dream Generator ready!")

    def _load_config(self, config_path: str) -> dict:
        """Load configuration from a YAML file, falling back to defaults."""
        try:
            with open(config_path, 'r', encoding='utf-8') as f:
                config = yaml.safe_load(f)
            return config
        except FileNotFoundError:
            print(f"Warning: Config file {config_path} not found. Using defaults.")
            return self._get_default_config()

    def _get_default_config(self) -> dict:
        """Built-in configuration used when no config file is present.

        BUG FIX: now includes ``model.name`` / ``model.version`` so that
        ``get_model_info()`` does not raise KeyError when the generator is
        running on this default config.
        """
        return {
            'model': {
                'name': 'Byte Dream',
                'version': '1.0.0',
                'unet': {
                    'in_channels': 4,
                    'out_channels': 4,
                    'block_out_channels': [320, 640, 1280, 1280],
                    'layers_per_block': 2,
                    'attention_head_dim': 8,
                    'cross_attention_dim': 768,
                    'use_linear_projection': True,
                },
                'scheduler': {
                    'name': 'DDIM',
                    'num_train_timesteps': 1000,
                    'beta_start': 0.00085,
                    'beta_end': 0.012,
                    'beta_schedule': 'scaled_linear',
                    'clip_sample': False,
                    'set_alpha_to_one': False,
                }
            },
            'generation': {
                'width': 512,
                'height': 512,
                'num_inference_steps': 50,
                'guidance_scale': 7.5,
                'negative_prompt': 'ugly, blurry, low quality, distorted, deformed',
            }
        }

    def _initialize_pipeline(self, model_path: Optional[str]):
        """Create UNet, VAE, text encoder and scheduler, and wire them into a pipeline."""
        # Local imports keep module import cheap and avoid circular imports.
        from bytedream.model import create_unet, create_vae, create_text_encoder
        from bytedream.scheduler import create_scheduler
        from bytedream.pipeline import ByteDreamPipeline

        print("Creating UNet...")
        unet = create_unet(self.config)

        print("Creating VAE...")
        vae = create_vae(self.config)

        print("Creating Text Encoder...")
        text_encoder = create_text_encoder(self.config)

        print("Creating Scheduler...")
        scheduler = create_scheduler(self.config)

        # Load pretrained weights if a checkpoint directory was provided.
        if model_path:
            self._load_model_weights(unet, model_path)

        pipeline = ByteDreamPipeline(
            text_encoder=text_encoder,
            vae=vae,
            unet=unet,
            scheduler=scheduler,
            device=self.device,
            dtype=torch.float32,
        )

        return pipeline

    def _load_model_weights(self, unet, model_path: str):
        """Load pretrained UNet weights from a checkpoint directory, if present."""
        model_file = Path(model_path) / "unet_pytorch_model.bin"

        if not model_file.exists():
            model_file = Path(model_path) / "pytorch_model.bin"

        if model_file.exists():
            print(f"Loading weights from {model_file}...")
            # NOTE(review): torch.load unpickles arbitrary objects — only load
            # checkpoints from trusted sources.
            checkpoint = torch.load(model_file, map_location=self.device)

            # Support both wrapped ({'unet_state_dict': ...}) and raw state dicts.
            if 'unet_state_dict' in checkpoint:
                unet.load_state_dict(checkpoint['unet_state_dict'])
            else:
                unet.load_state_dict(checkpoint)

            print("✓ Weights loaded successfully!")
        else:
            print("⚠ No pretrained weights found. Using random initialization.")
            print("  Train the model or download pretrained weights.")

    def _optimize_for_cpu(self):
        """Optimize pipeline for CPU inference (thread count + memory mode)."""
        cpu_config = self.config.get('cpu_optimization', {})
        threads = cpu_config.get('threads', -1)

        if threads > 0:
            torch.set_num_threads(threads)
        else:
            # Non-positive means "use all available cores".
            import os
            torch.set_num_threads(os.cpu_count())

        self.pipeline.enable_memory_efficient_mode()

        print(f"✓ Optimized for CPU ({torch.get_num_threads()} threads)")

    @torch.no_grad()
    def generate(
        self,
        prompt: str,
        negative_prompt: Optional[str] = None,
        width: Optional[int] = None,
        height: Optional[int] = None,
        num_inference_steps: Optional[int] = None,
        guidance_scale: Optional[float] = None,
        seed: Optional[int] = None,
        eta: float = 0.0,
        output_type: str = "pil",
    ) -> Image.Image:
        """
        Generate an image from a text prompt.

        Args:
            prompt: Text description of the desired image.
            negative_prompt: Things to avoid in the image. Passing "" now
                explicitly disables the config default (see bug fix below).
            width: Output image width (default from config, 512).
            height: Output image height (default from config, 512).
            num_inference_steps: Number of denoising steps (default 50).
            guidance_scale: How closely to follow the prompt (default 7.5).
            seed: Random seed for reproducibility.
            eta: DDIM eta parameter (0.0 for deterministic sampling).
            output_type: Output format ("pil" or "tensor").

        Returns:
            Generated PIL Image.
        """
        gen_config = self.config.get('generation', {})

        # BUG FIX: the previous `x or default` fallbacks treated legitimate
        # falsy values (guidance_scale=0.0, negative_prompt="") as "unset".
        # Only `None` now means "use the config default".
        if width is None:
            width = gen_config.get('width', 512)
        if height is None:
            height = gen_config.get('height', 512)
        if num_inference_steps is None:
            num_inference_steps = gen_config.get('num_inference_steps', 50)
        if guidance_scale is None:
            guidance_scale = gen_config.get('guidance_scale', 7.5)
        if negative_prompt is None:
            negative_prompt = gen_config.get('negative_prompt', "")

        # Latent-space models require dimensions divisible by 8 (VAE stride).
        width = (width // 8) * 8
        height = (height // 8) * 8

        print(f"\nGenerating image...")
        print(f"Prompt: {prompt}")
        if negative_prompt:
            print(f"Negative prompt: {negative_prompt}")
        print(f"Size: {width}x{height}")
        print(f"Steps: {num_inference_steps}")
        print(f"Guidance scale: {guidance_scale}")

        # Seeded generator gives reproducible sampling noise.
        generator = None
        if seed is not None:
            generator = torch.Generator(device=self.device).manual_seed(seed)
            print(f"Seed: {seed}")

        result = self.pipeline(
            prompt=prompt,
            negative_prompt=negative_prompt,
            height=height,
            width=width,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            eta=eta,
            generator=generator,
            output_type=output_type,
        )

        # Pipeline may return a bare image or a sequence of images.
        image = result[0] if isinstance(result, (list, tuple)) else result

        print("\n✓ Image generated successfully!")

        return image

    def generate_batch(
        self,
        prompts: List[str],
        negative_prompt: Optional[str] = None,
        width: int = 512,
        height: int = 512,
        num_inference_steps: int = 50,
        guidance_scale: float = 7.5,
        seeds: Optional[List[int]] = None,
    ) -> List[Image.Image]:
        """
        Generate multiple images, one per prompt.

        Args:
            prompts: List of text prompts.
            negative_prompt: Negative prompt applied to all images.
            width: Image width.
            height: Image height.
            num_inference_steps: Number of denoising steps.
            guidance_scale: Guidance scale.
            seeds: Optional per-image random seeds (missing entries -> unseeded).

        Returns:
            List of generated PIL Images, in prompt order.
        """
        images = []

        for i, prompt in enumerate(prompts):
            seed = seeds[i] if seeds and i < len(seeds) else None

            print(f"\n{'='*50}")
            print(f"Generating image {i+1}/{len(prompts)}")
            print(f"{'='*50}")

            image = self.generate(
                prompt=prompt,
                negative_prompt=negative_prompt,
                width=width,
                height=height,
                num_inference_steps=num_inference_steps,
                guidance_scale=guidance_scale,
                seed=seed,
            )

            images.append(image)

            # Release per-image temporaries before the next generation.
            gc.collect()

        return images

    def get_model_info(self) -> dict:
        """Return a summary dict describing the loaded model and defaults.

        BUG FIX: no longer KeyErrors when 'model.name'/'model.version' or
        generation defaults are absent from the config; also reports the
        previously-computed-but-dropped VAE parameter count.
        """
        unet_params = sum(p.numel() for p in self.pipeline.unet.parameters())
        vae_params = sum(p.numel() for p in self.pipeline.vae.parameters())

        model_cfg = self.config.get('model', {})
        gen_cfg = self.config.get('generation', {})

        info = {
            'name': model_cfg.get('name', 'Byte Dream'),
            'version': model_cfg.get('version', '1.0.0'),
            'unet_parameters': f"{unet_params:,}",
            'vae_parameters': f"{vae_params:,}",
            'device': self.device,
            'dtype': str(self.pipeline.dtype),
            'default_resolution': f"{gen_cfg.get('width', 512)}x{gen_cfg.get('height', 512)}",
        }

        return info

    def clear_memory(self):
        """Force garbage collection and release cached CUDA memory if any."""
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        print("Memory cleared")
|
bytedream/model.py
ADDED
|
@@ -0,0 +1,582 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream Model Architecture
|
| 3 |
+
Complete implementation of UNet, VAE, and Text Encoder for diffusion-based image generation
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import torch
|
| 7 |
+
import torch.nn as nn
|
| 8 |
+
import torch.nn.functional as F
|
| 9 |
+
from typing import Optional, Tuple, Union
|
| 10 |
+
import math
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
class ResnetBlock2D(nn.Module):
    """Residual block for 2D feature maps with optional timestep conditioning.

    Structure: GroupNorm -> SiLU -> Conv, add projected time embedding,
    GroupNorm -> SiLU -> Dropout -> Conv, then a residual connection
    (1x1-projected when channel counts differ).
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        temb_channels: Optional[int] = None,
        groups: int = 32,
        eps: float = 1e-6,
    ):
        super().__init__()

        self.norm1 = nn.GroupNorm(num_groups=groups, num_channels=in_channels, eps=eps, affine=True)
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)

        # Projects the (shared) timestep embedding into this block's channels.
        if temb_channels is not None:
            self.time_emb_proj = nn.Linear(temb_channels, out_channels)

        self.norm2 = nn.GroupNorm(num_groups=groups, num_channels=out_channels, eps=eps, affine=True)
        self.dropout = nn.Dropout(0.0)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)

        # BUG FIX: was nn.SiLU(inplace=True). The same activation is applied
        # to `temb` in forward(); `temb` is shared by every block in the
        # U-Net, so an in-place SiLU silently mutated the caller's time
        # embedding on each forward pass (and corrupted it for later blocks).
        self.nonlinearity = nn.SiLU()

        if in_channels != out_channels:
            self.conv_shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)
        else:
            self.conv_shortcut = None

    def forward(
        self,
        hidden_states: torch.Tensor,
        temb: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        """Apply the block; `temb` (B, temb_channels) broadcasts over H x W."""
        x = hidden_states

        x = self.norm1(x)
        x = self.nonlinearity(x)
        x = self.conv1(x)

        if temb is not None:
            # (B, C) -> (B, C, 1, 1) so it broadcasts across spatial dims.
            temb = self.time_emb_proj(self.nonlinearity(temb))[:, :, None, None]
            x = x + temb

        x = self.norm2(x)
        x = self.nonlinearity(x)
        x = self.dropout(x)
        x = self.conv2(x)

        # Channel-matching shortcut when in/out channels differ.
        if self.conv_shortcut is not None:
            hidden_states = self.conv_shortcut(hidden_states)

        return x + hidden_states
|
| 67 |
+
|
| 68 |
+
|
| 69 |
+
class AttentionBlock(nn.Module):
|
| 70 |
+
"""Cross-attention block for text-conditioned generation"""
|
| 71 |
+
|
| 72 |
+
def __init__(
|
| 73 |
+
self,
|
| 74 |
+
query_dim: int,
|
| 75 |
+
cross_attention_dim: Optional[int] = None,
|
| 76 |
+
num_heads: int = 8,
|
| 77 |
+
head_dim: Optional[int] = None,
|
| 78 |
+
eps: float = 1e-6,
|
| 79 |
+
):
|
| 80 |
+
super().__init__()
|
| 81 |
+
|
| 82 |
+
inner_dim = num_heads * head_dim if head_dim is not None else query_dim
|
| 83 |
+
cross_attention_dim = cross_attention_dim if cross_attention_dim is not None else query_dim
|
| 84 |
+
|
| 85 |
+
self.num_heads = num_heads
|
| 86 |
+
self.head_dim = head_dim if head_dim is not None else query_dim // num_heads
|
| 87 |
+
|
| 88 |
+
self.to_q = nn.Linear(query_dim, inner_dim, bias=False)
|
| 89 |
+
self.to_k = nn.Linear(cross_attention_dim, inner_dim, bias=False)
|
| 90 |
+
self.to_v = nn.Linear(cross_attention_dim, inner_dim, bias=False)
|
| 91 |
+
|
| 92 |
+
self.to_out = nn.ModuleList([
|
| 93 |
+
nn.Linear(inner_dim, query_dim),
|
| 94 |
+
nn.Dropout(0.0)
|
| 95 |
+
])
|
| 96 |
+
|
| 97 |
+
self.norm = nn.LayerNorm(query_dim, eps=eps)
|
| 98 |
+
|
| 99 |
+
def forward(
|
| 100 |
+
self,
|
| 101 |
+
hidden_states: torch.Tensor,
|
| 102 |
+
encoder_hidden_states: Optional[torch.Tensor] = None,
|
| 103 |
+
) -> torch.Tensor:
|
| 104 |
+
residual = hidden_states
|
| 105 |
+
|
| 106 |
+
batch_size, sequence_length, _ = hidden_states.shape
|
| 107 |
+
|
| 108 |
+
query = self.to_q(hidden_states)
|
| 109 |
+
encoder_hidden_states = encoder_hidden_states if encoder_hidden_states is not None else hidden_states
|
| 110 |
+
key = self.to_k(encoder_hidden_states)
|
| 111 |
+
value = self.to_v(encoder_hidden_states)
|
| 112 |
+
|
| 113 |
+
# Multi-head attention
|
| 114 |
+
query = query.reshape(batch_size, sequence_length, self.num_heads, self.head_dim).transpose(1, 2)
|
| 115 |
+
key = key.reshape(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
|
| 116 |
+
value = value.reshape(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
|
| 117 |
+
|
| 118 |
+
# Scaled dot-product attention
|
| 119 |
+
attn_weights = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.head_dim)
|
| 120 |
+
attn_weights = F.softmax(attn_weights, dim=-1)
|
| 121 |
+
|
| 122 |
+
attn_output = torch.matmul(attn_weights, value)
|
| 123 |
+
attn_output = attn_output.transpose(1, 2).reshape(batch_size, sequence_length, -1)
|
| 124 |
+
|
| 125 |
+
# Output projection
|
| 126 |
+
for layer in self.to_out:
|
| 127 |
+
attn_output = layer(attn_output)
|
| 128 |
+
|
| 129 |
+
return residual + attn_output
|
| 130 |
+
|
| 131 |
+
|
| 132 |
+
class DownBlock2D(nn.Module):
    """Downsampling block: a stack of ResNet layers, each optionally followed
    by cross-attention, with an optional strided-conv downsampler at the end.

    forward() returns the final hidden states plus a tuple of intermediate
    outputs used as U-Net skip connections.
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        temb_channels: int,
        num_layers: int = 1,
        add_downsample: bool = True,
        has_cross_attention: bool = False,
        cross_attention_dim: Optional[int] = None,
    ):
        super().__init__()

        resnet_layers = []
        attn_layers = []

        for layer_idx in range(num_layers):
            # Only the first layer sees the incoming channel count.
            resnet_layers.append(
                ResnetBlock2D(
                    in_channels=in_channels if layer_idx == 0 else out_channels,
                    out_channels=out_channels,
                    temb_channels=temb_channels,
                )
            )
            attn_layers.append(
                AttentionBlock(
                    query_dim=out_channels,
                    cross_attention_dim=cross_attention_dim,
                    num_heads=8,
                    head_dim=out_channels // 8,
                )
                if has_cross_attention
                else None
            )

        self.resnets = nn.ModuleList(resnet_layers)
        self.attentions = nn.ModuleList(attn_layers)

        # Strided 3x3 conv halves the spatial resolution when enabled.
        self.downsamplers = (
            nn.ModuleList(
                [nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=2, padding=1)]
            )
            if add_downsample
            else None
        )

    def forward(
        self,
        hidden_states: torch.Tensor,
        temb: Optional[torch.Tensor] = None,
        encoder_hidden_states: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        """Run all layers; collects every intermediate output for skip use."""
        skip_outputs = ()

        for resnet, attention in zip(self.resnets, self.attentions):
            hidden_states = resnet(hidden_states, temb)
            # Attention only fires when the layer has it AND text context exists.
            if attention is not None and encoder_hidden_states is not None:
                hidden_states = attention(hidden_states, encoder_hidden_states)
            skip_outputs = skip_outputs + (hidden_states,)

        if self.downsamplers is not None:
            for down in self.downsamplers:
                hidden_states = down(hidden_states)
            skip_outputs = skip_outputs + (hidden_states,)

        return hidden_states, skip_outputs
|
| 201 |
+
|
| 202 |
+
|
| 203 |
+
class UpBlock2D(nn.Module):
    """Upsampling block: ResNet layers that consume U-Net skip connections
    (channel-concatenated), each optionally followed by cross-attention,
    with an optional transposed-conv upsampler at the end.
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        prev_output_channel: int,
        temb_channels: int,
        num_layers: int = 1,
        add_upsample: bool = True,
        has_cross_attention: bool = False,
        cross_attention_dim: Optional[int] = None,
    ):
        super().__init__()

        resnet_layers = []
        attn_layers = []

        for layer_idx in range(num_layers):
            # Input channels = this block's running width + the width of the
            # skip tensor concatenated at this layer.
            base_ch = in_channels if layer_idx == 0 else out_channels
            skip_ch = prev_output_channel if layer_idx == num_layers - 1 else out_channels

            resnet_layers.append(
                ResnetBlock2D(
                    in_channels=base_ch + skip_ch,
                    out_channels=out_channels,
                    temb_channels=temb_channels,
                )
            )
            attn_layers.append(
                AttentionBlock(
                    query_dim=out_channels,
                    cross_attention_dim=cross_attention_dim,
                    num_heads=8,
                    head_dim=out_channels // 8,
                )
                if has_cross_attention
                else None
            )

        self.resnets = nn.ModuleList(resnet_layers)
        self.attentions = nn.ModuleList(attn_layers)

        # Transposed conv doubles the spatial resolution when enabled.
        self.upsamplers = (
            nn.ModuleList(
                [nn.ConvTranspose2d(out_channels, out_channels, kernel_size=4, stride=2, padding=1)]
            )
            if add_upsample
            else None
        )

    def forward(
        self,
        hidden_states: torch.Tensor,
        res_hidden_states_tuple: Tuple[torch.Tensor, ...],
        temb: Optional[torch.Tensor] = None,
        encoder_hidden_states: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        """Run all layers, fusing one skip tensor per layer along channels."""
        for layer_idx, (resnet, attention) in enumerate(zip(self.resnets, self.attentions)):
            # Skip connection from the U-Net downsampling path.
            hidden_states = torch.cat([hidden_states, res_hidden_states_tuple[layer_idx]], dim=1)

            hidden_states = resnet(hidden_states, temb)

            if attention is not None and encoder_hidden_states is not None:
                hidden_states = attention(hidden_states, encoder_hidden_states)

        if self.upsamplers is not None:
            for up in self.upsamplers:
                hidden_states = up(hidden_states)

        return hidden_states
|
| 273 |
+
|
| 274 |
+
|
| 275 |
+
class UNet2DConditionModel(nn.Module):
|
| 276 |
+
"""
|
| 277 |
+
Main UNet architecture for diffusion-based image generation
|
| 278 |
+
Handles noise prediction conditioned on text embeddings
|
| 279 |
+
"""
|
| 280 |
+
|
| 281 |
+
def __init__(
|
| 282 |
+
self,
|
| 283 |
+
in_channels: int = 4,
|
| 284 |
+
out_channels: int = 4,
|
| 285 |
+
block_out_channels: Tuple[int, ...] = (320, 640, 1280, 1280),
|
| 286 |
+
layers_per_block: int = 2,
|
| 287 |
+
attention_head_dim: int = 8,
|
| 288 |
+
cross_attention_dim: int = 768,
|
| 289 |
+
use_linear_projection: bool = True,
|
| 290 |
+
):
|
| 291 |
+
super().__init__()
|
| 292 |
+
|
| 293 |
+
self.in_channels = in_channels
|
| 294 |
+
self.block_out_channels = block_out_channels
|
| 295 |
+
self.layers_per_block = layers_per_block
|
| 296 |
+
self.cross_attention_dim = cross_attention_dim
|
| 297 |
+
|
| 298 |
+
# Time embedding
|
| 299 |
+
time_embed_dim = block_out_channels[0] * 4
|
| 300 |
+
self.time_proj = nn.Sequential(
|
| 301 |
+
nn.Linear(block_out_channels[0], time_embed_dim),
|
| 302 |
+
nn.SiLU(inplace=True),
|
| 303 |
+
nn.Linear(time_embed_dim, time_embed_dim),
|
| 304 |
+
)
|
| 305 |
+
|
| 306 |
+
# Input convolution
|
| 307 |
+
self.conv_in = nn.Conv2d(in_channels, block_out_channels[0], kernel_size=3, stride=1, padding=1)
|
| 308 |
+
|
| 309 |
+
# Down blocks
|
| 310 |
+
self.down_blocks = nn.ModuleList([])
|
| 311 |
+
output_channel = block_out_channels[0]
|
| 312 |
+
|
| 313 |
+
for i, down_block_type in enumerate(["down", "down", "down", "down"]):
|
| 314 |
+
input_channel = output_channel
|
| 315 |
+
output_channel = block_out_channels[i]
|
| 316 |
+
is_final_block = i == len(block_out_channels) - 1
|
| 317 |
+
|
| 318 |
+
down_block = DownBlock2D(
|
| 319 |
+
in_channels=input_channel,
|
| 320 |
+
out_channels=output_channel,
|
| 321 |
+
temb_channels=time_embed_dim,
|
| 322 |
+
num_layers=layers_per_block,
|
| 323 |
+
add_downsample=not is_final_block,
|
| 324 |
+
has_cross_attention=True,
|
| 325 |
+
cross_attention_dim=cross_attention_dim,
|
| 326 |
+
)
|
| 327 |
+
|
| 328 |
+
self.down_blocks.append(down_block)
|
| 329 |
+
|
| 330 |
+
# Middle blocks
|
| 331 |
+
self.mid_block = nn.ModuleList([
|
| 332 |
+
ResnetBlock2D(
|
| 333 |
+
in_channels=block_out_channels[-1],
|
| 334 |
+
out_channels=block_out_channels[-1],
|
| 335 |
+
temb_channels=time_embed_dim,
|
| 336 |
+
),
|
| 337 |
+
AttentionBlock(
|
| 338 |
+
query_dim=block_out_channels[-1],
|
| 339 |
+
cross_attention_dim=cross_attention_dim,
|
| 340 |
+
num_heads=attention_head_dim,
|
| 341 |
+
head_dim=block_out_channels[-1] // attention_head_dim,
|
| 342 |
+
),
|
| 343 |
+
ResnetBlock2D(
|
| 344 |
+
in_channels=block_out_channels[-1],
|
| 345 |
+
out_channels=block_out_channels[-1],
|
| 346 |
+
temb_channels=time_embed_dim,
|
| 347 |
+
),
|
| 348 |
+
])
|
| 349 |
+
|
| 350 |
+
# Up blocks
|
| 351 |
+
self.up_blocks = nn.ModuleList([])
|
| 352 |
+
reversed_block_out_channels = list(reversed(block_out_channels))
|
| 353 |
+
|
| 354 |
+
for i, up_block_type in enumerate(["up", "up", "up", "up"]):
|
| 355 |
+
prev_output_channel = reversed_block_out_channels[min(i + 1, len(block_out_channels) - 1)]
|
| 356 |
+
output_channel = reversed_block_out_channels[i]
|
| 357 |
+
is_final_block = i == len(block_out_channels) - 1
|
| 358 |
+
|
| 359 |
+
up_block = UpBlock2D(
|
| 360 |
+
in_channels=reversed_block_out_channels[i - 1] if i > 0 else reversed_block_out_channels[0],
|
| 361 |
+
out_channels=output_channel,
|
| 362 |
+
prev_output_channel=prev_output_channel,
|
| 363 |
+
temb_channels=time_embed_dim,
|
| 364 |
+
num_layers=layers_per_block + 1,
|
| 365 |
+
add_upsample=not is_final_block,
|
| 366 |
+
has_cross_attention=True,
|
| 367 |
+
cross_attention_dim=cross_attention_dim,
|
| 368 |
+
)
|
| 369 |
+
|
| 370 |
+
self.up_blocks.append(up_block)
|
| 371 |
+
|
| 372 |
+
# Output
|
| 373 |
+
self.conv_norm_out = nn.GroupNorm(num_channels=block_out_channels[0], num_channels=block_out_channels[0], eps=1e-6)
|
| 374 |
+
self.conv_act = nn.SiLU(inplace=True)
|
| 375 |
+
self.conv_out = nn.Conv2d(block_out_channels[0], out_channels, kernel_size=3, stride=1, padding=1)
|
| 376 |
+
|
| 377 |
+
def forward(
    self,
    sample: torch.Tensor,
    timestep: torch.Tensor,
    encoder_hidden_states: torch.Tensor,
) -> torch.Tensor:
    """
    Run one denoising prediction through the UNet.

    Args:
        sample: Noisy latent batch to denoise.
        timestep: Diffusion timestep(s) passed to the time projection.
        encoder_hidden_states: Text embeddings used for cross-attention
            conditioning in the down, mid and up blocks.

    Returns:
        The model prediction tensor produced by the final convolution.
    """
    # Time embedding: the projected timestep is used directly as `temb`
    # (no additional MLP is applied in this forward pass).
    timesteps_proj = self.time_proj(timestep)
    temb = timesteps_proj

    # Initial convolution into the base channel width.
    hidden_states = self.conv_in(sample)

    # Down-sampling path: collect every intermediate activation so the
    # up path can consume them as skip connections.
    down_block_res_samples = (hidden_states,)

    for downsample_block in self.down_blocks:
        hidden_states, res_samples = downsample_block(
            hidden_states=hidden_states,
            temb=temb,
            encoder_hidden_states=encoder_hidden_states,
        )
        down_block_res_samples += res_samples

    # Middle: resnet blocks receive the time embedding, while the
    # attention block receives the text embeddings.
    for layer in self.mid_block:
        if isinstance(layer, ResnetBlock2D):
            hidden_states = layer(hidden_states, temb)
        else:
            hidden_states = layer(hidden_states, encoder_hidden_states)

    # Up-sampling path: pop skip residuals from the end of the stack,
    # one group per up block (group size = that block's resnet count).
    for upsample_block in self.up_blocks:
        res_samples = down_block_res_samples[-len(upsample_block.resnets):]
        down_block_res_samples = down_block_res_samples[:-len(upsample_block.resnets)]

        hidden_states = upsample_block(
            hidden_states=hidden_states,
            res_hidden_states_tuple=res_samples,
            temb=temb,
            encoder_hidden_states=encoder_hidden_states,
        )

    # Output head: normalize, activate, project to output channels.
    hidden_states = self.conv_norm_out(hidden_states)
    hidden_states = self.conv_act(hidden_states)
    hidden_states = self.conv_out(hidden_states)

    return hidden_states
|
| 426 |
+
|
| 427 |
+
|
| 428 |
+
class AutoencoderKL(nn.Module):
    """
    Convolutional autoencoder for image <-> latent compression.

    The encoder halves the spatial resolution once per stage (16x total
    downscale with the default four stages) and projects to
    ``2 * latent_channels`` feature maps; the decoder mirrors this with
    transposed convolutions back to pixel space.
    """

    def __init__(
        self,
        in_channels: int = 3,
        out_channels: int = 3,
        down_block_types: Tuple[str, ...] = ("DownEncoderBlock2D",) * 4,
        up_block_types: Tuple[str, ...] = ("UpDecoderBlock2D",) * 4,
        latent_channels: int = 4,
        sample_size: int = 512,
    ):
        """
        Args:
            in_channels: Channels of the input image (3 for RGB).
            out_channels: Channels of the reconstructed image.
            down_block_types: One entry per encoder stage (only the count is used).
            up_block_types: One entry per decoder stage (only the count is used).
            latent_channels: Channels of the latent representation.
            sample_size: Nominal image size (stored, not enforced).
        """
        super().__init__()

        self.sample_size = sample_size
        self.latent_channels = latent_channels

        # Encoder: stride-2 conv stages, each halving H and W.
        self.encoder = nn.ModuleList()
        channels = [in_channels, 128, 256, 512, 512]

        for i in range(len(down_block_types)):
            self.encoder.append(nn.Sequential(
                nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, stride=2, padding=1),
                # BUGFIX: the original repeated the `num_channels` keyword
                # (a SyntaxError); GroupNorm takes (num_groups, num_channels).
                nn.GroupNorm(num_groups=32, num_channels=channels[i + 1], eps=1e-6),
                nn.SiLU(inplace=True),
            ))

        # Project encoder features to 2 * latent_channels maps
        # (two halves of the latent code; forward() keeps the first half).
        self.quant_conv = nn.Conv2d(512, latent_channels * 2, kernel_size=1)

        # Decoder: transposed-conv stages, each doubling H and W.
        # BUGFIX: the first decoder stage must accept the 512 channels
        # produced by post_quant_conv, not `latent_channels` as originally
        # written (that mismatch made decode() fail at runtime).
        self.decoder = nn.ModuleList()
        decoder_channels = [512, 512, 512, 256, 128]

        for i in range(len(up_block_types)):
            self.decoder.append(nn.Sequential(
                nn.ConvTranspose2d(decoder_channels[i], decoder_channels[i + 1], kernel_size=4, stride=2, padding=1),
                nn.GroupNorm(num_groups=32, num_channels=decoder_channels[i + 1], eps=1e-6),
                nn.SiLU(inplace=True),
            ))

        self.post_quant_conv = nn.Conv2d(latent_channels, 512, kernel_size=1)
        self.conv_out = nn.Conv2d(128, out_channels, kernel_size=3, stride=1, padding=1)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        """Encode an image batch to a (B, 2*latent_channels, H/16, W/16) tensor."""
        for block in self.encoder:
            x = block(x)
        return self.quant_conv(x)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        """Decode a (B, latent_channels, h, w) latent batch back to images."""
        z = self.post_quant_conv(z)
        for block in self.decoder:
            z = block(z)
        return self.conv_out(z)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Encode then decode, keeping the first `latent_channels` channels."""
        encoded = self.encode(x)
        # Generalized from the hard-coded `[:, :4]` of the original.
        return self.decode(encoded[:, :self.latent_channels])
|
| 497 |
+
|
| 498 |
+
|
| 499 |
+
class CLIPTextModel(nn.Module):
    """
    CLIP text encoder wrapper.

    Encodes prompts into per-token embeddings for conditioning. When the
    `transformers` package is not importable, it degrades to a stub that
    returns a fixed zero embedding.
    """

    def __init__(self, model_name: str = "openai/clip-vit-large-patch14", max_length: int = 77):
        """
        Args:
            model_name: Hugging Face model id to load.
            max_length: Token sequence length used for padding/truncation.
        """
        super().__init__()

        try:
            from transformers import CLIPTextModel as HFCLIPTextModel, CLIPTokenizer

            self.model = HFCLIPTextModel.from_pretrained(model_name)
            self.tokenizer = CLIPTokenizer.from_pretrained(model_name)
            self.max_length = max_length
        except ImportError:
            print("Warning: transformers not installed. Using dummy text encoder.")
            self.model = None
            self.tokenizer = None

    def forward(self, text: Union[str, List[str]], device: torch.device = None) -> torch.Tensor:
        """
        Encode text into embeddings.

        Args:
            text: A prompt string or list of prompts.
            device: Optional device to move the tokenized inputs to.

        Returns:
            The encoder's last hidden state, or a (1, 77, 768) zero tensor
            when running with the stub encoder.
        """
        if self.model is None:
            # Stub path: fixed-size zero embedding.
            return torch.zeros(1, 77, 768)

        tokenized = self.tokenizer(
            text,
            padding="max_length",
            max_length=self.max_length,
            truncation=True,
            return_tensors="pt",
        )

        if device is not None:
            tokenized = {name: tensor.to(device) for name, tensor in tokenized.items()}

        return self.model(**tokenized).last_hidden_state
|
| 547 |
+
|
| 548 |
+
|
| 549 |
+
def create_unet(config):
    """Build a UNet2DConditionModel from the `model.unet` section of *config*."""
    cfg = config['model']['unet']
    kwargs = {key: cfg[key] for key in (
        'in_channels',
        'out_channels',
        'layers_per_block',
        'attention_head_dim',
        'cross_attention_dim',
        'use_linear_projection',
    )}
    kwargs['block_out_channels'] = tuple(cfg['block_out_channels'])
    return UNet2DConditionModel(**kwargs)
|
| 561 |
+
|
| 562 |
+
|
| 563 |
+
def create_vae(config):
    """Build an AutoencoderKL from the `model.vae` section of *config*."""
    cfg = config['model']['vae']
    kwargs = {
        'in_channels': cfg['in_channels'],
        'out_channels': cfg['out_channels'],
        'down_block_types': tuple(cfg['down_block_types']),
        'up_block_types': tuple(cfg['up_block_types']),
        'latent_channels': cfg['latent_channels'],
        'sample_size': cfg['sample_size'],
    }
    return AutoencoderKL(**kwargs)
|
| 574 |
+
|
| 575 |
+
|
| 576 |
+
def create_text_encoder(config):
    """Build a CLIPTextModel from the `model.text_encoder` section of *config*."""
    text_cfg = config['model']['text_encoder']
    return CLIPTextModel(model_name=text_cfg['model'], max_length=text_cfg['max_length'])
|
bytedream/pipeline.py
ADDED
|
@@ -0,0 +1,312 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream Pipeline
|
| 3 |
+
Complete diffusion pipeline for text-to-image generation
|
| 4 |
+
Integrates all components: text encoder, UNet, VAE, and scheduler
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import torch
|
| 8 |
+
import numpy as np
|
| 9 |
+
from typing import Optional, Union, List
|
| 10 |
+
from PIL import Image
|
| 11 |
+
import gc
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
class ByteDreamPipeline:
    """
    Complete text-to-image diffusion pipeline.

    Wires together the text encoder, UNet denoiser, VAE decoder and noise
    scheduler, and runs the full prompt -> latent denoising -> image flow.
    Intended for CPU inference (float32 by default).
    """

    def __init__(
        self,
        text_encoder,
        vae,
        unet,
        scheduler,
        device: str = "cpu",
        dtype: torch.dtype = torch.float32,
    ):
        """
        Args:
            text_encoder: Prompt encoder (callable returning embeddings;
                may expose a `.model` attribute that is moved to `device`).
            vae: Autoencoder providing `decode`.
            unet: Conditional noise-prediction model.
            scheduler: Scheduler with `set_timesteps`/`scale_model_input`/`step`.
            device: Target device string, e.g. "cpu".
            dtype: Tensor dtype used for latents and embeddings.
        """
        self.text_encoder = text_encoder
        self.vae = vae
        self.unet = unet
        self.scheduler = scheduler

        self.device = torch.device(device)
        self.dtype = dtype

        # Move models to device, then freeze them for inference.
        self._move_models_to_device()
        self._set_eval_mode()

    def _move_models_to_device(self):
        """Move all models to the target device."""
        print(f"Loading models to {self.device}...")

        if hasattr(self.text_encoder, 'model') and self.text_encoder.model is not None:
            self.text_encoder.model.to(self.device)

        self.vae.to(self.device)
        self.unet.to(self.device)

    def _set_eval_mode(self):
        """Set all models to evaluation mode."""
        if hasattr(self.text_encoder, 'model') and self.text_encoder.model is not None:
            self.text_encoder.model.eval()
        self.vae.eval()
        self.unet.eval()

    @torch.no_grad()
    def encode_prompt(
        self,
        prompt: Union[str, List[str]],
        negative_prompt: Optional[Union[str, List[str]]] = None,
        num_images_per_prompt: int = 1,
    ) -> torch.Tensor:
        """
        Encode text prompts to embeddings.

        Args:
            prompt: Text prompt or list of prompts.
            negative_prompt: Negative prompt; when not None, unconditional
                embeddings are concatenated in front of the conditional
                ones for classifier-free guidance.
            num_images_per_prompt: Accepted for API compatibility.
                NOTE(review): currently unused — confirm intended.

        Returns:
            Text embeddings tensor (doubled batch when negative_prompt
            is not None).
        """
        if isinstance(prompt, str):
            prompt = [prompt]

        text_embeddings = self.text_encoder(prompt, device=self.device)
        text_embeddings = text_embeddings.to(self.dtype)

        if negative_prompt is not None:
            if isinstance(negative_prompt, str):
                negative_prompt = [negative_prompt]

            uncond_embeddings = self.text_encoder(negative_prompt, device=self.device)
            uncond_embeddings = uncond_embeddings.to(self.dtype)

            # Concatenate [uncond, cond] for classifier-free guidance.
            text_embeddings = torch.cat([uncond_embeddings, text_embeddings])

        return text_embeddings

    @torch.no_grad()
    def decode_latents(self, latents: torch.Tensor) -> "Image.Image":
        """
        Decode a latent batch to a PIL image (first batch element only).

        Args:
            latents: Latent space tensor.

        Returns:
            PIL Image.
        """
        # Undo the SD latent scaling factor before decoding.
        latents = 1 / 0.18215 * latents

        image = self.vae.decode(latents)
        image = torch.clamp(image, -1, 1)

        # Map [-1, 1] -> [0, 255] uint8, channel-last, first image only.
        image = (image / 2 + 0.5).clamp(0, 1)
        image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
        image = (image * 255).round().astype("uint8")

        return Image.fromarray(image)

    @torch.no_grad()
    def prepare_latents(
        self,
        batch_size: int,
        height: int,
        width: int,
        generator: Optional[torch.Generator] = None,
    ) -> torch.Tensor:
        """
        Initialize random noise latents.

        Args:
            batch_size: Number of images to generate.
            height: Image height in pixels.
            width: Image width in pixels.
            generator: Optional random number generator.

        Returns:
            Initial noise tensor of shape (batch, 4, height/8, width/8).
        """
        # 4 latent channels, 8x spatial downscale relative to pixel space.
        shape = (batch_size, 4, height // 8, width // 8)
        latents = torch.randn(shape, generator=generator, dtype=self.dtype)
        latents = latents.to(self.device)

        # Scale initial noise only when the scheduler defines a factor.
        if hasattr(self.scheduler, 'init_noise_scale'):
            latents = latents * self.scheduler.init_noise_scale

        return latents

    @torch.no_grad()
    def __call__(
        self,
        prompt: Union[str, List[str]],
        negative_prompt: Optional[Union[str, List[str]]] = None,
        height: int = 512,
        width: int = 512,
        num_inference_steps: int = 50,
        guidance_scale: float = 7.5,
        eta: float = 0.0,
        generator: Optional[torch.Generator] = None,
        output_type: str = "pil",
        return_dict: bool = False,
    ) -> Union[List["Image.Image"], tuple]:
        """
        Generate images from text prompts.

        Args:
            prompt: Text prompt(s) for generation.
            negative_prompt: Negative prompt (defaults to "").
            height: Output image height.
            width: Output image width.
            num_inference_steps: Number of denoising steps.
            guidance_scale: Classifier-free guidance scale.
            eta: DDIM eta parameter.
            generator: Random number generator.
            output_type: Output format ("pil" or "tensor").
            return_dict: Whether to return as dictionary.

        Returns:
            Generated images or tuple.
        """
        if negative_prompt is None:
            negative_prompt = ""

        # BUGFIX: the original gated guidance on `if negative_prompt:`,
        # which is False for the default "" while encode_prompt (checking
        # `is not None`) still produced doubled embeddings — a batch-size
        # mismatch inside the UNet. Derive one flag and use it everywhere.
        do_cfg = negative_prompt is not None

        batch_size = 1 if isinstance(prompt, str) else len(prompt)

        text_embeddings = self.encode_prompt(
            prompt=prompt,
            negative_prompt=negative_prompt if do_cfg else None,
            num_images_per_prompt=1,
        )

        # Prepare timesteps and initial noise.
        self.scheduler.set_timesteps(num_inference_steps)
        timesteps = self.scheduler.timesteps

        latents = self.prepare_latents(
            batch_size=batch_size,
            height=height,
            width=width,
            generator=generator,
        )

        # Denoising loop.
        for i, t in enumerate(timesteps):
            # Expand latents to match the doubled embeddings under CFG.
            latent_model_input = torch.cat([latents] * 2) if do_cfg else latents
            latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

            timestep_tensor = torch.tensor([t], dtype=torch.long, device=self.device)
            noise_pred = self.unet(
                sample=latent_model_input,
                timestep=timestep_tensor,
                encoder_hidden_states=text_embeddings,
            )

            # Classifier-free guidance: blend uncond/cond predictions.
            if do_cfg:
                noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

            latents, _ = self.scheduler.step(
                model_output=noise_pred,
                timestep=t,
                sample=latents,
                eta=eta,
                generator=generator,
            )

            if (i + 1) % 10 == 0 or i == len(timesteps) - 1:
                print(f"Step {i+1}/{len(timesteps)}")

        image = self.decode_latents(latents)

        if output_type != "pil":
            return (image,) if not return_dict else {"images": [image]}

        return [image] if not return_dict else {"images": [image]}

    def enable_memory_efficient_mode(self):
        """Free cached memory (CUDA cache if present, then Python GC)."""
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        gc.collect()

        print("Memory efficient mode enabled")

    def optimize_for_cpu(self, threads: int = -1):
        """
        Optimize pipeline for CPU inference.

        Args:
            threads: Number of threads to use (-1 for all available).
        """
        if threads > 0:
            torch.set_num_threads(threads)

        if threads == -1:
            import os
            torch.set_num_threads(os.cpu_count())

        print(f"Optimized for CPU with {torch.get_num_threads()} threads")
|
| 281 |
+
|
| 282 |
+
|
| 283 |
+
def create_pipeline(config, device: str = "cpu"):
    """
    Assemble a complete ByteDreamPipeline from a configuration dictionary.

    Args:
        config: Configuration dictionary (with a `model` section).
        device: Target device string.

    Returns:
        ByteDreamPipeline instance.
    """
    from .model import create_unet, create_vae, create_text_encoder
    from .scheduler import create_scheduler

    # Build every component, then hand them to the pipeline in one shot.
    return ByteDreamPipeline(
        text_encoder=create_text_encoder(config),
        vae=create_vae(config),
        unet=create_unet(config),
        scheduler=create_scheduler(config),
        device=device,
    )
|
bytedream/scheduler.py
ADDED
|
@@ -0,0 +1,273 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream Diffusion Scheduler
|
| 3 |
+
Implements DDIM (Denoising Diffusion Implicit Models) sampling for fast, high-quality generation
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import torch
|
| 7 |
+
import numpy as np
|
| 8 |
+
from typing import Optional, Tuple, Union
|
| 9 |
+
import math
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
class DDIMScheduler:
    """
    DDIM (Denoising Diffusion Implicit Models) sampler.

    Deterministic when eta == 0 and partially stochastic otherwise; runs
    with far fewer inference steps than the training schedule length.
    """

    def __init__(
        self,
        num_train_timesteps: int = 1000,
        beta_start: float = 0.00085,
        beta_end: float = 0.012,
        beta_schedule: str = "scaled_linear",
        clip_sample: bool = False,
        set_alpha_to_one: bool = False,
    ):
        """
        Args:
            num_train_timesteps: Length of the training noise schedule.
            beta_start: First beta of the schedule.
            beta_end: Last beta of the schedule.
            beta_schedule: "scaled_linear" (linear in sqrt-beta space) or "linear".
            clip_sample: Clamp each step's output to [-1, 1].
            set_alpha_to_one: Use alpha_bar = 1 for the step before t = 0.

        Raises:
            ValueError: For an unknown `beta_schedule`.
        """
        self.num_train_timesteps = num_train_timesteps
        self.beta_start = beta_start
        self.beta_end = beta_end
        self.beta_schedule = beta_schedule
        self.clip_sample = clip_sample
        self.set_alpha_to_one = set_alpha_to_one

        # Noise schedule.
        if beta_schedule == "scaled_linear":
            # Linear in sqrt(beta) space.
            self.betas = torch.linspace(
                beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps, dtype=torch.float32
            ) ** 2
        elif beta_schedule == "linear":
            self.betas = torch.linspace(
                beta_start, beta_end, num_train_timesteps, dtype=torch.float32
            )
        else:
            raise ValueError(f"Unknown beta schedule: {beta_schedule}")

        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
        self.final_alpha_cumprod = torch.tensor(1.0) if set_alpha_to_one else self.alphas_cumprod[0]

        # Populated by set_timesteps().
        self.timesteps = None

    def set_timesteps(self, num_inference_steps: int) -> None:
        """
        Select the evenly spaced training timesteps used for inference,
        ordered from most to least noisy.

        BUGFIX: the original called `.round()` on an integer (Long) tensor,
        which torch does not implement for integer dtypes; the products are
        already exact integers, so no rounding is needed.

        Args:
            num_inference_steps: Number of denoising steps.
        """
        step_ratio = self.num_train_timesteps // num_inference_steps
        self.timesteps = (
            torch.arange(0, num_inference_steps, dtype=torch.long) * step_ratio
        ).flip(dims=[0])

    def _get_variance(self, timestep: int, prev_timestep: int) -> torch.Tensor:
        """Posterior variance between `timestep` and `prev_timestep`."""
        alpha_prod_t = self.alphas_cumprod[timestep]
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
        beta_prod_t = 1 - alpha_prod_t
        beta_prod_t_prev = 1 - alpha_prod_t_prev

        return (beta_prod_t_prev / beta_prod_t) * (1 - alpha_prod_t / alpha_prod_t_prev)

    def step(
        self,
        model_output: torch.Tensor,
        timestep: int,
        sample: torch.Tensor,
        eta: float = 0.0,
        use_clipped_model_output: bool = False,
        generator: Optional[torch.Generator] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Perform a single denoising step.

        Args:
            model_output: Predicted noise from the UNet.
            timestep: Current timestep.
            sample: Current noisy sample.
            eta: DDIM eta parameter (0 for deterministic sampling).
            use_clipped_model_output: Clamp the predicted x0 to [-1, 1].
            generator: Random number generator for the eta > 0 noise.

        Returns:
            Tuple of (previous_sample, pred_original_sample).
        """
        # The stride between inference timesteps.
        prev_timestep = timestep - self.num_train_timesteps // len(self.timesteps)

        alpha_prod_t = self.alphas_cumprod[timestep]
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
        beta_prod_t = 1 - alpha_prod_t

        # Predicted x0 from the epsilon parameterization.
        pred_original_sample = (sample - beta_prod_t ** 0.5 * model_output) / alpha_prod_t ** 0.5

        if use_clipped_model_output:
            pred_original_sample = torch.clamp(pred_original_sample, -1, 1)

        # Coefficient of the noise term pointing toward x_t.
        # NOTE(review): with eta > 0 this omits the sigma^2 correction
        # (sqrt(1 - alpha_prev - sigma^2)) from the DDIM paper — confirm
        # whether that approximation is intended.
        model_output_direction = (1 - alpha_prod_t_prev) ** 0.5

        variance = self._get_variance(timestep, prev_timestep)
        std_dev_t = eta * variance ** 0.5

        # x_{t-1} = sqrt(alpha_prev) * x0 + direction term.
        pred_sample_direction = alpha_prod_t_prev ** 0.5
        prev_sample = pred_sample_direction * pred_original_sample + model_output_direction * model_output

        # Stochastic component for eta > 0.
        if eta > 0:
            noise = torch.randn(model_output.shape, dtype=model_output.dtype, generator=generator).to(model_output.device)
            prev_sample = prev_sample + std_dev_t * noise

        if self.clip_sample:
            prev_sample = torch.clamp(prev_sample, -1, 1)

        return prev_sample, pred_original_sample

    def add_noise(
        self,
        original_samples: torch.Tensor,
        noise: torch.Tensor,
        timesteps: torch.Tensor,
    ) -> torch.Tensor:
        """
        Add noise to samples (forward diffusion process).

        Args:
            original_samples: Clean samples (B, C, H, W).
            noise: Noise tensor with the same shape.
            timesteps: Per-sample timestep indices.

        Returns:
            Noisy samples: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise.
        """
        alpha_prod_t = self.alphas_cumprod[timesteps].view(-1, 1, 1, 1)
        sqrt_alpha_prod = alpha_prod_t ** 0.5
        sqrt_one_minus_alpha_prod = (1 - alpha_prod_t) ** 0.5

        return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise

    def scale_model_input(
        self,
        sample: torch.Tensor,
        timestep: Optional[int] = None,
    ) -> torch.Tensor:
        """
        Identity scaling (present for scheduler-API compatibility).

        Args:
            sample: Input sample.
            timestep: Current timestep (unused).

        Returns:
            The sample, unchanged.
        """
        return sample

    def get_scalings_for_boundary_condition_discrete(self, timestep):
        """Return (c_out, c_in) scalings derived from alpha_bar(timestep)."""
        alpha_prod_t = self.alphas_cumprod[timestep]
        # (1 - a) * a / a simplifies to (1 - a).
        sigma_t = ((1 - alpha_prod_t) * alpha_prod_t / (alpha_prod_t)) ** 0.5
        c_out = -sigma_t
        c_in = 1 / (alpha_prod_t ** 0.5)
        return c_out, c_in
|
| 187 |
+
|
| 188 |
+
|
| 189 |
+
class EulerDiscreteScheduler:
    """
    Euler-discretization sampler.

    Alternative to DDIM built on a sigma (noise level) parameterization of
    the same beta schedule.
    """

    def __init__(
        self,
        num_train_timesteps: int = 1000,
        beta_start: float = 0.00085,
        beta_end: float = 0.012,
        beta_schedule: str = "scaled_linear",
    ):
        """
        Args:
            num_train_timesteps: Length of the training noise schedule.
            beta_start: First beta of the schedule.
            beta_end: Last beta of the schedule.
            beta_schedule: Stored only; the sigmas below always use the
                scaled-linear form.
        """
        self.num_train_timesteps = num_train_timesteps
        self.beta_start = beta_start
        self.beta_end = beta_end
        self.beta_schedule = beta_schedule

        # Scaled-linear betas -> cumulative alphas -> sigma levels.
        sqrt_betas = torch.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps)
        cum_alphas = torch.cumprod(1.0 - sqrt_betas ** 2, dim=0)

        # NOTE(review): a leading sigma of 1.0 is prepended before the
        # schedule-derived values — confirm this boundary value is intended.
        self.sigmas = torch.cat([
            torch.ones(1),
            ((1 - cum_alphas) / cum_alphas) ** 0.5,
        ])

        # Populated by set_timesteps().
        self.timesteps = None

    def set_timesteps(self, num_inference_steps: int) -> None:
        """Pick evenly spaced sigma indices, highest index first."""
        stride = len(self.sigmas) // num_inference_steps
        indices = torch.arange(0, num_inference_steps) * stride
        self.timesteps = indices.flip(0)

    def step(
        self,
        model_output: torch.Tensor,
        timestep: int,
        sample: torch.Tensor,
    ) -> torch.Tensor:
        """
        Advance one Euler step from sigmas[timestep] toward sigmas[timestep + 1].

        NOTE(review): model_output is currently unused — the derivative is
        computed from the sample alone; confirm this is intended.
        """
        sigma = self.sigmas[timestep]
        next_sigma = self.sigmas[timestep + 1] if timestep + 1 < len(self.sigmas) else torch.tensor(0.0)

        # Derivative of the sample with respect to sigma.
        denoised = sample / ((sigma ** 2 + 1) ** 0.5)
        slope = (sample - denoised) / sigma

        return sample + slope * (next_sigma - sigma)
|
| 242 |
+
|
| 243 |
+
|
| 244 |
+
def create_scheduler(config):
    """
    Factory function to create scheduler from config

    Args:
        config: Configuration dictionary (reads config['model']['scheduler'])

    Returns:
        Scheduler instance

    Raises:
        ValueError: If the configured scheduler name is not supported.
    """
    sched_config = config['model']['scheduler']
    name = sched_config['name']

    # Keyword arguments every supported scheduler accepts.
    common_kwargs = dict(
        num_train_timesteps=sched_config['num_train_timesteps'],
        beta_start=sched_config['beta_start'],
        beta_end=sched_config['beta_end'],
        beta_schedule=sched_config['beta_schedule'],
    )

    if name == 'DDIM':
        return DDIMScheduler(
            clip_sample=sched_config['clip_sample'],
            set_alpha_to_one=sched_config['set_alpha_to_one'],
            **common_kwargs,
        )
    if name == 'EulerDiscrete':
        return EulerDiscreteScheduler(**common_kwargs)
    raise ValueError(f"Unknown scheduler: {name}")
|
bytedream/utils.py
ADDED
|
@@ -0,0 +1,398 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream Utilities
|
| 3 |
+
Helper functions for image processing, model management, and optimization
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import torch
|
| 7 |
+
import numpy as np
|
| 8 |
+
from PIL import Image
|
| 9 |
+
from pathlib import Path
|
| 10 |
+
import hashlib
|
| 11 |
+
import json
|
| 12 |
+
from typing import Optional, Tuple, List
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
def load_image(image_path: str) -> Image.Image:
    """
    Load an image from disk and force it into RGB mode.

    Args:
        image_path: Path to image file

    Returns:
        PIL Image object in RGB mode

    Raises:
        FileNotFoundError: If the path does not exist.
        IOError: If the file exists but cannot be opened/decoded.
    """
    source = Path(image_path)

    if not source.exists():
        raise FileNotFoundError(f"Image not found: {image_path}")

    try:
        return Image.open(source).convert('RGB')
    except Exception as e:
        raise IOError(f"Error loading image: {e}")
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
def save_image(
    image: Image.Image,
    output_path: str,
    format: str = None,
    quality: int = 95,
):
    """
    Write *image* to *output_path*, creating parent directories as needed.

    Args:
        image: PIL Image to save
        output_path: Output file path
        format: Image format (PNG, JPEG, ...); inferred from the file
            extension when omitted
        quality: JPEG quality (1-100); ignored for non-JPEG formats
    """
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Infer the format from the extension when the caller did not pass one.
    if format is None:
        format = path.suffix.upper().replace('.', '')
        if format == 'JPG':
            format = 'JPEG'

    # Only JPEG takes a quality setting; everything else just gets optimize.
    extra = {'quality': quality} if format == 'JPEG' else {}
    image.save(path, format=format, optimize=True, **extra)

    print(f"Image saved to: {path}")
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
def resize_image(
    image: Image.Image,
    width: Optional[int] = None,
    height: Optional[int] = None,
    maintain_aspect: bool = True,
) -> Image.Image:
    """
    Resize an image, optionally preserving its aspect ratio.

    Args:
        image: Input image
        width: Target width (None to derive it / keep original)
        height: Target height (None to derive it / keep original)
        maintain_aspect: Fit within (width, height) keeping aspect ratio

    Returns:
        Resized PIL Image (the original object if no target is given)
    """
    src_w, src_h = image.size

    if width is None and height is None:
        return image

    if not maintain_aspect:
        # Use targets as-is, falling back to the original dimension.
        target = (width if width else src_w, height if height else src_h)
        return image.resize(target, Image.Resampling.LANCZOS)

    if width and height:
        # Largest scale that fits inside the requested bounding box.
        scale = min(width / src_w, height / src_h)
        target = (int(src_w * scale), int(src_h * scale))
    elif width:
        target = (width, int(src_h * (width / src_w)))
    else:
        target = (int(src_w * (height / src_h)), height)

    return image.resize(target, Image.Resampling.LANCZOS)
|
| 113 |
+
|
| 114 |
+
|
| 115 |
+
def center_crop(image: Image.Image, width: int, height: int) -> Image.Image:
    """
    Crop a centered (width x height) region out of *image*.

    Args:
        image: Input image
        width: Crop width
        height: Crop height

    Returns:
        Cropped PIL Image

    Note:
        NOTE(review): if the image is smaller than the crop box, the box
        coordinates go negative and PIL pads the out-of-range area --
        confirm that behavior is acceptable for callers.
    """
    src_w, src_h = image.size

    left = (src_w - width) // 2
    top = (src_h - height) // 2

    return image.crop((left, top, left + width, top + height))
|
| 136 |
+
|
| 137 |
+
|
| 138 |
+
def image_to_tensor(image: Image.Image) -> torch.Tensor:
    """
    Convert a PIL Image to a normalized PyTorch tensor.

    Args:
        image: PIL Image (H, W, C)

    Returns:
        Float tensor of shape (C, H, W) scaled to the range [-1, 1]
    """
    # uint8 [0, 255] -> float32 [0, 1]
    pixels = np.array(image).astype(np.float32) / 255.0

    # [0, 1] -> [-1, 1], the range the diffusion model works in.
    pixels = pixels * 2.0 - 1.0

    # HWC -> CHW, the layout PyTorch models expect.
    return torch.from_numpy(pixels).permute(2, 0, 1)
|
| 161 |
+
|
| 162 |
+
|
| 163 |
+
def tensor_to_image(tensor: torch.Tensor) -> Image.Image:
    """
    Convert a PyTorch tensor back into a PIL Image.

    Args:
        tensor: Tensor in range [-1, 1], shape (B, C, H, W) or (C, H, W);
            only the first batch element is used

    Returns:
        PIL Image (grayscale inputs are replicated to RGB)
    """
    # Drop the batch dimension if present.
    if tensor.dim() == 4:
        tensor = tensor[0]

    # Move to host memory and go CHW -> HWC.
    pixels = tensor.cpu().numpy().transpose(1, 2, 0)

    # Clamp out-of-range values, then map [-1, 1] -> [0, 255].
    pixels = np.clip(pixels, -1, 1)
    pixels = ((pixels + 1.0) * 127.5).round().astype(np.uint8)

    # Single-channel -> RGB by channel replication.
    if pixels.shape[2] == 1:
        pixels = np.repeat(pixels, 3, axis=2)

    return Image.fromarray(pixels)
|
| 192 |
+
|
| 193 |
+
|
| 194 |
+
def generate_prompt_hash(prompt: str) -> str:
    """
    Derive a short, deterministic identifier for a prompt.

    Args:
        prompt: Text prompt

    Returns:
        First 8 hex characters of the prompt's MD5 digest (MD5 is used only
        as a cheap fingerprint here, not for anything security-sensitive)
    """
    digest = hashlib.md5(prompt.encode()).hexdigest()
    return digest[:8]
|
| 206 |
+
|
| 207 |
+
|
| 208 |
+
def get_model_statistics(model: torch.nn.Module) -> dict:
    """
    Summarize a model's parameter counts and in-memory size.

    Args:
        model: PyTorch model

    Returns:
        Dict with total/trainable/non-trainable parameter counts and the
        combined parameter+buffer size in megabytes (rounded to 2 decimals)
    """
    total = 0
    trainable = 0
    byte_count = 0

    # Single pass over parameters collects counts and size together.
    for param in model.parameters():
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
        byte_count += n * param.element_size()

    # Buffers (e.g. running stats) count toward size but not parameters.
    for buf in model.buffers():
        byte_count += buf.numel() * buf.element_size()

    return {
        'total_parameters': total,
        'trainable_parameters': trainable,
        'non_trainable_parameters': total - trainable,
        'model_size_mb': round(byte_count / 1024 ** 2, 2),
    }
|
| 239 |
+
|
| 240 |
+
|
| 241 |
+
def optimize_memory_usage(device: str = "cpu"):
    """
    Free cached memory and apply CPU-specific environment tweaks.

    Args:
        device: Target device; "cpu" enables the OpenMP workaround below
    """
    import gc
    import os

    # Release cached CUDA allocations back to the driver, if applicable.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    # Reclaim unreachable Python objects.
    gc.collect()

    if device == "cpu":
        # Works around "duplicate OpenMP runtime" crashes seen with some
        # MKL/PyTorch builds. FIX: the original wrapped this assignment in a
        # bare ``except: pass`` -- setting an environment variable cannot
        # raise, so that handler was dead code hiding nothing but intent.
        os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

    print("Memory optimization applied")
|
| 267 |
+
|
| 268 |
+
|
| 269 |
+
def set_seed(seed: int):
    """
    Seed every RNG used for generation so runs are reproducible.

    Args:
        seed: Random seed value
    """
    # NumPy RNG first (independent of the torch seeds).
    np.random.seed(seed)
    # Torch CPU RNG, plus all CUDA devices when available.
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
|
| 280 |
+
|
| 281 |
+
|
| 282 |
+
def validate_prompt(prompt: str) -> Tuple[bool, str]:
    """
    Check a prompt against basic acceptance rules.

    Args:
        prompt: Input prompt

    Returns:
        Tuple of (is_valid, message)
    """
    if not prompt or not prompt.strip():
        return False, "Prompt cannot be empty"

    if len(prompt) > 1000:
        return False, "Prompt too long (max 1000 characters)"

    # Blocklist is currently empty; populate to reject unwanted content.
    forbidden_terms = []
    lowered = prompt.lower()
    for term in forbidden_terms:
        if term.lower() in lowered:
            return False, f"Prompt contains forbidden term: {term}"

    return True, "Valid prompt"
|
| 305 |
+
|
| 306 |
+
|
| 307 |
+
def create_image_grid(
    images: List[Image.Image],
    rows: int = None,
    cols: int = None,
) -> Image.Image:
    """
    Tile a list of images into a single grid image.

    Args:
        images: List of PIL Images (cell size is taken from the first image;
            differing sizes would overlap or leave gaps -- TODO confirm all
            callers pass uniformly sized images)
        rows: Number of rows (derived from ``cols``/count when None)
        cols: Number of columns (derived from ``rows``/count when None)

    Returns:
        Grid image on a white background

    Raises:
        ValueError: If ``images`` is empty.
    """
    if not images:
        raise ValueError("No images provided")

    count = len(images)

    # Fill in missing grid dimensions; default to a near-square layout.
    if rows is None and cols is None:
        cols = int(np.ceil(np.sqrt(count)))
        rows = int(np.ceil(count / cols))
    elif rows is None:
        rows = int(np.ceil(count / cols))
    elif cols is None:
        cols = int(np.ceil(count / rows))

    # Cell size comes from the first image.
    cell_w, cell_h = images[0].size

    canvas = Image.new('RGB', (cols * cell_w, rows * cell_h), color='white')

    # Row-major placement of each tile.
    for index, tile in enumerate(images):
        canvas.paste(tile, ((index % cols) * cell_w, (index // cols) * cell_h))

    return canvas
|
| 354 |
+
|
| 355 |
+
|
| 356 |
+
def get_device_info() -> dict:
    """
    Collect basic information about available compute devices.

    Returns:
        Dict with CUDA availability, device count, CPU core count, and --
        when CUDA is present -- current device index, its name, and the
        CUDA version string
    """
    import os  # local import keeps this file's top-level imports unchanged

    cuda_ok = torch.cuda.is_available()
    info = {
        'cuda_available': cuda_ok,
        'device_count': torch.cuda.device_count() if cuda_ok else 0,
        # IDIOM FIX: was an opaque __import__('os').cpu_count() call.
        'cpu_cores': os.cpu_count(),
    }

    if cuda_ok:
        info['current_device'] = torch.cuda.current_device()
        info['device_name'] = torch.cuda.get_device_name(0)
        info['cuda_version'] = torch.version.cuda

    return info
|
| 375 |
+
|
| 376 |
+
|
| 377 |
+
class ProgressTracker:
    """Track progress of long-running operations and render a text bar."""

    def __init__(self, total: int, description: str = ""):
        # total: expected number of work units (may legitimately be 0)
        self.total = total
        # current: units completed so far
        self.current = 0
        self.description = description

    def update(self, n: int = 1):
        """Advance progress by *n* units."""
        self.current += n

    def get_progress(self) -> float:
        """Return completion as a percentage (0 when total is 0)."""
        return (self.current / self.total) * 100 if self.total > 0 else 0

    def __str__(self):
        percent = self.get_progress()
        bar_length = 30
        # BUG FIX: the original divided by self.total unconditionally here,
        # raising ZeroDivisionError for total == 0 even though
        # get_progress() already guards that case.
        if self.total > 0:
            filled_length = int(bar_length * self.current // self.total)
        else:
            filled_length = 0
        bar = '█' * filled_length + '-' * (bar_length - filled_length)
        return f"{self.description}: [{bar}] {percent:.1f}% ({self.current}/{self.total})"
|
config.yaml
ADDED
|
@@ -0,0 +1,81 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Byte Dream Configuration
|
| 2 |
+
|
| 3 |
+
model:
|
| 4 |
+
name: "Byte Dream"
|
| 5 |
+
version: "1.0.0"
|
| 6 |
+
|
| 7 |
+
# Model architecture parameters
|
| 8 |
+
unet:
|
| 9 |
+
in_channels: 4
|
| 10 |
+
out_channels: 4
|
| 11 |
+
block_out_channels: [320, 640, 1280, 1280]
|
| 12 |
+
layers_per_block: 2
|
| 13 |
+
attention_head_dim: 8
|
| 14 |
+
cross_attention_dim: 768
|
| 15 |
+
use_linear_projection: true
|
| 16 |
+
|
| 17 |
+
scheduler:
|
| 18 |
+
name: "DDIM" # Options: DDIM, PNDM, LMSDiscrete, EulerDiscrete
|
| 19 |
+
num_train_timesteps: 1000
|
| 20 |
+
beta_start: 0.00085
|
| 21 |
+
beta_end: 0.012
|
| 22 |
+
beta_schedule: "scaled_linear"
|
| 23 |
+
clip_sample: false
|
| 24 |
+
set_alpha_to_one: false
|
| 25 |
+
|
| 26 |
+
vae:
|
| 27 |
+
in_channels: 3
|
| 28 |
+
out_channels: 3
|
| 29 |
+
down_block_types: ["DownEncoderBlock2D", "DownEncoderBlock2D", "DownEncoderBlock2D", "DownEncoderBlock2D"]
|
| 30 |
+
up_block_types: ["UpDecoderBlock2D", "UpDecoderBlock2D", "UpDecoderBlock2D", "UpDecoderBlock2D"]
|
| 31 |
+
latent_channels: 4
|
| 32 |
+
sample_size: 512
|
| 33 |
+
|
| 34 |
+
text_encoder:
|
| 35 |
+
model: "openai/clip-vit-large-patch14"
|
| 36 |
+
max_length: 77
|
| 37 |
+
|
| 38 |
+
# Generation parameters
|
| 39 |
+
generation:
|
| 40 |
+
width: 512
|
| 41 |
+
height: 512
|
| 42 |
+
num_inference_steps: 50
|
| 43 |
+
guidance_scale: 7.5
|
| 44 |
+
negative_prompt: "ugly, blurry, low quality, distorted, deformed"
|
| 45 |
+
seed: null # null for random, or set integer
|
| 46 |
+
|
| 47 |
+
# CPU Optimization
|
| 48 |
+
cpu_optimization:
|
| 49 |
+
use_openvino: false
|
| 50 |
+
use_onnx: false
|
| 51 |
+
precision: "fp32" # fp32 or fp16
|
| 52 |
+
threads: -1 # -1 for all available threads
|
| 53 |
+
memory_limit: null # null for auto, or MB value
|
| 54 |
+
|
| 55 |
+
# Training parameters
|
| 56 |
+
training:
|
| 57 |
+
dataset_path: "./dataset"
|
| 58 |
+
output_dir: "./models/bytedream"
|
| 59 |
+
epochs: 100
|
| 60 |
+
batch_size: 4
|
| 61 |
+
gradient_accumulation_steps: 1
|
| 62 |
+
learning_rate: 1e-5
|
| 63 |
+
lr_scheduler: "constant_with_warmup"
|
| 64 |
+
lr_warmup_steps: 500
|
| 65 |
+
max_grad_norm: 1.0
|
| 66 |
+
mixed_precision: "no" # no, fp16, bf16
|
| 67 |
+
|
| 68 |
+
# Data augmentation
|
| 69 |
+
random_flip: true
|
| 70 |
+
random_crop: false
|
| 71 |
+
center_crop: true
|
| 72 |
+
|
| 73 |
+
# Logging
|
| 74 |
+
logging_dir: "./logs"
|
| 75 |
+
log_every_n_steps: 10
|
| 76 |
+
|
| 77 |
+
# Hugging Face
|
| 78 |
+
huggingface:
|
| 79 |
+
organization: "" # Your HF username/organization
|
| 80 |
+
private: false
|
| 81 |
+
push_to_hub: true
|
environment.yml
ADDED
|
@@ -0,0 +1,25 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
name: bytedream
|
| 2 |
+
channels:
|
| 3 |
+
- pytorch
|
| 4 |
+
- conda-forge
|
| 5 |
+
- defaults
|
| 6 |
+
dependencies:
|
| 7 |
+
- python=3.10
|
| 8 |
+
- pip
|
| 9 |
+
- pip:
|
| 10 |
+
- transformers>=4.35.0
|
| 11 |
+
- diffusers>=0.24.0
|
| 12 |
+
- torch>=2.1.0
|
| 13 |
+
- torchaudio>=2.1.0
|
| 14 |
+
- accelerate>=0.25.0
|
| 15 |
+
- numpy>=1.24.0
|
| 16 |
+
- pillow>=10.0.0
|
| 17 |
+
- opencv-python>=4.8.0
|
| 18 |
+
- safetensors>=0.4.0
|
| 19 |
+
- huggingface_hub>=0.19.0
|
| 20 |
+
- gradio>=4.0.0
|
| 21 |
+
- tqdm>=4.66.0
|
| 22 |
+
- pyyaml>=6.0
|
| 23 |
+
- matplotlib>=3.8.0
|
| 24 |
+
- scipy>=1.11.0
|
| 25 |
+
- einops>=0.7.0
|
examples.py
ADDED
|
@@ -0,0 +1,316 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream - Example Usage Scripts
|
| 3 |
+
Practical examples for different use cases
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
from bytedream import ByteDreamGenerator
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
def example_basic_generation():
    """Minimal end-to-end generation from a single prompt."""
    banner = "=" * 60
    print("\n" + banner)
    print("Example 1: Basic Generation")
    print(banner)

    generator = ByteDreamGenerator()

    # Defaults everywhere; only the prompt is supplied.
    image = generator.generate(
        prompt="A beautiful sunset over mountains, digital art",
    )

    image.save("example_basic.png")
    print("✓ Saved to: example_basic.png")
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
def example_advanced_parameters():
    """Demonstrate tuning resolution, steps, guidance and seeding."""
    banner = "=" * 60
    print("\n" + banner)
    print("Example 2: Advanced Parameters")
    print(banner)

    generator = ByteDreamGenerator()

    # Everything dialed up from the defaults; seed pinned for repeatability.
    settings = dict(
        prompt="Cyberpunk city at night, neon lights, futuristic architecture",
        negative_prompt="ugly, blurry, low quality, distorted, dark",
        width=768,
        height=768,
        num_inference_steps=75,
        guidance_scale=9.0,
        seed=42,
    )
    image = generator.generate(**settings)

    image.save("example_advanced.png")
    print("✓ Saved to: example_advanced.png")
|
| 48 |
+
|
| 49 |
+
|
| 50 |
+
def example_batch_generation():
    """Generate several prompts in one batch and save a contact-sheet grid."""
    print("\n" + "="*60)
    print("Example 3: Batch Generation")
    print("="*60)

    generator = ByteDreamGenerator()

    prompts = [
        "Fantasy landscape with castle and waterfall, epic scenery",
        "Underwater coral reef, tropical fish, sunlight through water",
        "Space nebula, colorful clouds, stars, cosmic scene",
        "Medieval knight in armor, dramatic lighting, portrait",
        "Japanese garden, cherry blossoms, peaceful atmosphere",
    ]

    images = generator.generate_batch(
        prompts=prompts,
        width=512,
        height=512,
        num_inference_steps=50,
        guidance_scale=7.5,
    )

    # Save each image individually.
    for i, (prompt, image) in enumerate(zip(prompts, images)):
        filename = f"batch_{i+1}.png"
        image.save(filename)
        # FIX: the saved-file message previously did not interpolate the
        # filename, so every line printed the same placeholder text.
        print(f"✓ Saved: {filename}")

    # Assemble all results into one grid image.
    from bytedream.utils import create_image_grid
    grid = create_image_grid(images)
    grid.save("batch_grid.png")
    print("✓ Grid saved to: batch_grid.png")
|
| 85 |
+
|
| 86 |
+
|
| 87 |
+
def example_artistic_styles():
    """Render the same pipeline across several named art styles."""
    print("\n" + "="*60)
    print("Example 4: Artistic Styles")
    print("="*60)

    generator = ByteDreamGenerator()

    style_prompts = [
        ("Oil Painting", "Portrait of a woman, oil painting style, brush strokes, classical art"),
        ("Watercolor", "Forest landscape, watercolor painting, soft colors, artistic"),
        ("Digital Art", "Sci-fi spaceship, digital art, concept art, highly detailed"),
        ("Sketch", "City skyline, pencil sketch, black and white, drawing"),
        ("Abstract", "Emotions and dreams, abstract art, colorful shapes, surreal"),
    ]

    for style_name, prompt in style_prompts:
        print(f"\nGenerating {style_name}...")

        image = generator.generate(
            prompt=prompt,
            num_inference_steps=50,
            guidance_scale=7.5,
        )

        filename = f"style_{style_name.lower().replace(' ', '_')}.png"
        image.save(filename)
        # FIX: the saved-file message previously did not interpolate the
        # filename, printing the same placeholder for every style.
        print(f"✓ Saved: {filename}")
|
| 115 |
+
|
| 116 |
+
|
| 117 |
+
def example_resolutions():
    """Render one prompt at several square and non-square resolutions."""
    print("\n" + "="*60)
    print("Example 5: Different Resolutions")
    print("="*60)

    generator = ByteDreamGenerator()

    base_prompt = "Majestic mountain range, snow peaks, blue sky"

    resolutions = [
        (256, 256),
        (512, 512),
        (768, 768),
        (512, 768),  # Portrait
        (768, 512),  # Landscape
    ]

    for width, height in resolutions:
        print(f"\nGenerating {width}x{height}...")

        image = generator.generate(
            prompt=base_prompt,
            width=width,
            height=height,
            num_inference_steps=40,
        )

        filename = f"res_{width}x{height}.png"
        image.save(filename)
        # FIX: the saved-file message previously did not interpolate the
        # filename, printing the same placeholder for every resolution.
        print(f"✓ Saved: {filename}")
|
| 148 |
+
|
| 149 |
+
|
| 150 |
+
def example_reproducibility():
    """Show that a fixed seed reproduces the exact same image."""
    banner = "=" * 60
    print("\n" + banner)
    print("Example 6: Reproducibility with Seeds")
    print(banner)

    generator = ByteDreamGenerator()

    prompt = "A mystical forest with glowing mushrooms, fantasy art"

    # Same seed twice -> identical outputs.
    print("\nGenerating with seed=123...")
    generator.generate(prompt=prompt, seed=123).save("repro_1.png")

    print("Generating again with seed=123...")
    generator.generate(prompt=prompt, seed=123).save("repro_2.png")

    print("\nBoth images should be identical!")
    print("✓ Check repro_1.png and repro_2.png")

    # A different seed -> a different image.
    print("\nGenerating with seed=456...")
    generator.generate(prompt=prompt, seed=456).save("repro_3.png")
    print("This one will be different!")
|
| 186 |
+
|
| 187 |
+
|
| 188 |
+
def example_negative_prompts():
    """Contrast output with and without a negative prompt (same seed)."""
    banner = "=" * 60
    print("\n" + banner)
    print("Example 7: Negative Prompts")
    print(banner)

    generator = ByteDreamGenerator()

    base_prompt = "Beautiful princess, elegant dress, castle background"

    # Baseline: no negative prompt, fixed seed so the runs are comparable.
    print("\nWithout negative prompt...")
    generator.generate(prompt=base_prompt, seed=789).save("no_negative.png")

    # Same seed, now steering away from common failure modes.
    print("With negative prompt...")
    generator.generate(
        prompt=base_prompt,
        negative_prompt="ugly, deformed, noisy, blurry, bad anatomy, poorly drawn",
        seed=789,
    ).save("with_negative.png")

    print("\nCompare no_negative.png and with_negative.png")
|
| 216 |
+
|
| 217 |
+
|
| 218 |
+
def example_quick_preview():
    """Fast low-res draft first, then a slow high-quality render."""
    banner = "=" * 60
    print("\n" + banner)
    print("Example 8: Quick Preview Mode")
    print(banner)

    generator = ByteDreamGenerator()

    prompt = "Dragon breathing fire, epic fantasy battle scene"

    # Cheap draft to check composition before committing to a long render.
    print("Generating quick preview (256x256, 20 steps)...")
    generator.generate(
        prompt=prompt,
        width=256,
        height=256,
        num_inference_steps=20,
    ).save("preview.png")
    print("✓ Preview saved")

    # Full-size, high-step render of the same prompt.
    print("\nGenerating full quality (768x768, 75 steps)...")
    generator.generate(
        prompt=prompt,
        width=768,
        height=768,
        num_inference_steps=75,
    ).save("full_quality.png")
    print("✓ Full quality saved")
|
| 249 |
+
|
| 250 |
+
|
| 251 |
+
def run_all_examples():
    """Run every example in order, pausing for Enter between them."""
    sep = "=" * 60
    print("\n" + sep)
    print("Byte Dream - Complete Examples Suite")
    print(sep)

    all_examples = (
        example_basic_generation,
        example_advanced_parameters,
        example_batch_generation,
        example_artistic_styles,
        example_resolutions,
        example_reproducibility,
        example_negative_prompts,
        example_quick_preview,
    )

    for fn in all_examples:
        # Keep going if one example fails; report its traceback and continue.
        try:
            fn()
        except Exception as e:
            print(f"\n✗ Error in {fn.__name__}: {e}")
            import traceback
            traceback.print_exc()

        print("\n" + "-" * 60)
        input("Press Enter to continue to next example...")

    print("\n" + sep)
    print("All examples completed!")
    print(sep)
|
| 282 |
+
|
| 283 |
+
|
| 284 |
+
if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Byte Dream Examples")
    # Each flag is a store-true switch selecting one example (or all).
    for flag, text in (
        ("--all", "Run all examples"),
        ("--basic", "Run basic example"),
        ("--advanced", "Run advanced example"),
        ("--batch", "Run batch generation"),
        ("--styles", "Run artistic styles"),
    ):
        parser.add_argument(flag, action="store_true", help=text)

    args = parser.parse_args()

    # First matching flag wins; with no flags, print the menu.
    if args.all:
        run_all_examples()
    elif args.basic:
        example_basic_generation()
    elif args.advanced:
        example_advanced_parameters()
    elif args.batch:
        example_batch_generation()
    elif args.styles:
        example_artistic_styles()
    else:
        print("\nByte Dream Examples")
        print("="*60)
        print("Choose an example:")
        print("  --basic    : Basic generation")
        print("  --advanced : Advanced parameters")
        print("  --batch    : Batch generation")
        print("  --styles   : Artistic styles")
        print("  --all      : Run all examples")
        print("\nOr just run without arguments to see all examples")
|
infer.py
ADDED
|
@@ -0,0 +1,150 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream - Command Line Inference Tool
|
| 3 |
+
Generate images from text prompts using the command line
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import argparse
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
import torch
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
def main():
    """Command-line entry point for single-image text-to-image generation.

    Parses CLI options, constructs a ByteDreamGenerator, prints its
    metadata, generates one image from the prompt, and saves it to disk.
    """
    parser = argparse.ArgumentParser(
        description="Byte Dream - AI Image Generation",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Basic usage
  python infer.py --prompt "A beautiful sunset over mountains"

  # With custom parameters
  python infer.py --prompt "Cyberpunk city" --negative "blurry" --steps 75 --guidance 8.0

  # Specify output and size
  python infer.py --prompt "Fantasy landscape" --output fantasy.png --width 768 --height 768

  # With seed for reproducibility
  python infer.py --prompt "Dragon" --seed 42 --output dragon.png
"""
    )

    # The prompt is the only mandatory argument.
    parser.add_argument(
        "--prompt", "-p",
        type=str,
        required=True,
        help="Text prompt describing the desired image"
    )

    parser.add_argument(
        "--negative", "-n",
        type=str,
        default="",
        help="Negative prompt - things to avoid in the image"
    )

    parser.add_argument(
        "--output", "-o",
        type=str,
        default="output.png",
        help="Output image filename (default: output.png)"
    )

    parser.add_argument(
        "--width", "-W",
        type=int,
        default=512,
        help="Image width in pixels (default: 512)"
    )

    parser.add_argument(
        "--height", "-H",
        type=int,
        default=512,
        help="Image height in pixels (default: 512)"
    )

    parser.add_argument(
        "--steps", "-s",
        type=int,
        default=50,
        help="Number of inference steps (default: 50)"
    )

    parser.add_argument(
        "--guidance", "-g",
        type=float,
        default=7.5,
        help="Guidance scale - how closely to follow prompt (default: 7.5)"
    )

    parser.add_argument(
        "--seed",
        type=int,
        default=None,
        help="Random seed for reproducibility (default: random)"
    )

    parser.add_argument(
        "--model", "-m",
        type=str,
        default=None,
        help="Path to model directory (default: uses config)"
    )

    parser.add_argument(
        "--config", "-c",
        type=str,
        default="config.yaml",
        help="Path to config file (default: config.yaml)"
    )

    parser.add_argument(
        "--device",
        type=str,
        default="cpu",
        help="Device to run on: cpu or cuda (default: cpu)"
    )

    args = parser.parse_args()

    # Import generator (deferred until after argument parsing)
    from bytedream.generator import ByteDreamGenerator

    # Initialize generator
    print("="*60)
    print("Byte Dream - AI Image Generator")
    print("="*60)

    generator = ByteDreamGenerator(
        model_path=args.model,
        config_path=args.config,
        device=args.device,
    )

    # Print model info.  NOTE(review): assumes get_model_info() returns a
    # dict with 'name', 'version', 'device', 'unet_parameters' keys —
    # confirm against bytedream/generator.py.
    info = generator.get_model_info()
    print(f"\nModel: {info['name']} v{info['version']}")
    print(f"Device: {info['device']}")
    print(f"Parameters: {info['unet_parameters']}")
    print("="*60)

    # Generate image; an empty --negative string is normalized to None.
    image = generator.generate(
        prompt=args.prompt,
        negative_prompt=args.negative if args.negative else None,
        width=args.width,
        height=args.height,
        num_inference_steps=args.steps,
        guidance_scale=args.guidance,
        seed=args.seed,
    )

    # Save image
    output_path = Path(args.output)
    image.save(output_path)
    print(f"\n✓ Image saved to: {output_path.absolute()}")
    print("="*60)
|
| 147 |
+
|
| 148 |
+
|
| 149 |
+
if __name__ == "__main__":
|
| 150 |
+
main()
|
main.py
ADDED
|
@@ -0,0 +1,278 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream - Main Application Interface
|
| 3 |
+
Simple Python API for image generation
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
from bytedream.generator import ByteDreamGenerator
|
| 7 |
+
from bytedream.utils import (
|
| 8 |
+
load_image,
|
| 9 |
+
save_image,
|
| 10 |
+
resize_image,
|
| 11 |
+
create_image_grid,
|
| 12 |
+
)
|
| 13 |
+
from typing import Optional, List
|
| 14 |
+
from PIL import Image
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
class ByteDreamApp:
    """
    High-level application interface for Byte Dream.
    Simplifies common tasks like image generation and batch processing.
    """

    def __init__(
        self,
        model_path: Optional[str] = None,
        device: str = "cpu",
        verbose: bool = True,
    ):
        """
        Initialize Byte Dream application

        Args:
            model_path: Path to model weights
            device: Device to run on
            verbose: Enable verbose output
        """
        self.verbose = verbose

        if self.verbose:
            print("Initializing Byte Dream Application...")

        self.generator = ByteDreamGenerator(
            model_path=model_path,
            config_path="config.yaml",
            device=device,
        )

        if self.verbose:
            print("✓ Application ready!")

    def generate(
        self,
        prompt: str,
        output_path: str = "output.png",
        negative_prompt: Optional[str] = None,
        width: int = 512,
        height: int = 512,
        steps: int = 50,
        guidance: float = 7.5,
        seed: Optional[int] = None,
        save: bool = True,
    ) -> Image.Image:
        """
        Generate image from prompt and optionally save to file

        Args:
            prompt: Text description
            output_path: Where to save the image
            negative_prompt: What to avoid
            width: Image width
            height: Image height
            steps: Inference steps
            guidance: Guidance scale
            seed: Random seed
            save: Whether to save to file

        Returns:
            Generated PIL Image
        """
        image = self.generator.generate(
            prompt=prompt,
            negative_prompt=negative_prompt,
            width=width,
            height=height,
            num_inference_steps=steps,
            guidance_scale=guidance,
            seed=seed,
        )

        if save:
            save_image(image, output_path)
            if self.verbose:
                print(f"✓ Image saved to: {output_path}")

        return image

    def generate_multiple(
        self,
        prompts: List[str],
        output_dir: str = "./outputs",
        negative_prompt: Optional[str] = None,
        width: int = 512,
        height: int = 512,
        steps: int = 50,
        guidance: float = 7.5,
        seeds: Optional[List[int]] = None,
        create_grid: bool = True,
    ) -> List[Image.Image]:
        """
        Generate multiple images from prompts

        Args:
            prompts: List of prompts
            output_dir: Directory to save images
            negative_prompt: Negative prompt for all
            width: Image width
            height: Image height
            steps: Inference steps
            guidance: Guidance scale
            seeds: Seeds for each image; may be shorter than `prompts`
                (remaining images use a random seed)
            create_grid: Create grid of all images

        Returns:
            List of generated images
        """
        from pathlib import Path

        output_path = Path(output_dir)
        output_path.mkdir(parents=True, exist_ok=True)

        images = []

        for i, prompt in enumerate(prompts):
            print(f"\n{'='*60}")
            print(f"Generating image {i+1}/{len(prompts)}")
            print(f"{'='*60}")

            # Guard the index: a seeds list shorter than prompts previously
            # raised IndexError partway through the batch.
            seed = seeds[i] if seeds and i < len(seeds) else None

            image = self.generate(
                prompt=prompt,
                output_path=str(output_path / f"image_{i+1:03d}.png"),
                negative_prompt=negative_prompt,
                width=width,
                height=height,
                steps=steps,
                guidance=guidance,
                seed=seed,
                save=True,
            )

            images.append(image)

        # A grid is only meaningful for two or more images.
        if create_grid and len(images) > 1:
            grid = create_image_grid(images)
            grid_path = output_path / "grid.png"
            grid.save(grid_path)
            print(f"\n✓ Grid saved to: {grid_path}")

        return images

    def img2img(
        self,
        input_image_path: str,
        prompt: str,
        output_path: str = "output_img2img.png",
        strength: float = 0.75,
        negative_prompt: Optional[str] = None,
        steps: int = 50,
        guidance: float = 7.5,
        seed: Optional[int] = None,
    ) -> Image.Image:
        """
        Image-to-image transformation (placeholder for future implementation)

        NOTE: `input_image_path` and `strength` are currently ignored — the
        method falls back to plain text-to-image generation.

        Args:
            input_image_path: Input image path (unused for now)
            prompt: Transformation prompt
            output_path: Output path
            strength: How much to transform (0-1, unused for now)
            negative_prompt: Negative prompt
            steps: Inference steps
            guidance: Guidance scale
            seed: Random seed

        Returns:
            Transformed image
        """
        print("⚠ img2img functionality will be available in a future update")
        print(" For now, using text-to-image generation only")

        return self.generate(
            prompt=prompt,
            output_path=output_path,
            negative_prompt=negative_prompt,
            steps=steps,
            guidance=guidance,
            seed=seed,
        )

    def info(self):
        """Print model information reported by the underlying generator."""
        info = self.generator.get_model_info()

        print("\n" + "="*60)
        print("Byte Dream Model Information")
        print("="*60)
        for key, value in info.items():
            print(f"{key.replace('_', ' ').title()}: {value}")
        print("="*60)
|
| 215 |
+
|
| 216 |
+
|
| 217 |
+
def demo():
    """Generate a small batch of sample images to showcase the pipeline."""
    banner = "=" * 60
    print("\n" + banner)
    print("Byte Dream - Quick Demo")
    print(banner)

    app = ByteDreamApp(device="cpu", verbose=True)

    sample_prompts = [
        "A beautiful sunset over mountains, digital art, vibrant colors",
        "Cyberpunk city at night with neon lights, futuristic",
        "Fantasy landscape with castle and waterfall, epic",
    ]

    print("\nGenerating sample images...")

    images = app.generate_multiple(
        prompts=sample_prompts,
        output_dir="./demo_outputs",
        steps=30,  # fewer steps keeps the demo quick
        guidance=7.5,
        create_grid=True,
    )

    print(f"\n✓ Demo complete! Generated {len(images)} images")
    print(" Check ./demo_outputs/ for results")
|
| 244 |
+
|
| 245 |
+
|
| 246 |
+
if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Byte Dream Application")
    parser.add_argument("--demo", action="store_true", help="Run demo")
    args = parser.parse_args()

    if args.demo:
        demo()
    else:
        # Interactive mode
        app = ByteDreamApp()

        print("\nByte Dream Interactive Mode")
        print("Type 'quit' to exit\n")

        # Name outputs with a sequential counter.  The previous scheme used
        # len(prompt) in the filename, which silently overwrote earlier
        # images whenever two prompts had the same length.
        image_count = 0
        while True:
            prompt = input("Prompt: ").strip()

            if prompt.lower() in ['quit', 'exit', 'q']:
                break

            if not prompt:
                continue

            try:
                image_count += 1
                image = app.generate(
                    prompt=prompt,
                    output_path=f"output_{image_count:03d}.png",
                )
                print("✓ Image generated!\n")
            except Exception as e:
                print(f"Error: {e}\n")
|
prepare_dataset.py
ADDED
|
@@ -0,0 +1,287 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Dataset Preparation Tool
|
| 3 |
+
Prepare and preprocess image-text datasets for training
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import argparse
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
from PIL import Image
|
| 9 |
+
import json
|
| 10 |
+
import shutil
|
| 11 |
+
from typing import List, Tuple
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
def prepare_dataset(
    input_dir: str,
    output_dir: str,
    image_size: int = 512,
    min_resolution: int = 256,
    filter_low_quality: bool = True,
):
    """
    Prepare dataset for training: resize/crop every image under
    `input_dir`, write JPEG + caption pairs under `output_dir`, and save a
    metadata.json summary.

    Args:
        input_dir: Directory with raw images (searched recursively)
        output_dir: Output directory for processed data
        image_size: Target image size (square, pixels)
        min_resolution: Minimum acceptable resolution
        filter_low_quality: Filter out low quality images (passed through
            to process_image)
    """
    input_path = Path(input_dir)
    output_path = Path(output_dir)

    # Create output directories
    output_path.mkdir(parents=True, exist_ok=True)
    (output_path / "images").mkdir(exist_ok=True)
    (output_path / "captions").mkdir(exist_ok=True)

    # Find all images.  pathlib's "**/*ext" already matches files in the
    # top-level directory (** matches zero directories), so the previous
    # additional "*ext" glob produced duplicates that were processed twice.
    # Deduplicate via a set and sort for a deterministic order.
    image_extensions = ['.jpg', '.jpeg', '.png', '.webp']
    image_files = sorted(
        {f for ext in image_extensions for f in input_path.glob(f"**/*{ext}")}
    )

    print(f"Found {len(image_files)} images")

    # Process each image, counting successes and failures
    processed_count = 0
    skipped_count = 0

    for img_file in image_files:
        try:
            process_image(
                img_path=img_file,
                output_img_path=output_path / "images" / f"{img_file.stem}.jpg",
                caption_path=output_path / "captions" / f"{img_file.stem}.txt",
                image_size=image_size,
                min_resolution=min_resolution,
                filter_low_quality=filter_low_quality,
            )
            processed_count += 1

            if processed_count % 10 == 0:
                print(f"Processed: {processed_count}/{len(image_files)}")

        except Exception as e:
            # Best-effort: a bad image is reported and skipped, not fatal
            print(f"Error processing {img_file}: {e}")
            skipped_count += 1

    # Save metadata summary alongside the processed data
    metadata = {
        'total_images': processed_count,
        'skipped_images': skipped_count,
        'image_size': image_size,
        'min_resolution': min_resolution,
    }

    with open(output_path / "metadata.json", 'w') as f:
        json.dump(metadata, f, indent=2)

    print(f"\n✓ Dataset preparation complete!")
    print(f" Processed: {processed_count} images")
    print(f" Skipped: {skipped_count} images")
    print(f" Output: {output_path}")
|
| 87 |
+
|
| 88 |
+
|
| 89 |
+
def process_image(
    img_path: Path,
    output_img_path: Path,
    caption_path: Path,
    image_size: int = 512,
    min_resolution: int = 256,
    filter_low_quality: bool = True,
):
    """
    Process a single image: validate resolution, pre-shrink oversized
    images, center-crop to a square, resize to `image_size`, save as JPEG,
    and write a caption text file.

    Args:
        img_path: Input image path
        output_img_path: Output image path
        caption_path: Output caption path
        image_size: Target size (square, pixels)
        min_resolution: Minimum acceptable width/height
        filter_low_quality: Reserved flag — no quality filter is currently
            implemented (TODO)

    Raises:
        ValueError: If the image is smaller than `min_resolution` on
            either side.
    """
    # Load image and normalize to RGB (drops alpha, handles palette images)
    image = Image.open(img_path).convert('RGB')

    width, height = image.size

    if width < min_resolution or height < min_resolution:
        raise ValueError(f"Image too small: {width}x{height}")

    # Pre-shrink very large images before cropping.  Scale by the SHORTER
    # side so it lands exactly at image_size: the previous version scaled
    # by the longer side, which pushed the short side below image_size and
    # forced a quality-losing upscale in the final resize.
    if min(width, height) > image_size * 1.5:
        scale = image_size / min(width, height)
        new_width = int(width * scale)
        new_height = int(height * scale)
        image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)

    # Center crop to square
    size = min(image.size)
    left = (image.size[0] - size) // 2
    top = (image.size[1] - size) // 2
    image = image.crop((left, top, left + size, top + size))

    # Resize to target size (a no-op when the pre-shrink already hit it)
    image = image.resize((image_size, image_size), Image.Resampling.LANCZOS)

    # Save processed image
    image.save(output_img_path, quality=95, optimize=True)

    # Generate or load caption (adjacent .txt file, else the filename)
    caption = generate_caption(img_path)

    with open(caption_path, 'w', encoding='utf-8') as f:
        f.write(caption)
|
| 142 |
+
|
| 143 |
+
|
| 144 |
+
def generate_caption(img_path: Path) -> str:
    """
    Produce a caption for an image: prefer a non-empty sibling `.txt`
    file; otherwise derive one from the filename.

    Args:
        img_path: Path to image

    Returns:
        Caption text
    """
    # A sibling text file with the same stem wins, if it has content.
    sidecar = img_path.with_suffix('.txt')
    if sidecar.exists():
        with open(sidecar, 'r', encoding='utf-8') as fh:
            text = fh.read().strip()
        if text:
            return text

    # Fallback: turn the filename into words and capitalize the result.
    words = img_path.stem.replace('_', ' ').replace('-', ' ')
    return words.capitalize()
|
| 170 |
+
|
| 171 |
+
|
| 172 |
+
def create_training_splits(
    data_dir: str,
    train_ratio: float = 0.9,
    val_ratio: float = 0.05,
    test_ratio: float = 0.05,
):
    """
    Create train/val/test split files (train.json / validation.json /
    test.json) over the processed images in `data_dir`.

    The test set receives whatever remains after the train and validation
    slices, so `test_ratio` is only used to validate that the three ratios
    sum to 1.

    Args:
        data_dir: Directory with processed data (expects an images/ subdir)
        train_ratio: Training set ratio
        val_ratio: Validation set ratio
        test_ratio: Test set ratio

    Raises:
        ValueError: If the three ratios do not sum to 1.
    """
    # Previously test_ratio was silently ignored; at least catch
    # inconsistent ratio sets instead of producing surprising splits.
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-6:
        raise ValueError(
            f"Split ratios must sum to 1, got "
            f"{train_ratio + val_ratio + test_ratio}"
        )

    data_path = Path(data_dir)

    # Get all processed images
    images = list((data_path / "images").glob("*.jpg"))

    # Shuffle deterministically so splits are reproducible
    import random
    random.seed(42)
    random.shuffle(images)

    # Calculate split sizes; test takes the remainder
    total = len(images)
    train_size = int(total * train_ratio)
    val_size = int(total * val_ratio)

    train_images = images[:train_size]
    val_images = images[train_size:train_size + val_size]
    test_images = images[train_size + val_size:]

    def save_split(image_list, split_name):
        # One JSON file per split: just the filenames plus a count.
        split_data = {
            'images': [str(img.name) for img in image_list],
            'count': len(image_list),
        }

        with open(data_path / f"{split_name}.json", 'w') as f:
            json.dump(split_data, f, indent=2)

        print(f"{split_name}: {len(image_list)} images")

    save_split(train_images, "train")
    save_split(val_images, "validation")
    save_split(test_images, "test")

    print(f"\n✓ Created training splits")
    print(f" Total: {total} images")
| 226 |
+
|
| 227 |
+
def main():
    """Command-line entry point: preprocess a raw image folder and
    optionally create train/val/test split files."""
    parser = argparse.ArgumentParser(description="Prepare dataset for Byte Dream training")

    # (flags, add_argument kwargs) — kept in a table so the option set is
    # easy to scan and extend.
    arg_specs = [
        (("--input", "-i"),
         dict(type=str, required=True, help="Input directory with raw images")),
        (("--output", "-o"),
         dict(type=str, default="./processed_data", help="Output directory for processed data")),
        (("--size", "-s"),
         dict(type=int, default=512, help="Target image size (default: 512)")),
        (("--min_res",),
         dict(type=int, default=256, help="Minimum image resolution (default: 256)")),
        (("--no_filter",),
         dict(action="store_true", help="Disable low quality filtering")),
        (("--create_splits",),
         dict(action="store_true", help="Create train/val/test splits")),
    ]
    for flags, kwargs in arg_specs:
        parser.add_argument(*flags, **kwargs)

    args = parser.parse_args()

    # Run the preprocessing pass.
    prepare_dataset(
        input_dir=args.input,
        output_dir=args.output,
        image_size=args.size,
        min_resolution=args.min_res,
        filter_low_quality=not args.no_filter,
    )

    # Optionally derive split files from the processed output.
    if args.create_splits:
        create_training_splits(args.output)
| 284 |
+
|
| 285 |
+
|
| 286 |
+
if __name__ == "__main__":
|
| 287 |
+
main()
|
publish_to_hf.py
ADDED
|
@@ -0,0 +1,30 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Upload Byte Dream to Hugging Face Hub
|
| 3 |
+
Using login() and upload_folder() methods
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import os
|
| 7 |
+
from huggingface_hub import login, upload_folder
|
| 8 |
+
|
| 9 |
+
# Get token from command line argument or prompt
|
| 10 |
+
import sys
|
| 11 |
+
|
| 12 |
+
if len(sys.argv) > 1:
|
| 13 |
+
token = sys.argv[1]
|
| 14 |
+
else:
|
| 15 |
+
token = input("Enter your Hugging Face token: ")
|
| 16 |
+
|
| 17 |
+
# Login with your Hugging Face token
|
| 18 |
+
print("Logging in to Hugging Face...")
|
| 19 |
+
login(token=token)
|
| 20 |
+
|
| 21 |
+
# Push your model files
|
| 22 |
+
print("\nUploading model to Hugging Face Hub...")
|
| 23 |
+
upload_folder(
|
| 24 |
+
folder_path=".",
|
| 25 |
+
repo_id="Enzo8930302/ByteDream",
|
| 26 |
+
repo_type="model"
|
| 27 |
+
)
|
| 28 |
+
|
| 29 |
+
print("\n✓ Model uploaded successfully!")
|
| 30 |
+
print("📦 View your model at: https://huggingface.co/Enzo8930302/ByteDream")
|
quick_start.py
ADDED
|
@@ -0,0 +1,124 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Quick Start Script
|
| 3 |
+
Setup and test Byte Dream in one command
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import subprocess
|
| 7 |
+
import sys
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
def check_requirements():
    """Verify that every required third-party package can be imported.

    Returns:
        True when all packages import cleanly; False otherwise, after
        printing which packages are missing and how to install them.
    """
    print("Checking requirements...")

    # Map pip distribution name -> importable module name.  Some packages
    # (notably Pillow) install under a different name than they are
    # imported by; the previous __import__('pillow') always failed even
    # when Pillow was installed, because its module is 'PIL'.
    required = {
        'torch': 'torch',
        'transformers': 'transformers',
        'diffusers': 'diffusers',
        'pillow': 'PIL',
        'numpy': 'numpy',
        'gradio': 'gradio',
    }

    missing = []

    for package, module in required.items():
        try:
            __import__(module)
            print(f" ✓ {package}")
        except ImportError:
            print(f" ✗ {package} - MISSING")
            missing.append(package)

    if missing:
        print(f"\nMissing packages: {', '.join(missing)}")
        print("\nInstall with:")
        print(" pip install -r requirements.txt")
        return False

    print("\n✓ All requirements satisfied!")
    return True
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
def test_model():
    """Smoke-test the Byte Dream model by generating a single image.

    Returns True on success; on any failure prints the error and returns
    False so the caller can fall back to download instructions.
    """
    banner = "=" * 60
    print("\n" + banner)
    print("Testing Byte Dream Model")
    print(banner)

    try:
        from bytedream.generator import ByteDreamGenerator

        print("\nInitializing generator...")
        gen = ByteDreamGenerator(device="cpu")

        # Show what was loaded before committing to a generation run.
        print("\nModel info:")
        for key, value in gen.get_model_info().items():
            print(f"  {key}: {value}")

        print("\nGenerating test image...")
        print("Prompt: A simple test pattern, geometric shapes")

        # Small resolution + fixed seed keeps this quick and reproducible.
        result = gen.generate(
            prompt="A simple test pattern, geometric shapes, abstract art",
            width=256,
            height=256,
            num_inference_steps=20,
            seed=42,
        )

        destination = Path("test_output.png")
        result.save(destination)

        print(f"\n✓ Test successful!")
        print(f"  Image saved to: {destination.absolute()}")
        return True

    except Exception as e:
        print(f"\n✗ Error: {e}")
        print("\nThe model needs to be trained or pretrained weights downloaded.")
        return False
|
| 84 |
+
|
| 85 |
+
|
| 86 |
+
def download_pretrained():
    """Print instructions for obtaining pretrained Byte Dream weights.

    Purely informational — performs no download itself.
    """
    banner = "=" * 60
    print("\n" + banner)
    print("Downloading Pretrained Model")
    print(banner)

    for line in (
        "\nTo download a pretrained model:",
        "1. Visit https://huggingface.co/models",
        "2. Search for 'stable-diffusion' or similar",
        "3. Download using:",
        "\n  from huggingface_hub import snapshot_download",
        "  snapshot_download(repo_id='username/model', local_dir='./models/bytedream')",
        "\nOr train your own model with:",
        "  python train.py --train_data ./dataset --output_dir ./models/bytedream",
    ):
        print(line)
|
| 100 |
+
|
| 101 |
+
|
| 102 |
+
def main():
    """Quick-start entry point: verify dependencies, then smoke-test the model."""
    banner = "=" * 60
    print(banner)
    print("Byte Dream - Quick Start")
    print(banner)

    # Bail out early when dependencies are missing — nothing else can work.
    if not check_requirements():
        print("\n⚠ Please install requirements first")
        sys.exit(1)

    if not test_model():
        # Generation failed: point the user at pretrained weights / training.
        download_pretrained()
        return

    print("\n✓ Byte Dream is ready to use!")
    print("\nNext steps:")
    print("  - Run: python infer.py --prompt 'Your prompt here'")
    print("  - Run: python app.py (for web interface)")
    print("  - Run: python main.py --demo (for demo)")


if __name__ == "__main__":
    main()
|
requirements.txt
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
transformers>=4.35.0
|
| 2 |
+
diffusers>=0.24.0
|
| 3 |
+
torch>=2.1.0
|
| 4 |
+
torchaudio>=2.1.0
|
| 5 |
+
accelerate>=0.25.0
|
| 6 |
+
numpy>=1.24.0
|
| 7 |
+
pillow>=10.0.0
|
| 8 |
+
opencv-python>=4.8.0
|
| 9 |
+
safetensors>=0.4.0
|
| 10 |
+
huggingface_hub>=0.19.0
|
| 11 |
+
gradio>=4.0.0
|
| 12 |
+
tqdm>=4.66.0
|
| 13 |
+
pyyaml>=6.0
|
| 14 |
+
matplotlib>=3.8.0
|
| 15 |
+
scipy>=1.11.0
|
| 16 |
+
einops>=0.7.0
|
train.py
ADDED
|
@@ -0,0 +1,500 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Byte Dream Training Pipeline
|
| 3 |
+
Complete training system for diffusion models with CPU optimization
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import os
|
| 7 |
+
import torch
|
| 8 |
+
import torch.nn as nn
|
| 9 |
+
import torch.nn.functional as F
|
| 10 |
+
from torch.utils.data import Dataset, DataLoader
|
| 11 |
+
from torchvision import transforms
|
| 12 |
+
from PIL import Image
|
| 13 |
+
import numpy as np
|
| 14 |
+
from tqdm import tqdm
|
| 15 |
+
import yaml
|
| 16 |
+
import argparse
|
| 17 |
+
from pathlib import Path
|
| 18 |
+
from typing import Tuple, List, Optional
|
| 19 |
+
import gc
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
class ImageTextDataset(Dataset):
    """
    Dataset for image-text pairs.

    Each image may have a sibling ``.txt`` file holding its caption;
    otherwise the filename (underscores replaced with spaces) is used.
    Supports flip/crop augmentations for better generalization.
    """

    def __init__(
        self,
        data_dir: str,
        image_size: int = 512,
        random_flip: bool = True,
        random_crop: bool = False,
        center_crop: bool = True,
    ):
        """
        Args:
            data_dir: Directory containing .jpg/.jpeg/.png images.
            image_size: Side length of the square output images.
            random_flip: Randomly mirror images horizontally (p=0.5).
            random_crop: Take a random image_size crop (takes precedence
                over center_crop).
            center_crop: Take a center image_size crop.
        """
        self.data_dir = Path(data_dir)
        self.image_paths = list(self.data_dir.glob("*.jpg")) + \
                           list(self.data_dir.glob("*.png")) + \
                           list(self.data_dir.glob("*.jpeg"))

        self.image_size = image_size
        self.random_flip = random_flip
        self.random_crop = random_crop
        self.center_crop = center_crop

        # Transformations
        self.transform = self._get_transform()

        # Load captions
        self.captions = self._load_captions()

    def _get_transform(self) -> transforms.Compose:
        """Get image transformation pipeline."""
        transforms_list = []

        if self.random_crop or self.center_crop:
            # Resize the shorter side to image_size first so an image_size x
            # image_size crop always exists. Without this, crops of images
            # smaller than image_size would be zero-padded (or fail on old
            # torchvision), corrupting training data.
            transforms_list.append(transforms.Resize(self.image_size))

        if self.random_crop:
            transforms_list.append(transforms.RandomCrop(self.image_size))
        elif self.center_crop:
            transforms_list.append(transforms.CenterCrop(self.image_size))
        else:
            # No cropping: stretch to the target square directly.
            transforms_list.append(transforms.Resize((self.image_size, self.image_size)))

        if self.random_flip:
            transforms_list.append(transforms.RandomHorizontalFlip(p=0.5))

        transforms_list.extend([
            transforms.ToTensor(),
            # Map [0, 1] pixels to [-1, 1], the range the VAE expects.
            transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
        ])

        return transforms.Compose(transforms_list)

    def _load_captions(self) -> dict:
        """Load captions from sibling .txt files; fall back to the filename."""
        captions = {}

        for img_path in self.image_paths:
            caption_path = img_path.with_suffix('.txt')
            if caption_path.exists():
                with open(caption_path, 'r', encoding='utf-8') as f:
                    captions[str(img_path)] = f.read().strip()
            else:
                # Use filename as caption if no text file
                captions[str(img_path)] = img_path.stem.replace('_', ' ')

        return captions

    def __len__(self) -> int:
        return len(self.image_paths)

    def __getitem__(self, idx: int) -> dict:
        """Return {"pixel_values": tensor, "input_ids": caption str,
        "image_path": str} for the given index."""
        img_path = self.image_paths[idx]

        # Load image
        try:
            image = Image.open(img_path).convert('RGB')
        except Exception as e:
            # Skip unreadable files by advancing to the next index.
            # NOTE(review): if every image is unreadable this recurses without
            # bound — acceptable for curated training sets, but worth a guard
            # if the dataset comes from untrusted sources.
            print(f"Error loading image {img_path}: {e}")
            return self.__getitem__((idx + 1) % len(self))

        # Transform image
        pixel_values = self.transform(image)

        # Get caption
        caption = self.captions.get(str(img_path), "")

        return {
            "pixel_values": pixel_values,
            "input_ids": caption,
            "image_path": str(img_path),
        }
|
| 112 |
+
|
| 113 |
+
|
| 114 |
+
class LatentDiffusionTrainer:
    """
    Trainer for latent diffusion models.

    Implements the training loop with mixed precision and gradient
    accumulation. Only the UNet is optimized; the VAE and text encoder are
    frozen and used purely as fixed feature extractors.
    """

    def __init__(
        self,
        unet: nn.Module,
        vae: nn.Module,
        text_encoder: nn.Module,
        scheduler,
        config: dict,
        device: str = "cpu",
    ):
        """
        Args:
            unet: Denoising UNet — the only trainable component.
            vae: Autoencoder mapping images <-> latents (frozen).
            text_encoder: Text encoder producing prompt embeddings (frozen).
            scheduler: Noise scheduler exposing ``num_train_timesteps`` and
                ``add_noise``.
            config: Parsed config dict; the ``training`` section is read.
            device: Torch device string, e.g. "cpu" or "cuda".
        """
        self.unet = unet
        self.vae = vae
        self.text_encoder = text_encoder
        self.scheduler = scheduler
        self.config = config
        self.device = torch.device(device)

        # Training parameters
        self.epochs = config['training']['epochs']
        self.batch_size = config['training']['batch_size']
        self.learning_rate = config['training']['learning_rate']
        self.gradient_accumulation_steps = config['training']['gradient_accumulation_steps']
        self.max_grad_norm = config['training']['max_grad_norm']

        # Mixed precision ("no" disables AMP entirely)
        self.mixed_precision = config['training']['mixed_precision']
        self.use_amp = self.mixed_precision != "no"

        # Output directories (created eagerly so later saves cannot fail on
        # a missing parent)
        self.output_dir = Path(config['training']['output_dir'])
        self.logging_dir = Path(config['training']['logging_dir'])
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.logging_dir.mkdir(parents=True, exist_ok=True)

        # Initialize optimizer over UNet parameters only — VAE and text
        # encoder are frozen in _prepare_models()
        self.optimizer = torch.optim.AdamW(
            unet.parameters(),
            lr=self.learning_rate,
            betas=(0.9, 0.999),
            weight_decay=1e-2,
            eps=1e-08,
        )

        # Learning rate scheduler
        self.lr_scheduler = self._create_lr_scheduler()

        # Gradient scaler for mixed precision — CUDA only, so it stays None
        # for CPU training even when use_amp is True
        self.scaler = torch.cuda.amp.GradScaler() if self.use_amp and torch.cuda.is_available() else None

        # Move models to device
        self._prepare_models()

    def _prepare_models(self):
        """Move models to the target device and freeze everything but the UNet."""
        print(f"Preparing models on {self.device}...")

        self.vae.to(self.device)
        self.text_encoder.to(self.device)
        self.unet.to(self.device)

        # Set VAE and text encoder to eval mode (frozen)
        self.vae.eval()
        # The text encoder may wrap an inner HF model under .model —
        # TODO confirm against bytedream.model.create_text_encoder
        if hasattr(self.text_encoder, 'model'):
            self.text_encoder.model.eval()

        # Freeze VAE and text encoder parameters so the optimizer (which only
        # holds UNet params anyway) and autograd skip them entirely
        for param in self.vae.parameters():
            param.requires_grad = False

        if hasattr(self.text_encoder, 'model'):
            for param in self.text_encoder.model.parameters():
                param.requires_grad = False

        # Set UNet to train mode
        self.unet.train()

    def _create_lr_scheduler(self):
        """Create the learning-rate scheduler named in config['training']['lr_scheduler'].

        Unknown names fall back to a constant schedule.
        """
        sched_config = self.config['training']

        if sched_config['lr_scheduler'] == "constant_with_warmup":
            return torch.optim.lr_scheduler.ConstantLR(
                self.optimizer,
                factor=1.0,
                total_iters=sched_config['lr_warmup_steps'],
            )
        elif sched_config['lr_scheduler'] == "linear":
            # Ramp from 10% to 100% of the base LR over the warmup steps
            return torch.optim.lr_scheduler.LinearLR(
                self.optimizer,
                start_factor=0.1,
                end_factor=1.0,
                total_iters=sched_config['lr_warmup_steps'],
            )
        else:
            return torch.optim.lr_scheduler.ConstantLR(self.optimizer, factor=1.0)

    def encode_images(self, images: torch.Tensor) -> torch.Tensor:
        """Encode a batch of pixel images into scaled VAE latents (no grad)."""
        with torch.no_grad():
            latents = self.vae.encode(images)
            # 0.18215 is the standard SD latent scale so latents have ~unit
            # variance for the diffusion process
            latents = latents * 0.18215  # Scale factor
        return latents

    def encode_text(self, texts: List[str]) -> torch.Tensor:
        """Encode a batch of caption strings into embeddings (no grad)."""
        with torch.no_grad():
            text_embeddings = self.text_encoder(texts, device=self.device)
        return text_embeddings

    def compute_loss(
        self,
        latents: torch.Tensor,
        text_embeddings: torch.Tensor,
    ) -> torch.Tensor:
        """
        Compute the standard epsilon-prediction diffusion loss:
        MSE between the UNet's predicted noise and the injected noise.

        Args:
            latents: Latent representations of images
            text_embeddings: Text embeddings

        Returns:
            Scalar loss tensor
        """
        batch_size = latents.shape[0]

        # Sample random timesteps — one per sample in the batch
        timesteps = torch.randint(
            0,
            self.scheduler.num_train_timesteps,
            (batch_size,),
            device=self.device,
        ).long()

        # Add noise to latents according to the forward diffusion schedule
        noise = torch.randn_like(latents)
        noisy_latents = self.scheduler.add_noise(latents, noise, timesteps)

        # Predict noise
        timestep_tensor = timesteps

        model_output = self.unet(
            sample=noisy_latents,
            timestep=timestep_tensor,
            encoder_hidden_states=text_embeddings,
        )

        # Compute loss (epsilon objective: model predicts the noise itself)
        loss = F.mse_loss(model_output, noise, reduction="mean")

        return loss

    def train_step(
        self,
        batch: dict,
    ) -> float:
        """
        Perform a single forward/backward pass (no optimizer step — that
        happens in train() once gradient_accumulation_steps have accumulated).

        Args:
            batch: Batch dict with "pixel_values" and "input_ids" (captions)

        Returns:
            Unscaled loss value for logging
        """
        pixel_values = batch["pixel_values"].to(self.device)
        input_ids = batch["input_ids"]

        # Encode images and text (both under no_grad — frozen components)
        latents = self.encode_images(pixel_values)
        text_embeddings = self.encode_text(input_ids)

        # Compute loss; divide by accumulation steps so accumulated gradients
        # average rather than sum
        if self.use_amp and self.scaler is not None:
            with torch.cuda.amp.autocast():
                loss = self.compute_loss(latents, text_embeddings)
                loss = loss / self.gradient_accumulation_steps

            self.scaler.scale(loss).backward()
        else:
            loss = self.compute_loss(latents, text_embeddings)
            loss = loss / self.gradient_accumulation_steps
            loss.backward()

        # Undo the accumulation scaling so the reported value is the true loss
        return loss.item() * self.gradient_accumulation_steps

    def save_checkpoint(self, epoch: int, step: int):
        """Save a resumable checkpoint (UNet + optimizer + LR scheduler state)."""
        checkpoint_dir = self.output_dir / f"checkpoint-{epoch}-{step}"
        checkpoint_dir.mkdir(parents=True, exist_ok=True)

        # Save UNet
        torch.save({
            'epoch': epoch,
            'step': step,
            'unet_state_dict': self.unet.state_dict(),
            'optimizer_state_dict': self.optimizer.state_dict(),
            'scheduler_state_dict': self.lr_scheduler.state_dict() if self.lr_scheduler else None,
        }, checkpoint_dir / "pytorch_model.bin")

        # Save config alongside the weights so the checkpoint is self-describing
        with open(checkpoint_dir / "config.yaml", 'w') as f:
            yaml.dump(self.config, f)

        print(f"Checkpoint saved to {checkpoint_dir}")

    def train(self, resume_from_checkpoint: Optional[str] = None):
        """
        Main training loop: builds the dataset/dataloader, optionally resumes
        from a checkpoint, then runs epochs with gradient accumulation,
        periodic checkpointing, and a final model save.

        Args:
            resume_from_checkpoint: Path to checkpoint to resume from
        """
        # Create dataset and dataloader
        train_config = self.config['training']

        dataset = ImageTextDataset(
            data_dir=train_config['dataset_path'],
            image_size=512,
            random_flip=train_config['random_flip'],
            random_crop=train_config['random_crop'],
            center_crop=train_config['center_crop'],
        )

        dataloader = DataLoader(
            dataset,
            batch_size=self.batch_size,
            shuffle=True,
            num_workers=0,  # CPU training
            pin_memory=False,
        )

        # Resume from checkpoint
        start_epoch = 0
        global_step = 0

        if resume_from_checkpoint:
            print(f"Resuming from checkpoint: {resume_from_checkpoint}")
            checkpoint = torch.load(resume_from_checkpoint, map_location=self.device)
            self.unet.load_state_dict(checkpoint['unet_state_dict'])
            self.optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
            if checkpoint['scheduler_state_dict']:
                self.lr_scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
            # NOTE(review): resuming restarts the saved epoch from its
            # beginning rather than mid-epoch
            start_epoch = checkpoint['epoch']
            global_step = checkpoint['step']

        # Training loop
        total_steps = len(dataloader) * self.epochs

        print(f"Starting training for {self.epochs} epochs...")
        print(f"Total steps: {total_steps}")
        print(f"Batch size: {self.batch_size}")
        print(f"Mixed precision: {self.mixed_precision}")

        for epoch in range(start_epoch, self.epochs):
            self.unet.train()
            progress_bar = tqdm(dataloader, desc=f"Epoch {epoch+1}/{self.epochs}")

            epoch_loss = 0
            num_steps = 0

            for step, batch in enumerate(progress_bar):
                # Training step (forward + backward; gradients accumulate)
                loss = self.train_step(batch)
                epoch_loss += loss
                num_steps += 1

                # Gradient clipping and optimizer step — only every
                # gradient_accumulation_steps micro-batches
                if (step + 1) % self.gradient_accumulation_steps == 0:
                    if self.use_amp and self.scaler is not None:
                        # Unscale before clipping so the norm is measured in
                        # true gradient units
                        self.scaler.unscale_(self.optimizer)
                        torch.nn.utils.clip_grad_norm_(
                            self.unet.parameters(),
                            self.max_grad_norm,
                        )
                        self.scaler.step(self.optimizer)
                        self.scaler.update()
                    else:
                        torch.nn.utils.clip_grad_norm_(self.unet.parameters(), self.max_grad_norm)
                        self.optimizer.step()

                    # Learning rate scheduling
                    if self.lr_scheduler:
                        self.lr_scheduler.step()

                    # Zero gradients for the next accumulation window
                    self.optimizer.zero_grad()

                    # Update progress bar with the running epoch average
                    avg_loss = epoch_loss / num_steps
                    progress_bar.set_postfix({"loss": f"{avg_loss:.4f}"})

                    # Logging
                    if (global_step + 1) % self.config['training']['log_every_n_steps'] == 0:
                        print(f"\nStep {global_step + 1}: Loss = {avg_loss:.4f}")

                    # Save checkpoint periodically
                    if (global_step + 1) % 1000 == 0:
                        self.save_checkpoint(epoch, global_step)

                    # global_step counts optimizer steps, not micro-batches
                    global_step += 1

            # End of epoch (max() guards against an empty dataloader)
            avg_epoch_loss = epoch_loss / max(num_steps, 1)
            print(f"\nEpoch {epoch+1} completed. Average loss: {avg_epoch_loss:.4f}")

            # Save epoch checkpoint
            self.save_checkpoint(epoch, global_step)

            # Clear memory between epochs — matters for long CPU runs
            gc.collect()
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

        # Save final model
        print("\nTraining completed!")
        self.save_final_model()

    def save_final_model(self):
        """Save the final trained UNet weights plus the config used."""
        final_dir = self.output_dir / "final"
        final_dir.mkdir(parents=True, exist_ok=True)

        # Save UNet
        torch.save({
            'unet_state_dict': self.unet.state_dict(),
            'config': self.config,
        }, final_dir / "unet_pytorch_model.bin")

        print(f"Final model saved to {final_dir}")
|
| 449 |
+
|
| 450 |
+
|
| 451 |
+
def main():
    """CLI entry point: parse arguments, build components, run training."""
    parser = argparse.ArgumentParser(description="Train Byte Dream diffusion model")
    parser.add_argument("--config", type=str, default="config.yaml", help="Path to config file")
    parser.add_argument("--train_data", type=str, required=True, help="Path to training data")
    parser.add_argument("--output_dir", type=str, default="./models/bytedream", help="Output directory")
    parser.add_argument("--resume", type=str, default=None, help="Resume from checkpoint")
    parser.add_argument("--device", type=str, default="cpu", help="Device to train on")
    args = parser.parse_args()

    # Load config
    with open(args.config, 'r') as f:
        config = yaml.safe_load(f)

    # Command-line flags take precedence over the YAML file.
    config['training']['dataset_path'] = args.train_data
    config['training']['output_dir'] = args.output_dir

    # Imported here so argument/config errors surface before the heavy
    # model-building imports run.
    from bytedream.model import create_unet, create_vae, create_text_encoder
    from bytedream.scheduler import create_scheduler

    print("Creating model components...")
    unet = create_unet(config)
    vae = create_vae(config)
    text_encoder = create_text_encoder(config)
    scheduler = create_scheduler(config)

    # Report model size before training starts.
    param_count = sum(p.numel() for p in unet.parameters())
    print(f"UNet parameters: {param_count:,}")

    trainer = LatentDiffusionTrainer(
        unet=unet,
        vae=vae,
        text_encoder=text_encoder,
        scheduler=scheduler,
        config=config,
        device=args.device,
    )

    trainer.train(resume_from_checkpoint=args.resume)


if __name__ == "__main__":
    main()
|
upload_to_hf.py
ADDED
|
@@ -0,0 +1,420 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Hugging Face Integration
|
| 3 |
+
Upload and deploy Byte Dream to Hugging Face Hub and Spaces
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import argparse
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
import yaml
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
def upload_to_huggingface(
    model_path: str,
    repo_id: str,
    token: str = None,
    private: bool = False,
):
    """
    Upload a trained model directory to the Hugging Face Hub.

    The local directory is validated *before* the remote repository is
    created, so a mistyped path no longer leaves an empty repository on
    the Hub.

    Args:
        model_path: Path to model directory
        repo_id: Repository ID (username/model-name)
        token: Hugging Face API token (None uses the cached login)
        private: Whether to make repository private
    """
    from huggingface_hub import HfApi, create_repo

    # Validate the local side first — never touch the remote for a
    # non-existent model.
    model_dir = Path(model_path)
    if not model_dir.exists():
        print(f"Error: Model directory {model_dir} does not exist")
        return

    print("Uploading model to Hugging Face Hub...")
    print(f"Repository: {repo_id}")

    # Initialize API
    api = HfApi()

    # Create repository (idempotent thanks to exist_ok=True)
    try:
        create_repo(
            repo_id=repo_id,
            token=token,
            private=private,
            exist_ok=True,
            repo_type="model",
        )
        print("✓ Repository created/verified")
    except Exception as e:
        print(f"Error creating repository: {e}")
        return

    print("\nUploading files...")

    try:
        # Upload entire directory
        api.upload_folder(
            folder_path=str(model_dir),
            repo_id=repo_id,
            token=token,
            repo_type="model",
        )
        print("✓ Model uploaded successfully!")

        # Print repository URL
        print(f"\n📦 View your model at:")
        print(f"https://huggingface.co/{repo_id}")

    except Exception as e:
        print(f"Error uploading model: {e}")
|
| 73 |
+
|
| 74 |
+
|
| 75 |
+
def create_gradio_app():
    """Return the full source code of a Gradio demo app as a string.

    The returned text is a complete, standalone ``app.py`` suitable for
    deployment on Hugging Face Spaces: it loads a ByteDreamGenerator on CPU,
    defines a ``generate_image`` callback, and builds a ``gr.Blocks`` UI
    with prompt/negative-prompt inputs, size/steps/guidance/seed controls,
    example prompts, and an output image + status box.

    Returns:
        str: Python source code for the Gradio Space app (written to disk
        by the caller, e.g. ``main`` with ``--create_space``).
    """
    # NOTE: everything inside the triple-quoted literal below is the
    # generated app's source, not this module's code — do not edit it
    # expecting to change this script's behavior.
    gradio_code = '''"""
Byte Dream - Gradio Web Interface
Deploy on Hugging Face Spaces
"""

import gradio as gr
from bytedream.generator import ByteDreamGenerator
import torch

# Initialize generator
print("Loading Byte Dream model...")
generator = ByteDreamGenerator(
    model_path="./models/bytedream",
    config_path="config.yaml",
    device="cpu",
)

def generate_image(
    prompt,
    negative_prompt,
    width,
    height,
    num_steps,
    guidance_scale,
    seed,
):
    """Generate image from prompt"""

    # Convert seed to None if -1
    seed_value = None if seed == -1 else seed

    try:
        # Generate image
        image = generator.generate(
            prompt=prompt,
            negative_prompt=negative_prompt if negative_prompt else None,
            width=int(width),
            height=int(height),
            num_inference_steps=int(num_steps),
            guidance_scale=float(guidance_scale),
            seed=seed_value,
        )

        return image, "Success!"

    except Exception as e:
        print(f"Error generating image: {e}")
        return None, f"Error: {str(e)}"


# Create Gradio interface
with gr.Blocks(title="Byte Dream - AI Image Generator", theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # 🎨 Byte Dream - AI Image Generator

    Generate stunning images from text descriptions using advanced diffusion models.
    Optimized for CPU inference.

    **Tips for better results:**
    - Be specific and descriptive in your prompts
    - Use negative prompts to avoid unwanted elements
    - Higher steps = better quality but slower
    - Adjust guidance scale for creativity vs accuracy
    """)

    with gr.Row():
        with gr.Column(scale=1):
            gr.Markdown("### 📝 Prompt")
            prompt_input = gr.Textbox(
                label="Prompt",
                placeholder="A beautiful sunset over mountains, digital art, highly detailed",
                lines=3,
            )

            negative_prompt_input = gr.Textbox(
                label="Negative Prompt (optional)",
                placeholder="ugly, blurry, low quality, distorted",
                lines=2,
            )

            with gr.Row():
                width_slider = gr.Slider(
                    minimum=256,
                    maximum=1024,
                    step=64,
                    value=512,
                    label="Width"
                )
                height_slider = gr.Slider(
                    minimum=256,
                    maximum=1024,
                    step=64,
                    value=512,
                    label="Height"
                )

            with gr.Row():
                steps_slider = gr.Slider(
                    minimum=10,
                    maximum=150,
                    step=5,
                    value=50,
                    label="Inference Steps"
                )
                guidance_slider = gr.Slider(
                    minimum=1.0,
                    maximum=20.0,
                    step=0.5,
                    value=7.5,
                    label="Guidance Scale"
                )

            seed_input = gr.Number(
                label="Seed (-1 for random)",
                value=-1,
                precision=0,
            )

            generate_btn = gr.Button("🎨 Generate Image", variant="primary", size="lg")

        with gr.Column(scale=1):
            gr.Markdown("### 🖼️ Generated Image")
            output_image = gr.Image(
                label="Generated Image",
                type="pil",
            )
            status_text = gr.Textbox(label="Status")

    # Examples
    gr.Markdown("### 💡 Example Prompts")
    gr.Examples(
        examples=[
            ["A cyberpunk city at night with neon lights, futuristic architecture, flying cars, highly detailed, digital art"],
            ["A majestic dragon breathing fire, fantasy art, dramatic lighting, epic scene"],
            ["A peaceful cottage in a meadow, flowers, sunny day, studio ghibli style"],
            ["Portrait of a warrior princess, armor, fantasy, intricate details, character design"],
            ["Underwater coral reef, tropical fish, sunlight filtering through water, photorealistic"],
        ],
        inputs=[prompt_input],
    )

    # Connect button
    generate_btn.click(
        fn=generate_image,
        inputs=[
            prompt_input,
            negative_prompt_input,
            width_slider,
            height_slider,
            steps_slider,
            guidance_slider,
            seed_input,
        ],
        outputs=[output_image, status_text],
    )

    gr.Markdown("""
    ---
    **Byte Dream** v1.0.0 | Powered by Latent Diffusion Models
    """)


if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
'''

    # Hand the source text back to the caller; nothing is written to disk here.
    return gradio_code
|
| 244 |
+
|
| 245 |
+
|
| 246 |
+
def create_readme_for_hf(repo_id: str):
    """Build the README (model card) text for a Hugging Face repository.

    Args:
        repo_id: Full repository ID such as ``"username/bytedream"``. The
            segment after the final ``/`` is used as the model's display name.

    Returns:
        str: Complete README contents, starting with YAML front matter
        (license, language, tags) followed by Markdown documentation.
    """
    # Derive the display name once instead of re-splitting inside the literal.
    model_name = repo_id.split('/')[-1]

    return f'''---
license: mit
language:
- en
tags:
- text-to-image
- diffusion
- generative-ai
- cpu-optimized
---

# {model_name}

{model_name} is a powerful text-to-image diffusion model optimized for CPU inference. Generate high-quality images from text prompts using advanced latent diffusion architecture.

## Features

- 🚀 **CPU Optimized**: Runs efficiently on CPU without GPU requirement
- 🎨 **High Quality**: Generates 512x512 and higher resolution images
- ⚡ **Fast Inference**: Optimized for speed with quality preservation
- 🔧 **Flexible**: Supports various sampling methods and customization
- 📦 **Easy to Use**: Simple Python API and web interface

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Python API

```python
from bytedream import ByteDreamGenerator

# Initialize generator
generator = ByteDreamGenerator()

# Generate image
image = generator.generate(
    prompt="A beautiful sunset over mountains, digital art",
    num_inference_steps=50,
    guidance_scale=7.5
)

image.save("output.png")
```

### Command Line

```bash
python infer.py --prompt "A dragon flying over castle" --output dragon.png
```

### Web Interface

```bash
python app.py
```

## Model Details

- **Architecture**: Latent Diffusion Model (UNet + VAE + Text Encoder)
- **Parameters**: ~1.2B
- **Training**: Trained on diverse image-text pairs
- **Optimization**: CPU-optimized with efficient memory usage

## Examples

Try these prompts:
- "Cyberpunk city at night, neon lights, futuristic"
- "Fantasy landscape with mountains and waterfall"
- "Portrait of a warrior, detailed armor, dramatic lighting"
- "Abstract art, colorful, geometric shapes"

## Configuration

Edit `config.yaml` to customize:
- Model architecture parameters
- Generation settings (resolution, steps, guidance)
- CPU optimization options

## License

MIT License

## Acknowledgments

Built with:
- [PyTorch](https://pytorch.org/)
- [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
- [CLIP](https://openai.com/research/clip)

Enjoy creating with Byte Dream! 🎨
'''
|
| 347 |
+
|
| 348 |
+
|
| 349 |
+
def main():
    """Command-line entry point.

    Uploads a local Byte Dream model directory to the Hugging Face Hub and,
    when ``--create_space`` is passed, also writes the Gradio app
    (``app.py``) and model card (``README_HF.md``) needed to deploy a
    Hugging Face Space.
    """
    parser = argparse.ArgumentParser(description="Upload Byte Dream to Hugging Face")

    parser.add_argument(
        "--model_path",
        type=str,
        default="./models/bytedream",
        help="Path to model directory"
    )

    parser.add_argument(
        "--repo_id",
        type=str,
        required=True,
        help="Repository ID (e.g., username/bytedream)"
    )

    parser.add_argument(
        "--token",
        type=str,
        default=None,
        help="Hugging Face API token"
    )

    parser.add_argument(
        "--private",
        action="store_true",
        help="Make repository private"
    )

    parser.add_argument(
        "--create_space",
        action="store_true",
        help="Also create Gradio Space code"
    )

    args = parser.parse_args()

    # Upload model
    upload_to_huggingface(
        model_path=args.model_path,
        repo_id=args.repo_id,
        token=args.token,
        private=args.private,
    )

    # Create Space files if requested
    if args.create_space:
        print("\n\nCreating Gradio Space files...")

        # Write UTF-8 explicitly: the generated files contain emoji and
        # box-drawing characters, which raise UnicodeEncodeError under a
        # non-UTF-8 default locale (e.g. cp1252 on Windows).
        with open("app.py", 'w', encoding="utf-8") as f:
            f.write(create_gradio_app())
        print("✓ Created app.py for Gradio Space")

        # Save README (kept as README_HF.md so the project's own README
        # is not overwritten)
        readme = create_readme_for_hf(args.repo_id)
        with open("README_HF.md", 'w', encoding="utf-8") as f:
            f.write(readme)
        print("✓ Created README_HF.md")

        print("\n📋 To deploy on Hugging Face Spaces:")
        print("1. Go to https://huggingface.co/spaces")
        print("2. Click 'Create new Space'")
        print("3. Choose Gradio SDK")
        print("4. Upload all files")
        print("5. Select CPU hardware")
        print("6. Deploy!")
|
| 417 |
+
|
| 418 |
+
|
| 419 |
+
# Script entry point: run the CLI only when executed directly,
# not when this module is imported.
if __name__ == "__main__":
    main()
|