Tags: Image-Text-to-Text · Transformers · Diffusers · Safetensors · qwen3_vl · vision-language-model · image-decomposition · conversational
Instructions for using SynLayers/Bbox-caption-8b with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use SynLayers/Bbox-caption-8b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="SynLayers/Bbox-caption-8b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("SynLayers/Bbox-caption-8b")
model = AutoModelForImageTextToText.from_pretrained("SynLayers/Bbox-caption-8b")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use SynLayers/Bbox-caption-8b with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "SynLayers/Bbox-caption-8b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SynLayers/Bbox-caption-8b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker:
```shell
docker model run hf.co/SynLayers/Bbox-caption-8b
```
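Since the server exposes an OpenAI-compatible API, it can also be called from Python. Below is a minimal sketch using only the standard library; the `build_chat_payload` helper is ours (not part of vLLM), and the port assumes the default `vllm serve` settings.

```python
# Sketch: calling the vLLM server's OpenAI-compatible chat endpoint from Python.
# The actual HTTP call is left commented out so this runs without a live server.
import json
import urllib.request

def build_chat_payload(model: str, text: str, image_url: str) -> dict:
    """Build an OpenAI-compatible chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_chat_payload(
    "SynLayers/Bbox-caption-8b",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Once the server is running, uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload shape works against the SGLang server below; only the port changes.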
- SGLang
How to use SynLayers/Bbox-caption-8b with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "SynLayers/Bbox-caption-8b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SynLayers/Bbox-caption-8b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "SynLayers/Bbox-caption-8b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SynLayers/Bbox-caption-8b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use SynLayers/Bbox-caption-8b with Docker Model Runner:
```shell
docker model run hf.co/SynLayers/Bbox-caption-8b
```
---
language:
- en
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
extra_gated_prompt: By clicking "Agree", you agree to the [FluxDev Non-Commercial License Agreement](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
  and acknowledge the [Acceptable Use Policy](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/POLICY.md).
tags:
- text-to-image
- image-generation
- flux
---
![FLUX.1 [dev] Grid](./dev_grid.jpg)

`FLUX.1 [dev]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
For more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).

# Key Features
1. Cutting-edge output quality, second only to our state-of-the-art model `FLUX.1 [pro]`.
2. Competitive prompt following, matching the performance of closed-source alternatives.
3. Trained using guidance distillation, making `FLUX.1 [dev]` more efficient.
4. Open weights to drive new scientific research, and empower artists to develop innovative workflows.
5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).
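The guidance distillation mentioned in feature 3 is what makes the model cheaper to sample: standard classifier-free guidance evaluates the network twice per step (conditional and unconditional), while a guidance-distilled student takes the guidance scale as an extra input and matches the guided prediction in a single forward pass. A rough sketch in our own notation follows; the exact FLUX.1 training recipe is not published here.

```latex
% Classifier-free guidance: two teacher evaluations per sampling step,
% combined with guidance scale w.
\[
  v_\theta^{\mathrm{cfg}}(x_t, c)
  \;=\; v_\theta(x_t, \varnothing)
  \;+\; w\,\bigl(v_\theta(x_t, c) - v_\theta(x_t, \varnothing)\bigr)
\]
% Guidance distillation: a student \hat{v}_\phi conditioned on w is trained
% to reproduce the guided prediction in one forward pass.
\[
  \min_\phi \;
  \mathbb{E}_{x_t,\, c,\, w}
  \bigl\lVert \hat{v}_\phi(x_t, c, w) - v_\theta^{\mathrm{cfg}}(x_t, c) \bigr\rVert^2
\]
```

This is also why the diffusers example below passes `guidance_scale` directly to the pipeline: the scale is an input to the distilled model rather than a weight on two separate predictions.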
# Usage
We provide a reference implementation of `FLUX.1 [dev]`, as well as sampling code, in a dedicated [GitHub repository](https://github.com/black-forest-labs/flux).
Developers and creatives looking to build on top of `FLUX.1 [dev]` are encouraged to use this as a starting point.

## API Endpoints
The FLUX.1 models are also available via API from the following sources:
- [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)
- [replicate.com](https://replicate.com/collections/flux)
- [fal.ai](https://fal.ai/models/fal-ai/flux/dev)
- [mystic.ai](https://www.mystic.ai/black-forest-labs/flux1-dev)

## ComfyUI
`FLUX.1 [dev]` is also available in [ComfyUI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.
## Diffusers
To use `FLUX.1 [dev]` with the 🧨 diffusers Python library, first install or upgrade diffusers:
```shell
pip install -U diffusers
```
Then you can use `FluxPipeline` to run the model:
```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU; remove this if you have enough GPU memory

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-dev.png")
```
To learn more, check out the [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) documentation.
---
# Limitations
- This model is not intended or able to provide factual information.
- As a statistical model this checkpoint might amplify existing societal biases.
- The model may fail to generate output that matches the prompts.
- Prompt following is heavily influenced by the prompting style.

# Out-of-Scope Use
The model and its derivatives may not be used:
- In any way that violates any applicable national, federal, state, local or international law or regulation.
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.
- To generate or disseminate verifiably false information and/or content with the purpose of harming others.
- To generate or disseminate personal identifiable information that can be used to harm an individual.
- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.
- To create non-consensual nudity or illegal pornographic content.
- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.
- Generating or facilitating large-scale disinformation campaigns.

# License
This model falls under the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).