SenseNova-U1-8B-MoT / docs /showcases.md
Merjia's picture
Duplicate from sensenova/SenseNova-U1-8B-MoT
5b442e6
# SenseNova-U1 Showcases
[← Back to README](../README.md)
All samples below were generated by **SenseNova-U1** (see the main README
for the runnable commands). Images are stored as lossy WebP under
[`docs/assets/showcases/`](./assets/showcases/); click any thumbnail to
open the full-resolution render.
---
## Text-to-Image
The main table presents the complete n × 3 grid layouts, covering landscape, square, and portrait formats at different resolutions.
#### 🖼️ *Text-to-Image (General)*
Reproducible prompts are in
[`examples/t2i/data/samples.jsonl`](../examples/t2i/data/samples.jsonl).
| | | |
| :---: | :---: | :---: |
| [<img width="300" alt="t2i general dense face hd 07" src="./assets/showcases/t2i_general/16_9_dense_face_hd_07.webp">](./assets/showcases/t2i_general/16_9_dense_face_hd_07.webp) | [<img width="300" alt="t2i general dense text rendering 18" src="./assets/showcases/t2i_general/16_9_dense_text_rendering_18.webp">](./assets/showcases/t2i_general/16_9_dense_text_rendering_18.webp) | [<img width="300" alt="t2i general dense text rendering 12" src="./assets/showcases/t2i_general/16_9_dense_text_rendering_12.webp">](./assets/showcases/t2i_general/16_9_dense_text_rendering_12.webp) |
| [<img width="260" alt="t2i general face hd 13" src="./assets/showcases/t2i_general/1_1_face_hd_13.webp">](./assets/showcases/t2i_general/1_1_face_hd_13.webp) | [<img width="260" alt="t2i general face hd 17" src="./assets/showcases/t2i_general/1_1_face_hd_17.webp">](./assets/showcases/t2i_general/1_1_face_hd_17.webp) | [<img width="260" alt="t2i general face hd 07" src="./assets/showcases/t2i_general/1_1_dense_artistic_10.webp">](./assets/showcases/t2i_general/1_1_dense_artistic_10.webp) |
| [<img width="260" alt="t2i general landscape 06" src="./assets/showcases/t2i_general/1_1_landscape_06.webp">](./assets/showcases/t2i_general/1_1_landscape_06.webp) | [<img width="260" alt="t2i general dense landscape 12" src="./assets/showcases/t2i_general/1_1_dense_landscape_12.webp">](./assets/showcases/t2i_general/1_1_dense_landscape_12.webp) | [<img width="260" alt="t2i general landscape 07" src="./assets/showcases/t2i_general/1_1_landscape_07.webp">](./assets/showcases/t2i_general/1_1_landscape_07.webp) |
| [<img width="200" alt="t2i general portrait artistic 02 a" src="./assets/showcases/t2i_general/9_16_dense_face_hd_10.webp">](./assets/showcases/t2i_general/9_16_dense_face_hd_10.webp) | [<img width="200" alt="t2i general portrait artistic 02 b" src="./assets/showcases/t2i_general/9_16_human_pose_11.webp">](./assets/showcases/t2i_general/9_16_human_pose_11.webp) | [<img width="200" alt="t2i general portrait artistic 07" src="./assets/showcases/t2i_general/9_16_artistic_07.webp">](./assets/showcases/t2i_general/9_16_artistic_07.webp) |
| [<img width="200" alt="t2i general portrait text rendering 02" src="./assets/showcases/t2i_general/9_16_sensenova_u1_31.webp">](./assets/showcases/t2i_general/9_16_sensenova_u1_31.webp) | [<img width="200" alt="t2i general portrait dense landscape 05" src="./assets/showcases/t2i_general/9_16_dense_landscape_05.webp">](./assets/showcases/t2i_general/9_16_dense_landscape_05.webp) | [<img width="200" alt="t2i general portrait dense artistic 11" src="./assets/showcases/t2i_general/9_16_dense_artistic_11.webp">](./assets/showcases/t2i_general/9_16_dense_artistic_11.webp) |
#### 🖼️ *Text-to-Image (Reasoning)*
Reproducible prompts are in
[`examples/t2i/data/samples_reasoning.jsonl`](../examples/t2i/data/samples_reasoning.jsonl).
<table>
<tr>
<th style="width: 20%">Original Text</th>
<th style="width: 50%">Reasoning Process</th>
<th style="width: 30%">Resulting Image</th>
</tr>
<tr>
<td style="vertical-align: top;">The playful craft that embodies Russian cultural charm</td>
<td><div style="max-height: 200px; overflow-y: auto;">1. <b>Instruction Understanding:</b> The core subject is the matryoshka, identified as a Russian craft. Essential modifiers include playful and Russian cultural charm. The relation is that the craft embodies the charm. 2. <b>Reasoning Process:</b> The prompt identifies the matryoshka, the iconic Russian wooden doll set. 3. <b>Establish the frame:</b> The frame captures the matryoshka set in the foreground. The composition focuses on the Russian craft to show the playful nature of the doll set. 4. <b>Set the lighting and color:</b> Lighting illuminates the matryoshka to reveal the Russian cultural charm. The color palette supports the playful craft aesthetic. 5. <b>Lock the style:</b> The style emphasizes the wooden nature of the Russian craft. The finish reflects the charm of the matryoshka. 6. <b>Explicit Prompt:</b> A set of three colorful, hand-painted wooden matryoshka dolls arranged by size on a rustic wooden table, bright floral patterns, soft natural light.</div></td>
<td style="vertical-align: top;"><img src="./assets/showcases/t2i_reasoning/1_reasoning.png" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">A typical dish from the country where Naples is located</td>
<td><div style="max-height: 200px; overflow-y: auto;">1. <b>Instruction Understanding:</b> The core subject is a Neapolitan pizza presented as a typical dish. The context specifies Naples, Italy, as the country of origin for this food item. 2. <b>Reasoning Process:</b> Naples is in Italy, and a classic dish is a Neapolitan pizza. 3. <b>Establish the frame:</b> The Neapolitan pizza is captured in a close-up shot that fills the central frame. The angle is slightly elevated to show the round form of the dish clearly. 4. <b>Set the lighting and color:</b> Soft lighting illuminates the surface of the Neapolitan pizza to reveal texture. Warm tones dominate the color palette, emphasizing the baked nature of the dish. 5. <b>Lock the style:</b> The image utilizes a realistic photographic style with sharp focus on the main subject. The finish appears natural and appetizing, suitable for food documentation. 6. <b>Explicit Prompt:</b> A delicious Neapolitan pizza with a soft, charred crust, tomato sauce, and fresh mozzarella, served on a rustic wooden table, realistic food photography.</div></td>
<td style="vertical-align: top;"><img src="./assets/showcases/t2i_reasoning/2_reasoning.png" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">A gigantic bubble in the immediate foreground with a small town barely visible inside</td>
<td><div style="max-height: 200px; overflow-y: auto;">1. <b>Instruction Understanding:</b> The prompt requires a gigantic bubble positioned in the immediate foreground. A small town must be visible inside the bubble. 2. <b>Reasoning Process:</b> The bubble acts as a transparent lens, potentially distorting the town's appearance due to its curvature. 3. <b>Establish the frame:</b> The shot scale is an extreme close-up, placing the bubble so close it fills the majority of the frame. The focus plane is set on the bubble's surface, rendering the background outside the town soft and out of focus. 4. <b>Set the lighting and color:</b> Natural daylight creates bright specular highlights on the upper surface of the bubble. The color palette is dominated by the cool transparency of the bubble and the muted tones of the town within. 5. <b>Lock the style:</b> The overall image style is photorealistic with a high level of detail on the bubble's texture. The rendering captures the glossy finish and physical properties of the bubble accurately. 6. <b>Explicit Prompt:</b> An extreme close-up of a massive, transparent soap bubble floating in the immediate foreground, with a tiny, detailed miniature town visible inside the curvature, refracted and magnified by the surface tension, soft natural daylight, dreamlike, high detail, photorealistic texture.</div></td>
<td style="vertical-align: top;"><img src="./assets/showcases/t2i_reasoning/3_reasoning.png" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">A chocolate bar left in direct sunlight, highlighting the state of the chocolate</td>
<td><div style="max-height: 200px; overflow-y: auto;">1. <b>Instruction Understanding:</b> The core subject is a chocolate bar positioned in direct sunlight. The focus is on the state of the chocolate, specifically how the heat affects it. 2. <b>Reasoning Process:</b> Heat causes chocolate to melt, losing its structured shape and becoming a viscous, glossy liquid. 3. <b>Establish the frame:</b> The composition is a close-up shot that fills the frame with the chocolate bar to emphasize detail. The angle is slightly elevated to show the top surface and the pooling liquid clearly. 4. <b>Build the environment:</b> The chocolate bar rests on a generic surface that supports the object without distracting from the main subject. The background is blurred to keep attention on the foreground elements and the chocolate. 5. <b>Set the lighting and color:</b> Direct sunlight creates bright highlights on the melting chocolate, emphasizing its glossy texture. The lighting is warm and intense, casting distinct shadows and illuminating the rich brown colors of the liquid. 6. <b>Explicit Prompt:</b> A close-up of a melting chocolate bar on a surface, with the edges losing their defined shape and pooling into a glossy, viscous puddle under the heat of the sun.</div></td>
<td style="vertical-align: top;"><img src="./assets/showcases/t2i_reasoning/6_reasoning.png" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">A solution of calcium carbonate reacting with acetic acid</td>
<td><div style="max-height: 200px; overflow-y: auto;">1. <b>Instruction Understanding:</b> The core subject is a solution of calcium carbonate and acetic acid. The prompt specifies the reacting state of the chemical mixture. 2. <b>Reasoning Process:</b> The reaction produces carbon dioxide gas, which would be visible as a steady stream of bubbles rising through the liquid. 3. <b>Establish the frame:</b> The camera frames the solution closely to capture the details of the reaction. The composition centers on the liquid where the gas is visible. 4. <b>Set the lighting and color:</b> The liquid appears clear, allowing the white bubbles to stand out distinctly. The lighting is bright and even to illuminate the stream of gas. 5. <b>Lock the style:</b> The image maintains a realistic photographic style suitable for scientific observation. The focus is sharp on the reacting solution and bubbles. 6. <b>Explicit Prompt:</b> A test tube filled with a clear liquid and a rapid, effervescent stream of carbon dioxide bubbles rising to the surface, laboratory experiment.</div></td>
<td style="vertical-align: top;"><img src="./assets/showcases/t2i_reasoning/7_reasoning.png" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
</table>
#### 🖼️ *Text-to-Image (Infographics)*
Reproducible prompts are in
[`examples/t2i/data/samples_infographic.jsonl`](../examples/t2i/data/samples_infographic.jsonl).
<table align="center">
<tr>
<td align="center"><a href="./assets/showcases/t2i_infographic/0004.webp"><img width="300" alt="t2i landscape 0001" src="./assets/showcases/t2i_infographic/0004.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0012.webp"><img width="300" alt="t2i landscape 0002" src="./assets/showcases/t2i_infographic/0012.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0005.webp"><img width="300" alt="t2i landscape 0003" src="./assets/showcases/t2i_infographic/0005.webp"></a></td>
</tr>
<tr>
<td align="center"><a href="./assets/showcases/t2i_infographic/0018.webp"><img width="300" alt="t2i landscape 0004" src="./assets/showcases/t2i_infographic/0018.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0024.webp"><img width="300" alt="t2i landscape 0005" src="./assets/showcases/t2i_infographic/0024.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0013.webp"><img width="300" alt="t2i landscape 0006" src="./assets/showcases/t2i_infographic/0013.webp"></a></td>
</tr>
<tr>
<td align="center"><a href="./assets/showcases/t2i_infographic/0006.webp"><img width="300" alt="t2i landscape 0007" src="./assets/showcases/t2i_infographic/0006.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0015.webp"><img width="300" alt="t2i landscape 0008" src="./assets/showcases/t2i_infographic/0015.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0025.webp"><img width="300" alt="t2i landscape 0009" src="./assets/showcases/t2i_infographic/0025.webp"></a></td>
</tr>
</table>
<table align="center">
<tr>
<td align="center"><a href="./assets/showcases/t2i_infographic/0000.webp"><img width="220" alt="t2i landscape 0010" src="./assets/showcases/t2i_infographic/0000.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0003.webp"><img width="220" alt="t2i landscape 0011" src="./assets/showcases/t2i_infographic/0003.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0001.webp"><img width="220" alt="t2i landscape 0012" src="./assets/showcases/t2i_infographic/0001.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0022.webp"><img width="220" alt="t2i landscape 0013" src="./assets/showcases/t2i_infographic/0022.webp"></a></td>
</tr>
<tr>
<td align="center"><a href="./assets/showcases/t2i_infographic/0016.webp"><img width="220" alt="t2i image 0014" src="./assets/showcases/t2i_infographic/0016.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0010.webp"><img width="220" alt="t2i image 0015" src="./assets/showcases/t2i_infographic/0010.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0007.webp"><img width="220" alt="t2i image 0016" src="./assets/showcases/t2i_infographic/0007.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0021.webp"><img width="220" alt="t2i image 0017" src="./assets/showcases/t2i_infographic/0021.webp"></a></td>
</tr>
<tr>
<td align="center"><a href="./assets/showcases/t2i_infographic/0009.webp"><img width="220" alt="t2i image 0018" src="./assets/showcases/t2i_infographic/0009.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0020.webp"><img width="220" alt="t2i image 0019" src="./assets/showcases/t2i_infographic/0020.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0008.webp"><img width="220" alt="t2i image 0020" src="./assets/showcases/t2i_infographic/0008.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0002.webp"><img width="220" alt="t2i image 0021" src="./assets/showcases/t2i_infographic/0002.webp"></a></td>
</tr>
<tr>
<td align="center"><a href="./assets/showcases/t2i_infographic/0011.webp"><img width="230" alt="t2i image 0022" src="./assets/showcases/t2i_infographic/0011.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0023.webp"><img width="230" alt="t2i image 0023" src="./assets/showcases/t2i_infographic/0023.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0027.webp"><img width="230" alt="t2i image 0024" src="./assets/showcases/t2i_infographic/0027.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0026.webp"><img width="230" alt="t2i image 0025" src="./assets/showcases/t2i_infographic/0026.webp"></a></td>
</tr>
<tr>
<td align="center"><a href="./assets/showcases/t2i_infographic/0029.webp"><img width="230" alt="t2i image 0022" src="./assets/showcases/t2i_infographic/0029.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0030.webp"><img width="230" alt="t2i image 0023" src="./assets/showcases/t2i_infographic/0030.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0031.webp"><img width="230" alt="t2i image 0024" src="./assets/showcases/t2i_infographic/0031.webp"></a></td>
<td align="center"><a href="./assets/showcases/t2i_infographic/0032.webp"><img width="230" alt="t2i image 0025" src="./assets/showcases/t2i_infographic/0032.webp"></a></td>
</tr>
</table>
## Image Editing
Side-by-side compare montages below show `input(s) | output`, with the
edit instruction rendered along the bottom. The same
unified model handles single-image attribute / style / relighting edits
and multi-reference (subject + accessory + pose) composition.
#### ✏️ *Image Editing (General)*
Reproducible prompts are in
[`examples/editing/data/samples.jsonl`](../examples/editing/data/samples.jsonl).
| | |
| :---: | :---: |
| <div align="center"><a href="../examples/editing/data/images/1.webp"><img width="180" alt="editing input 1" src="../examples/editing/data/images/1.webp"></a> <a href="../docs/assets/showcases/editing/1_out.webp"><img width="180" alt="editing output 1" src="../docs/assets/showcases/editing/1_out.webp"></a><br><sub>Change the jacket of the person on the left to bright yellow.</sub></div> | <div align="center"><a href="../examples/editing/data/images/3.webp"><img width="180" alt="editing input 3" src="../examples/editing/data/images/3.webp"></a> <a href="../docs/assets/showcases/editing/3_out.webp"><img width="180" alt="editing output 3" src="../docs/assets/showcases/editing/3_out.webp"></a><br><sub>在小狗头上放一个花环,并且把图片变为吉卜力风格。</sub></div> |
| <div align="center"><a href="../examples/editing/data/images/2.webp"><img width="180" alt="editing input 2" src="../examples/editing/data/images/2.webp"></a> <a href="../docs/assets/showcases/editing/2_out.webp"><img width="180" alt="editing output 2" src="../docs/assets/showcases/editing/2_out.webp"></a><br><sub>Make the person in the image smile.</sub></div> | <div align="center"><a href="../examples/editing/data/images/4.webp"><img width="180" alt="editing input 4" src="../examples/editing/data/images/4.webp"></a> <a href="../docs/assets/showcases/editing/4_out.webp"><img width="180" alt="editing output 4" src="../docs/assets/showcases/editing/4_out.webp"></a><br><sub>Add a bouquet of flowers.</sub></div> |
| <div align="center"><a href="../examples/editing/data/images/5.webp"><img width="180" alt="editing input 5" src="../examples/editing/data/images/5.webp"></a> <a href="../docs/assets/showcases/editing/5_out.webp"><img width="180" alt="editing output 5" src="../docs/assets/showcases/editing/5_out.webp"></a><br><sub>Turn the image into an American comic style.</sub></div> | <div align="center"><a href="../examples/editing/data/images/8.webp"><img width="180" alt="editing input 8" src="../examples/editing/data/images/8.webp"></a> <a href="../docs/assets/showcases/editing/8_out.webp"><img width="180" alt="editing output 8" src="../docs/assets/showcases/editing/8_out.webp"></a><br><sub>Replace the man with a woman.</sub></div> |
| <div align="center"><a href="../examples/editing/data/images/6.webp"><img width="180" alt="editing input 6" src="../examples/editing/data/images/6.webp"></a> <a href="../docs/assets/showcases/editing/6_out.webp"><img width="180" alt="editing output 6" src="../docs/assets/showcases/editing/6_out.webp"></a><br><sub>Replace the text "WARFIGHTER" to "BATTLEFIELD" in the bold orange-red font.</sub></div> | <div align="center"><a href="../examples/editing/data/images/7.webp"><img width="180" alt="editing input 7" src="../examples/editing/data/images/7.webp"></a> <a href="../docs/assets/showcases/editing/7_out.webp"><img width="180" alt="editing output 7" src="../docs/assets/showcases/editing/7_out.webp"></a><br><sub>Remove the person on the far right wearing a green skirt and a green top.</sub></div> |
#### ✏️ *Image Editing (Reasoning)*
Reproducible prompts are in
[`examples/editing/data/samples_reasoning.jsonl`](../examples/editing/data/samples_reasoning.jsonl).
<table>
<tr>
<th style="width: 20%">Original Text</th>
<th style="width: 30%">Original Image</th>
<th style="width: 20%">Reasoning Process</th>
<th style="width: 30%">Resulting Image</th>
</tr>
<tr>
<td style="vertical-align: top;">Draw what it will look like one hour later.</td>
<td style="vertical-align: top;"><img src="../examples/editing/data/images_reasoning/034_temporal_reasoning_draw_what_it_will_look_like.png" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
<td><div style="max-height: 200px; overflow-y: auto;">
1. <b>Source Image Analysis:</b> The source image shows a glass cup of hot tea with steeping tea leaves, and the water appears relatively clear. 2. <b>Instruction Understanding:</b> The instruction asks for the appearance one hour later. 3. <b>Reasoning Process:</b> Over time, tannins and pigments leach out, making the tea noticeably darker and more uniformly colored, and the leaves may look more swollen and darker from soaking. 4. <b>Expected Visual Changes:</b> The expected visible result is a deeper amber-to-brown tea color and more fully saturated liquid. 5. <b>Elements to Preserve:</b> The glass cup, scattered leaves around it, background, and camera angle should remain unchanged. 6. <b>Explicit Edit Prompt:</b> Edit the tea liquid to a much darker, more saturated amber-brown color as if fully steeped, and make the tea leaves look slightly darker and more swollen, while keeping the glass cup, surrounding leaves, background, and framing unchanged.</div></td>
<td style="vertical-align: top;"><img src="../docs/assets/showcases/editing/034_temporal_reasoning_draw_what_it_will_look_like_result.jpeg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">Draw what it will look like immediately after someone stands up from sitting on it for a long time.</td>
<td style="vertical-align: top;"><img src="../examples/editing/data/images_reasoning/036_causal_reasoning_draw_what_it_will_look_like.png" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
<td><div style="max-height: 200px; overflow-y: auto;">
1. <b>Source Image Analysis:</b> The source image shows a fluffy lime-green beanbag chair that looks evenly plump and undisturbed on a white background. 2. <b>Instruction Understanding:</b> The edit instruction asks for its appearance immediately after someone stood up from sitting there for a long time. 3. <b>Reasoning Process:</b> Prolonged weight compresses the fabric and internal fill, leaving a depressed seat area, wrinkles radiating outward, and a slowly recovering shape. 4. <b>Expected Visual Changes:</b> The visible result should be a noticeable dip and creasing where a person was seated. 5. <b>Elements to Preserve:</b> The background, beanbag color, lighting, and camera angle should remain unchanged while only the beanbag’s shape shows the compression. 6. <b>Explicit Edit Prompt:</b> Edit the beanbag chair to show a clear seated depression in the center with surrounding wrinkles and slightly compressed fabric, while keeping the white background, lighting, and camera angle unchanged.</div></td>
<td style="vertical-align: top;"><img src="../docs/assets/showcases/editing/036_causal_reasoning_draw_what_it_will_look_like_result.jpeg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">Draw an image showing the side view of the provided traffic cone.</td>
<td style="vertical-align: top;"><img src="../examples/editing/data/images_reasoning/039_spatial_reasoning_draw_an_image_showing_the_si.png" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
<td><div style="max-height: 200px; overflow-y: auto;">
1. <b>Source Image Analysis:</b> The source image shows a 3D perspective view of a traffic cone. 2. <b>Instruction Understanding:</b> The instruction asks for a side view. 3. <b>Reasoning Process:</b> A side view of a standard traffic cone results in a triangular silhouette with a flat rectangular base. 4. <b>Expected Visual Changes:</b> The perspective is flattened into this 2D-like geometric profile. 5. <b>Elements to Preserve:</b> The cone's height and color should remain consistent with the original object. 6. <b>Explicit Edit Prompt:</b> Edit the perspective view into a flat side-profile silhouette of a triangle with a rectangular base, keeping the red color and proportions unchanged.</div></td>
<td style="vertical-align: top;"><img src="../docs/assets/showcases/editing/039_spatial_reasoning_draw_an_image_showing_the_si_result.jpeg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">Change the water to high-concentration saltwater</td>
<td style="vertical-align: top;"><img src="../examples/editing/data/images_reasoning/042_physics_change_the_water_to_high-con.jpg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
<td><div style="max-height: 200px; overflow-y: auto;">
1. <b>Source Image Analysis:</b> The source image shows an egg resting at the bottom of a glass of water. 2. <b>Instruction Understanding:</b> The instruction asks to change the medium to high-concentration saltwater. 3. <b>Reasoning Process:</b> Saltwater is denser than fresh water, which increases the buoyant force on the egg. 4. <b>Expected Visual Changes:</b> As density increases, the egg will overcome gravity and float higher or suspend in the middle of the liquid. 5. <b>Elements to Preserve:</b> The glass and the egg's appearance should remain consistent, focusing on the shift in the egg's vertical position. 6. <b>Explicit Edit Prompt:</b> Edit the position of the egg so it is floating in the middle of the liquid instead of resting on the bottom, while keeping the glass and the egg's appearance unchanged.</div></td>
<td style="vertical-align: top;"><img src="../docs/assets/showcases/editing/042_physics_change_the_water_to_high-con_result.jpeg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">What the fruit looks like when ripe in the picture</td>
<td style="vertical-align: top;"><img src="../examples/editing/data/images_reasoning/044_biology_what_the_fruit_looks_like_wh.jpg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
<td><div style="max-height: 200px; overflow-y: auto;">
1. <b>Source Image Analysis:</b> The source image shows green, unripe bananas. 2. <b>Instruction Understanding:</b> The instruction asks for the appearance of the fruit when ripe. 3. <b>Reasoning Process:</b> Ripening involves a breakdown of chlorophyll and the production of sugars, which turns the skin from green to yellow and often causes small brown sugar spots to appear. 4. <b>Expected Visual Changes:</b> The color and texture of the peel should transition to a ripe state. 5. <b>Elements to Preserve:</b> The shape of the bananas and the white background should remain constant. 6. <b>Explicit Edit Prompt:</b> Edit the green bananas to be bright yellow with small brown spots, while keeping the original shape and white background unchanged.</div></td>
<td style="vertical-align: top;"><img src="../docs/assets/showcases/editing/044_biology_what_the_fruit_looks_like_wh_result.jpeg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">Correct the unreasonable part in the image.</td>
<td style="vertical-align: top;"><img src="../examples/editing/data/images_reasoning/046_anomaly_correction_correct_the_unreasonable_par.jpg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
<td><div style="max-height: 200px; overflow-y: auto;">
1. <b>Source Image Analysis:</b> The source image shows a kettle pouring water onto a mug, but the stream is misaligned and missing the cup. 2. <b>Instruction Understanding:</b> The instruction asks to fix the physical inconsistency. 3. <b>Reasoning Process:</b> The water stream must be redirected to connect the spout to the mug, maintaining the trajectory of gravity. 4. <b>Expected Visual Changes:</b> The water stream will be redirected to connect the spout to the mug. 5. <b>Elements to Preserve:</b> The kettle, mug, and background must remain unchanged while the water path is corrected. 6. <b>Explicit Edit Prompt:</b> Draw a continuous water stream connecting the kettle spout to the mug, keeping the kettle, mug, and background unchanged.</div></td>
<td style="vertical-align: top;"><img src="../docs/assets/showcases/editing/046_anomaly_correction_correct_the_unreasonable_par_result.jpeg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
<tr>
<td style="vertical-align: top;">Modify the matrix in the image to an upper triangular matrix</td>
<td style="vertical-align: top;"><img src="../examples/editing/data/images_reasoning/047_mathematics_modify_the_matrix_in_the_ima.jpg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
<td><div style="max-height: 200px; overflow-y: auto;">
1. <b>Source Image Analysis:</b> The source image shows a 2x2 matrix with values 1, 2, 3, and 4. 2. <b>Instruction Understanding:</b> The instruction asks to convert this to an upper triangular matrix. 3. <b>Reasoning Process:</b> By definition, an upper triangular matrix has zeros below the main diagonal, so the entry '3' must be changed to '0' while keeping '1', '2', and '4' as they are, and this modification satisfies the mathematical property requested. 4. <b>Expected Visual Changes:</b> The entry '3' in the lower-left position will be changed to '0'. 5. <b>Elements to Preserve:</b> The grid lines, the matrix structure, and the other entries must remain unchanged. 6. <b>Explicit Edit Prompt:</b> Change the '3' in the lower-left position to '0', while keeping the matrix structure and other entries unchanged.</div></td>
<td style="vertical-align: top;"><img src="../docs/assets/showcases/editing/047_mathematics_modify_the_matrix_in_the_ima_result.jpeg" style="max-width: 100%; max-height: 100%; object-fit: contain;"></td>
</tr>
</table>
---
## Interleaved Generation
Each case below is a single rendered response from `model.interleave_gen`:
the model first runs a `<think>...</think>` reasoning block that produces
intermediate images, then emits the final interleaved text-and-image
answer.
#### ♻️ *Interleaved Generation (General)*
Reproducible prompts are in
[`examples/interleave/data/samples.jsonl`](../examples/interleave/data/samples.jsonl).
All examples are generated with think-mode reasoning; the chain-of-thought is omitted in some cases for cleaner visualization.
| |
| :---: |
| [<img alt="interleave case 03" src="./assets/showcases/interleave/case_0003_beachfront_villa.webp">](./assets/showcases/interleave/case_0003_beachfront_villa.webp) |
| [<img alt="interleave case 04" src="./assets/showcases/interleave/case_0004_scented_candle_promo.webp">](./assets/showcases/interleave/case_0004_scented_candle_promo.webp) |
| [<img alt="interleave case 05" src="./assets/showcases/interleave/case_0005_matchgirl_warm_au.webp">](./assets/showcases/interleave/case_0005_matchgirl_warm_au.webp) |
| [<img alt="interleave case 06" src="./assets/showcases/interleave/case_0006_orange_cat_travel.webp">](./assets/showcases/interleave/case_0006_orange_cat_travel.webp) |
| [<img alt="interleave case 01" src="./assets/showcases/interleave/case_0001_makeup_three_looks.webp">](./assets/showcases/interleave/case_0001_makeup_three_looks.webp) |
| [<img alt="interleave case 07" src="./assets/showcases/interleave/case_0007_bowie_slide_design.webp">](./assets/showcases/interleave/case_0007_bowie_slide_design.webp) |
#### ♻️ *Interleaved Generation (Reasoning)*
Reproducible prompts are in
[`examples/interleave/data/samples_reasoning.jsonl`](../examples/interleave/data/samples_reasoning.jsonl).
| |
| :---: |
| [<img alt="interleave case 05" src="./assets/showcases/interleave/reasoning.png">](./assets/showcases/interleave/reasoning.png) |
---
## Visual Understanding
General visual understanding across spatial reasoning, multi-image comparison, OCR, geometry, and knowledge-intensive QA:
#### 📝 *Visual Understanding (General)*
Reproducible prompts are in [`examples/vqa/data/samples.jsonl`](../examples/vqa/data/samples.jsonl).
| |
| :---: |
| [<img alt="vqa general cases" src="./assets/showcases/vqa/general_case_all.webp">](./assets/showcases/vqa/general_case_all.webp) |
#### 📝 *Visual Understanding (Agentic)*
| |
| :---: |
| [<img alt="vqa agentic case" src="./assets/showcases/vqa/agentic_case.webp">](./assets/showcases/vqa/agentic_case.webp) |
| [<img alt="vqa agentic case 2" src="./assets/showcases/vqa/agentic_case_2.webp">](./assets/showcases/vqa/agentic_case_2.webp) |