SenseNova-U1 Showcases

All samples below were generated by SenseNova-U1 (see the main README for the runnable commands). Images are stored as lossy WebP under docs/assets/showcases/; click any thumbnail to open the full-resolution render.

Text-to-Image

The main table shows the complete n × 3 grid of samples, covering landscape, square, and portrait formats at several resolutions.

🖼️ Text-to-Image (General)

Reproducible prompts are in examples/t2i/data/samples.jsonl.
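The prompt files referenced in this document are JSON Lines files, so they can be inspected with a few lines of standard-library Python. The sketch below is only an illustration: the helper names `parse_jsonl` and `load_jsonl` are hypothetical, and the record schema is not documented here, so the code prints whatever keys it finds instead of assuming field names.

```python
import json
from pathlib import Path


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of objects.

    Blank lines are skipped; a malformed line raises ValueError with its line number.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc
    return records


def load_jsonl(path):
    """Read a .jsonl file from disk and parse it with parse_jsonl."""
    with Path(path).open(encoding="utf-8") as fh:
        return parse_jsonl(fh)


if __name__ == "__main__":
    # Path taken from this document; the schema of each record is an unknown here.
    sample_path = Path("examples/t2i/data/samples.jsonl")
    if sample_path.exists():
        records = load_jsonl(sample_path)
        print(f"{len(records)} records")
        if records and isinstance(records[0], dict):
            print("keys in first record:", sorted(records[0]))
```

The same helper should work for the other `.jsonl` files listed in this document, since it makes no assumption about their fields.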

🖼️ Text-to-Image (Reasoning)

Reproducible prompts are in examples/t2i/data/samples_reasoning.jsonl.

Original Text	Reasoning Process	Resulting Image
The playful craft that embodies Russian cultural charm	1. Instruction Understanding: The core subject is the matryoshka, identified as a Russian craft. Essential modifiers include playful and Russian cultural charm. The relation is that the craft embodies the charm. 2. Reasoning Process: The prompt identifies the matryoshka, the iconic Russian wooden doll set. 3. Establish the frame: The frame captures the matryoshka set in the foreground. The composition focuses on the Russian craft to show the playful nature of the doll set. 4. Set the lighting and color: Lighting illuminates the matryoshka to reveal the Russian cultural charm. The color palette supports the playful craft aesthetic. 5. Lock the style: The style emphasizes the wooden nature of the Russian craft. The finish reflects the charm of the matryoshka. 6. Explicit Prompt: A set of three colorful, hand-painted wooden matryoshka dolls arranged by size on a rustic wooden table, bright floral patterns, soft natural light.
A typical dish from the country where Naples is located	1. Instruction Understanding: The core subject is a Neapolitan pizza presented as a typical dish. The context specifies Naples, Italy, as the country of origin for this food item. 2. Reasoning Process: Naples is in Italy, and a classic dish is a Neapolitan pizza. 3. Establish the frame: The Neapolitan pizza is captured in a close-up shot that fills the central frame. The angle is slightly elevated to show the round form of the dish clearly. 4. Set the lighting and color: Soft lighting illuminates the surface of the Neapolitan pizza to reveal texture. Warm tones dominate the color palette, emphasizing the baked nature of the dish. 5. Lock the style: The image utilizes a realistic photographic style with sharp focus on the main subject. The finish appears natural and appetizing, suitable for food documentation. 6. Explicit Prompt: A delicious Neapolitan pizza with a soft, charred crust, tomato sauce, and fresh mozzarella, served on a rustic wooden table, realistic food photography.
A gigantic bubble in the immediate foreground with a small town barely visible inside	1. Instruction Understanding: The prompt requires a gigantic bubble positioned in the immediate foreground. A small town must be visible inside the bubble. 2. Reasoning Process: The bubble acts as a transparent lens, potentially distorting the town's appearance due to its curvature. 3. Establish the frame: The shot scale is an extreme close-up, placing the bubble so close it fills the majority of the frame. The focus plane is set on the bubble's surface, rendering the background outside the town soft and out of focus. 4. Set the lighting and color: Natural daylight creates bright specular highlights on the upper surface of the bubble. The color palette is dominated by the cool transparency of the bubble and the muted tones of the town within. 5. Lock the style: The overall image style is photorealistic with a high level of detail on the bubble's texture. The rendering captures the glossy finish and physical properties of the bubble accurately. 6. Explicit Prompt: An extreme close-up of a massive, transparent soap bubble floating in the immediate foreground, with a tiny, detailed miniature town visible inside the curvature, refracted and magnified by the surface tension, soft natural daylight, dreamlike, high detail, photorealistic texture.
A chocolate bar left in direct sunlight, highlighting the state of the chocolate	1. Instruction Understanding: The core subject is a chocolate bar positioned in direct sunlight. The focus is on the state of the chocolate, specifically how the heat affects it. 2. Reasoning Process: Heat causes chocolate to melt, losing its structured shape and becoming a viscous, glossy liquid. 3. Establish the frame: The composition is a close-up shot that fills the frame with the chocolate bar to emphasize detail. The angle is slightly elevated to show the top surface and the pooling liquid clearly. 4. Build the environment: The chocolate bar rests on a generic surface that supports the object without distracting from the main subject. The background is blurred to keep attention on the foreground elements and the chocolate. 5. Set the lighting and color: Direct sunlight creates bright highlights on the melting chocolate, emphasizing its glossy texture. The lighting is warm and intense, casting distinct shadows and illuminating the rich brown colors of the liquid. 6. Explicit Prompt: A close-up of a melting chocolate bar on a surface, with the edges losing their defined shape and pooling into a glossy, viscous puddle under the heat of the sun.
A solution of calcium carbonate reacting with acetic acid	1. Instruction Understanding: The core subject is a solution of calcium carbonate and acetic acid. The prompt specifies the reacting state of the chemical mixture. 2. Reasoning Process: The reaction produces carbon dioxide gas, which would be visible as a steady stream of bubbles rising through the liquid. 3. Establish the frame: The camera frames the solution closely to capture the details of the reaction. The composition centers on the liquid where the gas is visible. 4. Set the lighting and color: The liquid appears clear, allowing the white bubbles to stand out distinctly. The lighting is bright and even to illuminate the stream of gas. 5. Lock the style: The image maintains a realistic photographic style suitable for scientific observation. The focus is sharp on the reacting solution and bubbles. 6. Explicit Prompt: A test tube filled with a clear liquid and a rapid, effervescent stream of carbon dioxide bubbles rising to the surface, laboratory experiment.
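Each reasoning trace in the table above follows a numbered "N. Label: text" pattern and ends with an Explicit Prompt step. A minimal sketch of how such a trace could be split into its steps is shown below, assuming the trace is available as a single string in exactly that pattern. The function names `split_reasoning` and `explicit_prompt` are hypothetical and are not part of the SenseNova-U1 codebase.

```python
import re

# A step header looks like "3. Establish the frame: " at the start of the
# string or after whitespace. The label is assumed to contain only letters
# and spaces, which holds for the traces shown in this document.
STEP_RE = re.compile(r"(?:^|\s)(\d+)\.\s+([A-Z][A-Za-z ]+?):\s+")


def split_reasoning(trace):
    """Return a list of (step_number, label, text) tuples from a trace string."""
    matches = list(STEP_RE.finditer(trace))
    steps = []
    for i, match in enumerate(matches):
        # A step's text runs until the next step header or the end of the trace.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(trace)
        steps.append((int(match.group(1)), match.group(2), trace[match.end():end].strip()))
    return steps


def explicit_prompt(steps):
    """Return the text of the last step whose label starts with 'Explicit', or None."""
    for _, label, text in reversed(steps):
        if label.startswith("Explicit"):
            return text
    return None


if __name__ == "__main__":
    trace = (
        "1. Instruction Understanding: The core subject is a chocolate bar. "
        "2. Reasoning Process: Heat causes chocolate to melt. "
        "3. Explicit Prompt: A close-up of a melting chocolate bar."
    )
    for number, label, text in split_reasoning(trace):
        print(number, label, "->", text)
    print(explicit_prompt(split_reasoning(trace)))
```

Matching on the "Explicit" prefix means the same helper also picks up the "Explicit Edit Prompt" step used by the image-editing traces.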

🖼️ Text-to-Image (Infographics)

Reproducible prompts are in examples/t2i/data/samples_infographic.jsonl.

Image Editing

The side-by-side comparison montages below show input(s) | output, with the edit instruction rendered along the bottom. The same unified model handles single-image edits (attribute, style, relighting) and multi-reference composition (subject + accessory + pose).

✏️ Image Editing (General)

Reproducible prompts are in examples/editing/data/samples.jsonl.


Change the jacket of the person on the left to bright yellow.	Place a flower garland on the puppy's head and turn the image into Ghibli style. (Original prompt: 在小狗头上放一个花环，并且把图片变为吉卜力风格。)
Make the person in the image smile.	Add a bouquet of flowers.
Turn the image into an American comic style.	Replace the man with a woman.
Replace the text "WARFIGHTER" to "BATTLEFIELD" in the bold orange-red font.	Remove the person on the far right wearing a green skirt and a green top.

✏️ Image Editing (Reasoning)

Reproducible prompts are in examples/editing/data/samples_reasoning.jsonl.

Original Text	Original Image	Reasoning Process	Resulting Image
Draw what it will look like one hour later.		1. Source Image Analysis: The source image shows a glass cup of hot tea with steeping tea leaves, and the water appears relatively clear. 2. Instruction Understanding: The instruction asks for the appearance one hour later. 3. Reasoning Process: Over time, tannins and pigments leach out, making the tea noticeably darker and more uniformly colored, and the leaves may look more swollen and darker from soaking. 4. Expected Visual Changes: The expected visible result is a deeper amber-to-brown tea color and more fully saturated liquid. 5. Elements to Preserve: The glass cup, scattered leaves around it, background, and camera angle should remain unchanged. 6. Explicit Edit Prompt: Edit the tea liquid to a much darker, more saturated amber-brown color as if fully steeped, and make the tea leaves look slightly darker and more swollen, while keeping the glass cup, surrounding leaves, background, and framing unchanged.
Draw what it will look like immediately after someone stands up from sitting on it for a long time.		1. Source Image Analysis: The source image shows a fluffy lime-green beanbag chair that looks evenly plump and undisturbed on a white background. 2. Instruction Understanding: The edit instruction asks for its appearance immediately after someone stood up from sitting there for a long time. 3. Reasoning Process: Prolonged weight compresses the fabric and internal fill, leaving a depressed seat area, wrinkles radiating outward, and a slowly recovering shape. 4. Expected Visual Changes: The visible result should be a noticeable dip and creasing where a person was seated. 5. Elements to Preserve: The background, beanbag color, lighting, and camera angle should remain unchanged while only the beanbag’s shape shows the compression. 6. Explicit Edit Prompt: Edit the beanbag chair to show a clear seated depression in the center with surrounding wrinkles and slightly compressed fabric, while keeping the white background, lighting, and camera angle unchanged.
Draw an image showing the side view of the provided traffic cone.		1. Source Image Analysis: The source image shows a 3D perspective view of a traffic cone. 2. Instruction Understanding: The instruction asks for a side view. 3. Reasoning Process: A side view of a standard traffic cone results in a triangular silhouette with a flat rectangular base. 4. Expected Visual Changes: The perspective is flattened into this 2D-like geometric profile. 5. Elements to Preserve: The cone's height and color should remain consistent with the original object. 6. Explicit Edit Prompt: Edit the perspective view into a flat side-profile silhouette of a triangle with a rectangular base, keeping the red color and proportions unchanged.
Change the water to high-concentration saltwater		1. Source Image Analysis: The source image shows an egg resting at the bottom of a glass of water. 2. Instruction Understanding: The instruction asks to change the medium to high-concentration saltwater. 3. Reasoning Process: Saltwater is denser than fresh water, which increases the buoyant force on the egg. 4. Expected Visual Changes: As density increases, the egg will overcome gravity and float higher or suspend in the middle of the liquid. 5. Elements to Preserve: The glass and the egg's appearance should remain consistent, focusing on the shift in the egg's vertical position. 6. Explicit Edit Prompt: Edit the position of the egg so it is floating in the middle of the liquid instead of resting on the bottom, while keeping the glass and the egg's appearance unchanged.
What the fruit looks like when ripe in the picture		1. Source Image Analysis: The source image shows green, unripe bananas. 2. Instruction Understanding: The instruction asks for the appearance of the fruit when ripe. 3. Reasoning Process: Ripening involves a breakdown of chlorophyll and the production of sugars, which turns the skin from green to yellow and often causes small brown sugar spots to appear. 4. Expected Visual Changes: The color and texture of the peel should transition to a ripe state. 5. Elements to Preserve: The shape of the bananas and the white background should remain constant. 6. Explicit Edit Prompt: Edit the green bananas to be bright yellow with small brown spots, while keeping the original shape and white background unchanged.
Correct the unreasonable part in the image.		1. Source Image Analysis: The source image shows a kettle pouring water onto a mug, but the stream is misaligned and missing the cup. 2. Instruction Understanding: The instruction asks to fix the physical inconsistency. 3. Reasoning Process: The water stream must be redirected to connect the spout to the mug, maintaining the trajectory of gravity. 4. Expected Visual Changes: The water stream will be redirected to connect the spout to the mug. 5. Elements to Preserve: The kettle, mug, and background must remain unchanged while the water path is corrected. 6. Explicit Edit Prompt: Draw a continuous water stream connecting the kettle spout to the mug, keeping the kettle, mug, and background unchanged.
Modify the matrix in the image to an upper triangular matrix		1. Source Image Analysis: The source image shows a 2x2 matrix with values 1, 2, 3, and 4. 2. Instruction Understanding: The instruction asks to convert this to an upper triangular matrix. 3. Reasoning Process: By definition, an upper triangular matrix has zeros below the main diagonal, so the entry '3' must be changed to '0' while keeping '1', '2', and '4' as they are, and this modification satisfies the mathematical property requested. 4. Expected Visual Changes: The entry '3' in the lower-left position will be changed to '0'. 5. Elements to Preserve: The grid lines, the matrix structure, and the other entries must remain unchanged. 6. Explicit Edit Prompt: Change the '3' in the lower-left position to '0', while keeping the matrix structure and other entries unchanged.
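The matrix case in the last row relies on the definition of an upper triangular matrix: every entry below the main diagonal is zero. The short sketch below checks the expected result for the 2x2 example from the table. It is an independent illustration in plain Python (the function name `to_upper_triangular` is made up here) and says nothing about how the model itself performs the edit.

```python
def to_upper_triangular(matrix):
    """Return a copy of `matrix` with every entry below the main diagonal set to 0.

    An entry at row i, column j lies below the diagonal when j < i.
    """
    return [
        [value if j >= i else 0 for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


if __name__ == "__main__":
    source = [[1, 2], [3, 4]]           # the matrix described in the table row
    print(to_upper_triangular(source))  # -> [[1, 2], [0, 4]]
```

Only the lower-left entry changes, which matches the expected visual change listed in the reasoning trace.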

Interleaved Generation

Each case below is a single rendered response from model.interleave_gen: the model first runs a <think>...</think> reasoning block that produces intermediate images, then emits the final interleaved text-and-image answer.
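For post-processing, the reasoning block can be separated from the final answer with a small helper. The sketch below assumes the text part of a response is available as one string containing a single `<think>...</think>` block. The actual return type of `model.interleave_gen` is not shown in this document, so the helper `split_think` is hypothetical and the generated images are not modeled.

```python
import re

# DOTALL lets the reasoning block span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(response):
    """Split a response string into (reasoning, answer).

    If no <think>...</think> block is present, reasoning is the empty string
    and the whole response is treated as the answer.
    """
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first think block is treated as the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    demo = "<think>\nPlan the three steps first.\n</think>\nStep 1: gather the ingredients."
    reasoning, answer = split_think(demo)
    print("reasoning:", reasoning)
    print("answer:", answer)
```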

♻️ Interleaved Generation (General)

Reproducible prompts are in examples/interleave/data/samples.jsonl. All examples are generated with think-mode reasoning; the chain-of-thought is omitted in some cases for cleaner visualization.

♻️ Interleaved Generation (Reasoning)

Reproducible prompts are in examples/interleave/data/samples_reasoning.jsonl.

Visual Understanding

General visual understanding across spatial reasoning, multi-image comparison, OCR, geometry, and knowledge-intensive QA:

📝 Visual Understanding (General)

Reproducible prompts are in examples/vqa/data/samples.jsonl.

📝 Visual Understanding (Agentic)