Formats
The model supports 10 main format variants, which fall into 4 categories:
1. Detailed Structured Descriptions with Multiple Sections
- Long Thoughts
- Long Thoughts v2
- Chroma-style
These produce the most detailed and informative descriptions, covering many criteria. They can later be reworked into captions for training a specific model, or individual sections can be used for labeling.
Long Thoughts and Long Thoughts v2 are intended for use only with named characters.
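To reuse individual sections for labeling, the output first has to be split by its numbered headings. The following is a minimal sketch, assuming each section starts with a short line of the form "N. Title" (optionally preceded by Markdown `#` marks), as in the examples in this document. The function name `split_sections` and the 60-character title limit are illustrative assumptions, not part of the model or its tooling.

```python
import re
from typing import Dict

# Assumption: a heading is a short line "N. Title", optionally prefixed with
# Markdown "#" marks and optionally ending with a colon.
_SECTION_HEADING = re.compile(
    r"^[ \t]*(?:#+[ \t]*)?\d+\.[ \t]+([^\n]{1,60}?):?[ \t]*$", re.MULTILINE
)


def split_sections(output: str) -> Dict[str, str]:
    """Map each section title to the text that follows it."""
    sections = {}
    headings = list(_SECTION_HEADING.finditer(output))
    for i, heading in enumerate(headings):
        # A section body runs until the next heading or the end of the output.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(output)
        sections[heading.group(1).strip()] = output[heading.end():end].strip()
    return sections
```

A short body line that happens to look like "N. Something" would be treated as a heading, so the result should be checked against the expected section titles before use.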
2. Structured Formats for Ready-to-Use Prompts
- Minimalistic Structured Markdown
- Minimalistic Structured Json
- Json
These can be used directly as prompts after minimal processing. Their structure lets you pick individual parts for separate prompting or keep the output as is.
Json output is best parsed into plain text or Markdown before use.
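A minimal sketch of such a conversion, assuming the output is one valid JSON object that maps section names to text, as in the Json examples in this document. The function names, the `##` heading level, and the choice to skip sections whose value is "None" are illustrative assumptions.

```python
import json


def json_caption_to_markdown(raw: str, skip_values=("None", "")) -> str:
    """Render a flat JSON caption as Markdown, one heading per key."""
    data = json.loads(raw)
    parts = []
    for key, value in data.items():
        text = str(value).strip()
        if text in skip_values:
            continue  # assumption: empty or "None" sections carry no information
        title = key.replace("_", " ")
        title = title[:1].upper() + title[1:]  # keeps names like "Ange Katrina" intact
        parts.append(f"## {title}\n{text}")
    return "\n\n".join(parts)


def json_caption_to_text(raw: str, keys=None) -> str:
    """Join selected sections (all of them by default) into one plain-text prompt."""
    data = json.loads(raw)
    selected = keys if keys is not None else list(data)
    return " ".join(str(data[k]).strip() for k in selected if k in data)
```

Passing `keys` selects and orders individual parts, for example only the character sections followed by the background.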
The Minimalistic Structured Markdown variant begins with two additional reasoning sections; strip them with regular expressions before using the output. This variant can only be used with character names.
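One possible way to drop the reasoning sections, as a sketch only: it assumes the usable part starts after a heading line reading "3. Structured description" (as in the example in this document), possibly wrapped in Markdown `#` or `**` markup. The exact heading markup and the function name `strip_reasoning` are assumptions.

```python
import re

# Matches the heading line that opens the usable part of the output.
# Leading "#" marks and surrounding "**" are tolerated because the exact
# heading markup is an assumption.
_STRUCTURED_HEADING = re.compile(
    r"^[ \t]*(?:#+[ \t]*)?(?:\*\*)?3\.[ \t]*Structured description(?:\*\*)?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_reasoning(output: str) -> str:
    """Return only the text after the "3. Structured description" heading."""
    match = _STRUCTURED_HEADING.search(output)
    if match is None:
        # Heading not found: return the output unchanged rather than guess.
        return output.strip()
    return output[match.end():].strip()
```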
3. Comic Descriptions
- Markdown Comic
- Json Comic
Both formats are optimized for comics and manga and describe each frame in detail.
Markdown Comic begins with a reasoning block that improves accuracy.
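If only the per-frame descriptions are needed, they can be pulled out of a Markdown Comic output and the reasoning block ignored. The sketch below assumes frame headings of the form "4.N Frame N (position)" followed by a top-level heading such as "5. Extra comment", as in the example in this document. The heading markup and the function name `extract_frames` are assumptions.

```python
import re
from typing import Dict

# Assumption: each frame starts with a line like "4.1 Frame 1 (top-left)",
# optionally prefixed with Markdown "#" or "*" marks.
_FRAME_HEADING = re.compile(
    r"^[ \t#*]*4\.\d+[ \t]+Frame[ \t]+(\d+)[^\n]*$", re.MULTILINE
)
# Any top-level "N. Title" heading ends the last frame.
_NEXT_SECTION = re.compile(r"^[ \t#*]*\d+\.[ \t]+\S", re.MULTILINE)


def extract_frames(output: str) -> Dict[int, str]:
    """Map frame number to its description text."""
    frames = {}
    headings = list(_FRAME_HEADING.finditer(output))
    for i, heading in enumerate(headings):
        start = heading.end()
        if i + 1 < len(headings):
            end = headings[i + 1].start()
        else:
            following = _NEXT_SECTION.search(output, start)
            end = following.start() if following else len(output)
        frames[int(heading.group(1))] = output[start:end].strip()
    return frames
```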
4. Legacy Formats
- Long
- Short
Long and short plain-text descriptions without formatting or structure. Character names are supported.
Examples
Long Thoughts v2
1. Thoughts about characters
The image features two characters: Shirakami Fubuki and Sukonbu (a personification of Shirakami Fubuki). Shirakami Fubuki is identifiable by her distinct fox ears, white hair with a single braid, fox tail, and ahoge. Sukonbu is a small, chibi-like figure with fox ears, a star accessory on the chest, and a simple, rounded design.
2. Key details
- The main focus is on the shadow cast on the sand, which shows a figure in a blue bikini with fox ears and a tail, making a double fox shadow puppet gesture with both hands.
- The shadow is clear and well-defined, contrasting sharply with the sandy background.
- The actual character (Fubuki) is not directly visible, only her shadows and a small plush-like Sukonbu figure in the lower left corner.
- The shadow hints at a playful and cute moment, referencing the double fox shadow puppet.
- The lighting suggests a sunny day with sharp shadows, possibly at sunset or sunrise due to the warm sand tone.
3. Long description
This high-resolution image depicts a sandy beach scene from a birds-eye view. The sand is smooth and uniformly colored in warm beige tones. Dominating the center is a sharp, dark shadow of a fox-eared girl over a blue bikini with a frilled skirt bottom. The shadow shows the character standing on one leg and arms raised, hands forming the classic fox shadow puppet gesture with two fingers extended on each hand. The shadow includes a clearly visible fox tail, fox ears, and an ahoge (a single hair strand sticking up).
In the bottom left corner, a small, chibi-style plush or figure resembling Sukonbu, the personification of Shirakami Fubuki, is sitting on the sand. Sukonbu has large black eyes, a small open mouth with a cheerful expression, blue markings on the face, and a star-shaped accessory on its chest. The plush's ears and tail are clearly visible, matching the fox theme.
The image cleverly uses the shadow as a focal point, showing the silhouette of Shirakami Fubuki in her blue bikini, while the actual figure is not seen directly, creating a playful and meme-inspired visual. The lighting is bright and casts a crisp shadow on the sand, emphasizing the contours and playful pose of the fox-girl shadow puppet.
4. Detailed description for each character
Shirakami Fubuki
Though not directly visible, Shirakami Fubuki's shadow is the central figure on the sand. The shadow shows her wearing a blue bikini top tied in the center with a bow and a frilled bikini skirt bottom. She has fox ears on top of her head and a prominent fox tail curling behind her. Her arms are raised, and she is making a double fox shadow puppet with her hands. The shadow also shows her signature ahoge on top of her head. Her posture is relaxed, standing with legs apart, and the shadow captures her playful and cute personality.
Sukonbu (Shirakami Fubuki)
Sukonbu appears as a small, chibi plush or figure in the bottom left corner of the image. It has a round, white face with large black oval eyes, a small smiling mouth, and light blue markings on the cheeks and forehead. The plush has large fox ears with black inner parts and a yellow tuft of hair at the back. It wears a blue scarf or ribbon tied around its neck with a star-shaped accessory in the center. The tail is fluffy and visible behind it, matching the fox theme. Sukonbu looks cheerful and adds a cute contrast to the shadow figure.
Long Thoughts
1. Thoughts about characters
The main identifiable character here is Wakamo from Blue Archive, recognized by her distinctive fox ears, large fluffy tail, and long black hair with yellow eyes. The other figure is a roughly sketched man in a suit, likely representing Sensei (Blue Archive), though he is not fully detailed or named.
2. General description
The image is a humorous, stylized cartoon drawing with a simple white background. It features a chibi-styled Wakamo sitting on a small wooden stool, holding a bouquet of yellow flowers. She has large fox ears, a fluffy black and red tail wagging excitedly, and wears a black dress with gold patterns and a red cape. Next to her stands a crudely drawn man in a beige suit, tightly bound in red string, holding a "Marriage Certificate" paper with a red stamp and a checkmark. The man's face is obscured by a scribble, and he smiles awkwardly. In the upper right corner, a small inset photo shows a real-life wedding scene with the same couple tied up in a similar red string bondage, smiling and posing happily.
3. Detailed description for each character
Wakamo (Blue Archive)
Wakamo is portrayed in a chibi style with exaggerated large yellow eyes and a small open mouth showing a cheerful expression. Her long black hair flows down her back, adorned with a small yellow flower hair ornament on one side. Her fox ears are black with red tips and white inner fur, positioned upright. She wears a black dress with gold cloud-like patterns and a red cape draped over her shoulders. Her large, fluffy fox tail is black with a red tip and is animatedly wagging, indicated by motion lines. She holds a small bouquet of yellow flowers in her gloved hands and sits on a simple wooden stool.
Sensei (Blue Archive) (sketch)
The man next to Wakamo is roughly sketched with minimal detail. He wears a beige suit with a white shirt and a tie, and a small yellow flower is tucked into his suit pocket. His face is obscured by a scribble, but he has a smiling mouth visible. He is tightly bound in red string crossing his torso and arms, holding a paper labeled "Marriage Certificate" with a red stamp and a checkmark. His posture is slightly bent forward, and the strings appear taut, emphasizing the bondage.
4. Individual Parts
- Wakamo's large fox ears: black with red tips and white inner fur, upright on her head.
- Wakamo's long black hair: flowing behind her with a yellow flower ornament on the left side.
- Wakamo's yellow eyes: large and round with a happy expression.
- Wakamo's black dress: decorated with gold cloud patterns, short and fitted.
- Wakamo's red cape: draped over her shoulders, flowing behind her.
- Wakamo's fluffy fox tail: black with a red tip, wagging energetically with motion lines.
- Wakamo's small bouquet: yellow flowers held in both hands.
- Wakamo's small wooden stool: simple design, supporting her seated position.
- The man's beige suit: classic cut with a white shirt and tie underneath.
- Yellow flower in the man's suit pocket: matching the bouquet Wakamo holds.
- Man's face: scribbled out, obscuring identity, but smiling mouth is visible.
- Red string bondage: tightly wrapped around the man's torso, arms, and hands.
- Marriage Certificate paper: held by the man, with red stamp and a checkmark.
- Motion lines around Wakamo's tail: indicating wagging movement.
- White background: clean and simple, focusing attention on characters.
- Small inset photo in upper right corner: real-life photo of the same couple tied up in red string, smiling, with wedding attire and bouquet.
5. Texts on image
- On the paper held by the man, the text reads: "Marriage Certificate" in handwritten style.
- The paper also features a red stamp and a checkmark symbol.
6. Background and effects
The background is plain white with no additional elements, emphasizing the characters. The main drawing style is cartoonish with simple lines and flat colors. The inset photo in the upper right corner is a real-life photograph showing the same couple tied up in red string bondage, smiling and posing happily in wedding attire with a bouquet. The contrast between the stylized cartoon and the real photo adds a humorous meta-commentary. The image is clear and focused, with no blur or depth of field effects. The tail's wagging motion is conveyed through curved motion lines.
Chroma-style
1. Regular Summary:
This piece is a stunning digital painting, likely done by a professional artist judging by the detail and dynamic composition. The perspective is a dramatic dutch angle, looking up at Rosmontis from Arknights, a catgirl with long grey hair and green eyes, who's wielding a massive, futuristic-looking weapon that looks like a jetpack or some kind of energy cannon. She's dressed in a short, sleeveless white dress with a black jacket draped over her shoulders, black gloves, and black boots. Her cat ears and tail are clearly visible. The background is a chaotic scene of crumbling buildings and debris, suggesting a post-apocalyptic or battle-ravaged city. The color palette is mostly dark greys and blacks, contrasted by the bright blues and reds of the energy effects and weapon. There's a sense of intense action and power; she looks ready to wreck some shit. The overall aesthetic is high fantasy meets sci-fi, with a lot of detail in the character design and the weapon's design. The artist clearly has skill and the work is very high quality.
2. Individual Parts:
- Rosmontis (Arknights): The central figure, a catgirl wielding a large weapon. She's positioned slightly off-center, facing towards the viewer.
- Large Weapon: A futuristic-looking energy cannon or jetpack-like device, held by Rosmontis. It's the most prominent object in the image, with glowing blue and red energy effects.
- White Dress: A short, sleeveless white dress worn by Rosmontis.
- Black Jacket: A black jacket draped over Rosmontis's shoulders.
- Black Gloves: Gloves worn by Rosmontis.
- Black Boots: Boots worn by Rosmontis.
- Cat Ears: Cat ears atop Rosmontis's head.
- Cat Tail: A cat tail extending from Rosmontis's back.
- Crumbling Buildings: The background consists of partially destroyed buildings, suggesting a post-apocalyptic setting.
- Debris: Scattered debris and rubble throughout the background.
- Energy Effects: Bright blue and red energy effects surrounding the weapon and Rosmontis, adding to the dynamic feel.
- Ground: The ground is cracked and broken, consistent with the background's overall state of destruction.
- Lighting: The lighting is dramatic, highlighting Rosmontis and the weapon against the darker background.
- Perspective: The image is shot from a low angle, looking up at Rosmontis, creating a sense of power and scale.
- Color Palette: The color palette is dark, with contrasting bright blues and reds.
3. Midjourney-Style Summary:
Rosmontis, Arknights character, catgirl, wielding massive energy weapon, futuristic design, glowing blue and red energy, short white dress, black jacket, black gloves and boots, cat ears and tail visible, dramatic dutch angle, crumbling city background, post-apocalyptic setting, intense action, high fantasy/sci-fi aesthetic, high-quality artwork, dynamic composition, dark color palette, bright color accents, sense of power and scale, detailed character design, detailed weapon design, professional-level art.
4. DeviantArt Commission Request
A dynamic action piece featuring Rosmontis from Arknights. She's a catgirl with long grey hair and green eyes, wearing a short, sleeveless white dress and a black jacket. She's wielding a huge, futuristic energy weapon that resembles a jetpack or energy cannon, with glowing blue and red energy effects. The background should be a chaotic scene of a ruined city, with crumbling buildings and debris. The overall style should be high fantasy meets sci-fi, with a focus on dynamic action and detail. The perspective should be a dramatic dutch angle, looking up at Rosmontis from a low angle. The lighting should be dramatic, highlighting Rosmontis and the weapon. The color palette should be mostly dark greys and blacks, with bright blue and red accents. High resolution, please.
Minimalistic Structured Markdown
1. Thoughts about characters
The girl with long white hair tied in a ponytail, pointy elf ears, and wearing a white and gold-trimmed outfit is clearly Frieren. The boy with short blue hair, blue eyes, and a blue cloak is Himmel from Sousou no Frieren.
2. Key details
The two characters are sharing a gentle kiss, eyes closed, standing close to each other in front of a rustic stone and timber house with a red roof. The outdoor setting is bright with a blue sky, scattered clouds, and mountains in the background. The mood is romantic and serene. Petals float softly around them, adding a delicate, dreamy atmosphere.
3. Structured description
General
The image depicts a romantic moment outdoors, with two characters kissing in front of a traditional stone and wood cottage with a red-tiled roof. The scene is bathed in warm natural light, with a mountain range and scattered clouds visible in the background. The grassy field and stone wall in the foreground add depth and frame the couple.
Frieren
Frieren stands on the left, eyes closed, leaning in for the kiss. She has long white hair pulled into a ponytail, pointed elf ears, and wears a white dress with gold trim and a matching capelet. Her expression is peaceful and tender.
Himmel (Sousou no Frieren)
Himmel stands on the right, also with eyes closed, reciprocating the kiss. He has short blue hair, a blue cloak with a high collar, and a black undershirt visible beneath. His posture is gentle and affectionate.
Image effects
The image has a soft, slightly grainy texture reminiscent of traditional animation or painted art. The lighting is warm and natural, casting gentle shadows and highlights that emphasize the characters and the background scenery. Petals floating in the air add a subtle dynamic element.
Minimalistic Structured Json
{
"General": "Three girls from Nijisanji pose closely together against a plain white background, each making a distinct hand sign. The image is brightly lit and sharply detailed, focusing on their upper bodies and expressive faces.",
"Ange Katrina": "On the left, Ange has short, layered red hair with a small gold triangular hairclip. Her blue eyes and slightly parted lips show a gentle, curious expression with a faint blush. She wears a red jacket with wide white cuffs over a high-collared white shirt adorned with a large blue bow and gold triangular decorations. Her right hand forms a peace sign near her face.",
"Lize Helesta": "Centered, Lize has long white hair with blue streaks and blunt bangs, accented by a blue feather hair ornament on the left side. Her purple eyes and confident, slightly smug smile with a blush accompany her raised hands making a double 'rock on' gesture. She wears a white blouse with a high collar, a large blue bow, and frilled blue cuffs under a sleeveless dark vest.",
"Inui Toko": "On the right, Inui sports long dark brown hair styled in low twintails with black dog ears topped by a cream maid headdress. Her heterochromia features one yellow and one red eye, and she has a small red flower hairpin. She wears a black kimono with wide sleeves, a white frilled collar, and a blue obi tied at the back. A small black-and-white dog plush or accessory peeks behind her shoulder. She makes a peace sign with her left hand, her mouth open slightly revealing a small fang, and her cheeks flushed."
}
Json
{
"character_1": "Hoshimi Miyabi is depicted sleeping upright on a couch, her head resting on the shoulder of another girl. She has long black hair with fox ears, and her attire consists of a white shirt, a black necktie, and a dark grey pleated skirt. Her expression is peaceful, her eyes closed. She's wearing black thigh-high stockings with a lace trim at the top.",
"character_2": "Belle is sitting on the couch, holding a smartphone in her hands. She has short, blue hair and green eyes. She's wearing a dark grey long-sleeved shirt with Japanese text on it, and a black skirt. Her expression is calm and she's looking at the phone. She's wearing black thigh-high stockings.",
"background": "The setting appears to be an indoor space, possibly a living room or common area. The background includes a brick wall, a chalkboard, a small potted cactus on a table, and various other indistinct objects. The couch is orange and has cushions. The overall lighting suggests an evening or nighttime setting.",
"image_effects": "The image has a soft, slightly muted color palette, giving it a calm and relaxed atmosphere. The style is consistent with modern anime artwork.",
"texts": "None",
"atmosphere": "The overall atmosphere is intimate and peaceful. The scene depicts a quiet moment of companionship between the two girls, with a sense of calm and relaxation. The soft lighting and muted colors contribute to this feeling."
}
Markdown Comic
1. Thoughts about characters
The three girls depicted are clearly identifiable based on the given tags and their visual traits. The girl with brown hair, blue eyes, long hair, rabbit ears, and wearing a black jacket with a blue scarf is Amiya (Arknights). The girl with short pinkish hair, blue eyes, fox ears, and a white jacket with a red cross is Sussurro (Arknights). The blonde-haired girl with yellow eyes, fox ears, a blue hairband, and a white and black outfit is Suzuran (Arknights).
2. Key details
- The comic humorously contrasts the characters' appearances at two different ages: "In 1097 years" and "In 1102 years."
- The first two frames show the characters standing side by side with neutral expressions.
- The third frame zooms in on Sussurro’s face with a serious, slightly annoyed expression against a black background, emphasizing her mood.
- The final frame shows Sussurro drinking milk directly from a carton, with a speech bubble saying "too late."
- A mysterious hooded figure with a milk carton stands in the background of the last frame, adding to the scene’s humor.
3. Comic format
The comic is a 4koma (four-panel comic) arranged in a 2x2 grid. The characters Amiya, Sussurro, and Suzuran (all Arknights operators) appear in all frames except the third, which is a close-up of Sussurro alone.
4. Details for each frame
4.1 Frame 1 (top-left)
Three girls stand side by side against a white background with the text "In 1097 years" above them. From left to right: Amiya, Sussurro, and Suzuran. Amiya has long brown hair, rabbit ears, a black jacket with blue highlights, and a blue scarf. Sussurro has short pinkish hair, fox ears, and wears a white jacket with a red cross on the sleeve and a black choker. Suzuran has long blonde hair, fox ears, a blue hairband, and a white and black dress with a skirt. All three have neutral, slightly serious expressions.
4.2 Frame 2 (top-right)
The same three characters appear again, but the text above reads "In 1102 years." They look slightly older and more mature. Amiya’s hair is longer, and she wears a sleeveless white top with a blue skirt and her jacket hanging off her shoulders. Sussurro’s expression is unchanged, still serious. Suzuran looks more mature with a frilled collar and a more elaborate outfit, standing with her hands on her hips. The background remains white.
4.3 Frame 3 (bottom-left)
A close-up of Sussurro’s face fills the frame against a black background. She has a slightly annoyed, sweat-dropping expression with narrowed blue eyes. Her fox ears are prominent, and her pinkish hair with an ahoge (hair antenna) curls upward. She wears her white jacket with the red cross visible on the sleeve.
4.4 Frame 4 (bottom-right)
Sussurro stands drinking milk directly from a carton, tilting her head back with closed eyes and a satisfied expression. She wears a sleeveless blue dress with a black collar and a black choker. Her fox ears and tail are visible. Next to her is a hooded figure (possibly a doctor or nurse) standing silently with a milk carton in hand. On the table in front of them is a bowl of cereal or some dry food and a small white container. Sussurro’s speech bubble says "too late."
5. Extra comment
The comic uses a simple and clean art style with clear linework and soft colors. The humor revolves around the passage of time and Sussurro’s stubborn or impatient attitude, culminating in her drinking milk in a somewhat casual, relaxed manner despite the serious tone of the previous frames. The mysterious hooded figure adds an extra layer of comedic mystery to the final panel.
Json Comic
{
"comic_format": "Comic of 2 frames",
"1st_frame": "A man with brown hair and stubble is shown in a state of shock. He is wearing a jumpsuit with a logo on the chest and holding a blue stuffed toy. The door he is standing next to is broken, and there is a pregnancy test visible in the foreground. There is a poster on the wall with a character and the text 'Ha Ha Yee'.",
"2nd_frame": "A girl with black hair and black eyes is peeking through the broken door. She is smiling and looking at the man with a playful expression. She has a striped shirt visible under her jacket. The text 'Here's Mommy!' is displayed below her.",
"character_1": "The man appears surprised and anxious, with facial hair and wearing a jumpsuit. He is holding a stuffed toy.",
"character_2": "The girl has a mischievous smile, black hair, and is wearing a striped shirt under a jacket. She seems to be the source of the man's surprise.",
"texts": "The text 'Here's Mommy!' is present in the second frame.",
"meaning": "The comic parodies a scene from 'The Shining,' with a humorous twist involving a pregnancy test and the unexpected appearance of the girl."
}
Long
The image features two characters from "Boku no Hero Academia," Todoroki Shouto and Bakugou Katsuki, standing back-to-back against a vibrant green background.
Todoroki Shouto is on the left. He has striking multicolored hair, split between white and red, and heterochromia with one blue eye and one grey. His expression is calm, with a closed mouth and a slight smile. He wears a stylish letterman jacket with a floral print, showcasing intricate red and orange flowers. The jacket's sleeves are long, and he has a relaxed posture, with his head slightly tilted.
On the right is Bakugou Katsuki, identifiable by his spiky blonde hair and intense red eyes. His expression is more animated, with an open mouth revealing sharp teeth, conveying a sense of determination or frustration. He is also wearing a letterman jacket, but with a different floral design featuring subtle white and grey flowers. His body language is assertive, leaning slightly forward.
The background is a simple, solid green that makes the characters stand out prominently. The text "Todoroki & Bakugou" is boldly displayed in white, adding a dynamic element to the composition. The overall atmosphere is energetic and vibrant, capturing the contrasting personalities of the two characters.
Short
The image features a chibi-style girl, Artoria Pendragon (Fate), with short, pale blonde hair in a bob, small braids, and striking yellow eyes. Her expression is serious, with a slight blush on her cheeks. She wears a dark purplish-grey dress with a low-cut neckline, revealing cleavage, and long sleeves with white detailing. A dark teal pleated skirt is visible underneath, along with black pantyhose and shoes. She holds a black sword with red markings in her right hand, ready for action. The background is a dark bluish-grey with dynamic red streaks, suggesting motion. The overall atmosphere is intense and dramatic, enhanced by the chibi art style. The artist's signature 'Yui2' is in the upper right corner.
Images belong to their authors and are used exclusively as examples.









