Will prompt adherence include separate character tagging?

#120

by Nognig - opened 10 days ago

I have been testing prompts with multiple characters that have different attributes, but most of the time the model generates the attributes on the wrong character. For instance, if you put "1boy with long pink hair, 1girl with short black hair", the boy will get the short black hair and vice versa. FLUX Klein seems to be able to keep the two apart, but SDXL does not.
Putting quotes around each character's description seems to help, but if you add more than a basic prompt it goes back to stereotypes.

Here is the prompt that works:
masterpiece, best quality, score_7, safe,

"1girl with black short hair, small_breasts, blue_dress,"

"1boy with long pink hair, blue shirt, black pants,"
(https://cdn-uploads.huggingface.co/production/uploads/6331731e5102481d8dcf0451/cFJ-cgsjt47ngW4Vf7Eg9.png)

But when you start to add details, it starts to mess up the characters' attributes.

My hope is that this will be trained further so we can do things like specify characteristics for separate people, position on the canvas (girl on the left), etc.
I think that would be a big improvement over the Illustrious and NoobAI models, which have nothing close to this level of prompt adherence for anime models.

I really hope you think over my suggestion, as I think it would be a huge step forward for anime models. If you need any clarification, feel free to ask; I know I am not the best writer.
If anyone has anything to discuss on this topic that would be great.
Thank you.

Cidokage

9 days ago

Yeah, this is a problem similar to this thread https://huggingface.co/circlestone-labs/Anima/discussions/93

I tried it myself with both the advice recommended there and the official docs, but couldn't make it work no matter what. For example:
"masterpiece, best quality, highres, absurdres, anime screenshot, official art, score_7,
beach, upper body, straight-on,
3girls, the image depicts 3 girls at the beach.
left side of the image is cinderella (nikke), blue shirt, green jeans, white hair, red eyes, hair over one eye, smiling, holding microphone, microphone.
middle side of the image is yixuan (zenless zone zero), school uniform, pink skirt, white shirt, white hair, hair ornament, yellow eyes, angry, holding bottle, water bottle.
right side of the image is eri (blue archive), yellow tank top, red short, yellow eyes, white hair, side braid, confused, holding gun, witch hat."
gives me this kind of result (I tried with danbooru tags, NL, and danbooru + NL tags).

Once I try characters that have around 130 danbooru pics, I can't even make them appear anymore, even with all their tags, when I do 3+ characters.

synta

8 days ago • edited 8 days ago

Some tests have shown that line breaks ruin prompt adherence, with the model even forgetting who Hatsune Miku is. Newlines are fine in a pure natural-language section, but whenever tags are involved they ruin adherence. Good luck, maybe that helps.
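
For anyone who keeps prompts in a multi-line file, here is a minimal sketch of collapsing one into a single line before sending it to the model, following the observation above that line breaks hurt adherence when tags are involved. The name `flatten_prompt` is hypothetical, and the rule of joining with a comma unless the previous piece already ends a sentence is my own assumption, not something from the model's documentation.

```python
import re


def flatten_prompt(prompt: str) -> str:
    """Collapse a multi-line prompt into one line (hypothetical helper)."""
    pieces = []
    for line in prompt.splitlines():
        # Trim whitespace and a trailing comma so the joins stay clean.
        piece = line.strip().rstrip(",").rstrip()
        if piece:  # skip blank lines
            pieces.append(piece)

    out = ""
    for piece in pieces:
        if not out:
            out = piece
        elif out.endswith((".", "!", "?")):
            # Previous piece was a sentence: separate with a space only.
            out += " " + piece
        else:
            # Previous piece was a tag list: separate with a comma.
            out += ", " + piece

    # Collapse any leftover runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", out)
```

Usage would be something like `flatten_prompt(open("prompt.txt").read())`, where `prompt.txt` is just a placeholder file name. Whether this is enough on its own will depend on the prompt.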

Cidokage

8 days ago

Thanks, it did help. It's still bleeding a lot, but that's already somewhat better (though for whatever reason it doesn't want to make Eri appear).

"newest, recent, masterpiece, best quality, highres, absurdres, score_7, very aesthetic, official art, wide shot, dutch angle, beach, dynamic pose, sunset, 3girls, the image depicts 3 girls, cinderella \(nikke\), red eyes, hair over one eye, white hair, twintails, holding bottle, water bottle, blue shirt, green jeans, left side of the image is cinderella, she is wearing a blue shirt and a green jeans, she is holding a water bottle in her right hand, yixuan \(zenless zone zero\), black hair ornament, yellow eyes, white hair, long hair, watermelon, holding watermelon, middle side of the image is yixuan, she is wearing a school uniform with a white shirt and a pink skirt, she is holding a watermelon, eri \(blue archive\), white hair, single braid, yellow eyes, revolver, holding gun, right side of the image is eri, she is wearing a yellow tank top with a red shorts, she is holding a black revolver, "

I did some tests with 4 characters, and only added at the end:
"carlotta (wuthering waves), grey eyes, white hair, long hair, farthest right side of the image is carlotta, she is behind eri, she is wearing a black cheerleader outfit and holding a white microphone, she is dancing," while changing 3girls to 4girls,

And I did another test with a prompt passed through an LLM to fit Qwen 3:

"newest, recent, masterpiece, best quality, highres, absurdres, score_7, very aesthetic, official art, wide shot, dutch angle, beach, dynamic pose, sunset, 4girls, on the far left is cinderella from nikke with red eyes, hair covering one eye, twintails, white hair, wearing a blue shirt and green jeans, holding a water bottle. in the middle-left is yixuan from zenless zone zero, with yellow eyes, a black hair ornament, wearing a white school shirt and pink skirt, holding a watermelon. on the middle-right is eri from blue archive, with white hair in a single braid, yellow eyes, wearing a yellow tank top and red shorts, holding a black revolver. on the far right is carlotta from wuthering waves, with grey eyes, white hair, long hair, wearing a black cheerleader outfit, holding a white microphone, and dancing."

I guess it's better, but still far from usable. I tried with their danbooru tags instead of "from", but the results are pretty much the same mess. At this point I don't know if it's me being really bad at prompting (even with the prompt passed through an LLM), the model's limitations, or both.
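
The single-line layout used in the prompts above (character tag, appearance tags, then a positional sentence for each character) can be assembled programmatically, so the count tag and ordering stay consistent when a character is added. This is only a sketch: the `Character` class, its field names, and `build_prompt` are hypothetical helpers, and nothing here is claimed to fix the bleeding itself.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    tag: str            # full character tag, e.g. r"eri \(blue archive\)"
    short_name: str     # name used in the positional sentence
    position: str       # e.g. "left side", "middle side", "right side"
    description: str    # natural-language part, e.g. "she is holding ..."
    tags: list = field(default_factory=list)  # appearance tags


def build_prompt(quality, scene, characters):
    """Join quality tags, scene tags, and per-character blocks on one line."""
    n = len(characters)
    noun = "girl" if n == 1 else "girls"
    # Count tag (e.g. "3girls") is derived from the list, so it cannot drift.
    parts = [*quality, *scene, f"{n}{noun}", f"the image depicts {n} {noun}"]
    for c in characters:
        parts.append(c.tag)
        parts.extend(c.tags)
        parts.append(f"{c.position} of the image is {c.short_name}")
        parts.append(c.description)
    return ", ".join(parts)
```

Adding a fourth entry to the `characters` list then changes `3girls` to `4girls` automatically instead of relying on a manual edit.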

synta

7 days ago • edited 7 days ago

At the end of the day, this is not Nano Banana. Having 4 characters, each with a defined place, a different outfit, and a held object, while also looking very similar to one another: you are asking too much, in my humble opinion, if you want it not to bleed. That's the kind of stuff you want to do with regional prompting, if at all.
