As a Text-Encoder, Does Z-Engineer-V4 Provide Better Quality Than the Original Qwen3-4B?

#12
by willswordpath - opened

Z-Engineer-V4 can greatly enhance a user prompt. But as a text-encoder, does it encode better text-embeds, and so generate better images, than the original Qwen3-4B? Were its text-encoding parameters changed/modified/fine-tuned relative to the original Qwen3-4B?
If the text-encoding parameters did change during your heretic/fine-tune process, then the encoded text-embeds would differ from those encoded by the original Qwen3-4B. That might suggest using the original Qwen3-4B as the text-encoder: Z-Image/Z-Image-Turbo is trained against the text-embeds generated by the original Qwen3-4B, so using the original would yield the original quality (if Z-Engineer-V4 isn't specifically fine-tuned against Z-Image/Z-Image-Turbo as their text-encoder, then using it as text-encoder would degrade generation quality).
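
Whether the text-embeds actually differ can be measured rather than guessed. Below is a minimal sketch, assuming you have already extracted per-token embeddings for the same prompt from both models (how you extract them is not shown here), and assuming both models share a tokenizer so the token counts match. The function names and the toy vectors are hypothetical, for illustration only. A score near 1.0 means the two encoders produce nearly the same directions per token; the score by itself says nothing about image quality.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def mean_token_cosine(embeds_a, embeds_b):
    """Average per-token cosine similarity between two embedding sequences.

    Each argument is a list of token vectors (one list of floats per token).
    """
    if not embeds_a or len(embeds_a) != len(embeds_b):
        raise ValueError("need two non-empty sequences with the same token count")
    total = sum(cosine(u, v) for u, v in zip(embeds_a, embeds_b))
    return total / len(embeds_a)


# Toy stand-ins for real embeddings (hypothetical values, 2 tokens x 2 dims).
original = [[1.0, 0.0], [0.0, 1.0]]
finetuned = [[1.0, 0.0], [1.0, 1.0]]

score = mean_token_cosine(original, finetuned)
print(f"mean per-token cosine similarity: {score:.4f}")
```

Running this over a handful of real prompts with both encoders would show how far the fine-tune moved the embeddings, which is the factual part of the question above.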

if Z-Engineer-V4 isn't specifically fine-tuned against Z-Image/Z-Image-Turbo as their text-encoder, then using it as text-encoder would degrade generation quality.

This is incorrect. Sadly, it's a widely repeated misunderstanding. I need to document exactly why it's wrong, but it is wrong. I'll reply with wherever I end up putting my argument/logic/explanation, since I need to write it once and stop rewriting it each time I have to correct this.
