Recent advancements in artificial intelligence have propelled the field of text-to-image generation to unprecedented heights. Deep generative models, particularly those employing binary representations, have emerged as a promising approach for synthesizing visually realistic images from textual prompts. These models leverage sophisticated architect