The GPT writes a plain-English visual description of the banner: what’s happening in the scene, where the product is placed, what the model is doing, and what the viewer notices first, second, and third. It avoids camera jargon and design terminology, so the description reads as if written for a non-designer. This description is the bridge to image generation, so iterate on it until it matches what you’re envisioning.