Your visuals have two parts: the spoken frames (talkies) and the action shots (B-roll). You've already generated the B-roll. For the talkies, use the 4 prompts to generate 4 different types of talkie frame, so the spoken lines in your video look fresh and varied from one to the next.