A generative image model takes an image as input and can output text, another image, or video. For example, when the output is text, the task is visual question answering; when the output is an image, it is image completion; and when the output is video, it is animation. A generative language model takes text as input and can output more text, an image, audio, or decisions. For example, when the output is text, the task is question answering; when the output is an image, an image is generated from the text prompt.
(Source: https://www.cloudskillsboost.google/)
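
The input/output pairings described above can be written down as a small lookup table. This is only an illustrative sketch: the names `TASKS` and `example_task` are hypothetical and do not come from any library, and the language-to-image entry is an assumption (text-to-image generation). Audio and decision outputs are left out because the text gives no example task for them.

```python
# Illustrative lookup: (model family, output modality) -> example task.
# TASKS and example_task are hypothetical names chosen for this sketch.
TASKS = {
    ("image", "text"): "visual question answering",
    ("image", "image"): "image completion",
    ("image", "video"): "animation",
    ("language", "text"): "question answering",
    # Assumption: a language model producing an image is text-to-image generation.
    ("language", "image"): "image generation",
}


def example_task(model_family: str, output_modality: str) -> str:
    """Return the example task for a model family and output modality."""
    return TASKS.get((model_family, output_modality), "no example given")


if __name__ == "__main__":
    for (family, modality), task in TASKS.items():
        print(f"{family} model, {modality} output: {task}")
```

Calling `example_task("language", "audio")` falls through to the default string, which reflects the fact that the text names the output type without naming a task for it.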