Also known as Autoregressive models.
Generative AI models such as GPT-3 and GPT-4 are often referred to as “decoder-only” models because of their architecture and operational principles. Here’s a breakdown of why:
Transformer Architecture: These models are based on the transformer architecture, which fundamentally includes encoder and decoder components. The encoder processes the input data, while the decoder generates the output.
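To make that split concrete, here is a minimal sketch using PyTorch’s nn.Transformer, which exposes the encoder and decoder as separate submodules. The layer sizes are arbitrary and attention masks are omitted for brevity; this is an illustration, not a production setup.

```python
import torch
import torch.nn as nn

# The full transformer pairs an encoder (which reads the input) with a
# decoder (which produces the output while attending to the encoder's
# representation). Layer sizes are arbitrary; masks are omitted for brevity.
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

src = torch.randn(1, 10, 64)      # the input sequence, seen by the encoder
tgt = torch.randn(1, 7, 64)       # the partially generated output sequence

memory = model.encoder(src)       # encoder processes the input
out = model.decoder(tgt, memory)  # decoder generates, attending to memory
print(out.shape)                  # torch.Size([1, 7, 64])
```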
Focus on the Decoder: In a decoder-only model, the focus is solely on the decoder part of the transformer architecture. The encoder component is not utilized in these models. This is in contrast to models like BERT (Bidirectional Encoder Representations from Transformers), which use only the encoder part, or models like T5 (Text-To-Text Transfer Transformer), which use both encoder and decoder components.
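The distinction shows up directly in how the three families are loaded in practice. A small sketch, assuming the Hugging Face transformers library; the checkpoints bert-base-uncased, gpt2, and t5-small are illustrative choices only.

```python
from transformers import (AutoModelForMaskedLM, AutoModelForCausalLM,
                          AutoModelForSeq2SeqLM)

# Encoder-only: BERT-style, loaded for masked-token prediction.
bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Decoder-only: GPT-style, loaded for causal (left-to-right) language modeling.
gpt = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder: T5-style, loaded for sequence-to-sequence generation.
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

print(type(bert).__name__, type(gpt).__name__, type(t5).__name__)
# BertForMaskedLM GPT2LMHeadModel T5ForConditionalGeneration
```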
Generative Task Orientation: The decoder is primarily responsible for generating data. In the context of language models like GPT-3 or GPT-4, the decoder generates textual output. It takes an input (i.e., a prompt) and then generates a continuation of that input. This is a generative task, as opposed to discriminative tasks, where the goal is to categorize or differentiate between different types of input.
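For example, prompting a small decoder-only model yields a continuation of the prompt. A quick sketch, again assuming the Hugging Face transformers library and using gpt2 purely as an illustrative checkpoint:

```python
from transformers import pipeline

# The model extends the prompt with a generated continuation.
generator = pipeline("text-generation", model="gpt2")
result = generator("Decoder-only models are", max_new_tokens=20)
print(result[0]["generated_text"])
```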
Autoregressive Nature: Decoder-only models are autoregressive. They generate a sequence one piece at a time, using everything generated so far as context for the next piece. When generating text, the model produces one token (roughly a word or word fragment) at a time, each time conditioning on the prompt and all of the tokens it has generated so far to predict the next one.
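The loop below makes this explicit: it repeatedly runs the model on everything produced so far and greedily appends the most likely next token. A minimal sketch, assuming PyTorch and the Hugging Face transformers library, with gpt2 as an illustrative checkpoint.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Start from the prompt; the sequence grows by one token per iteration.
input_ids = tokenizer("The transformer architecture", return_tensors="pt").input_ids

for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits                        # scores for every position
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick of the next token
    input_ids = torch.cat([input_ids, next_token], dim=-1)      # feed it back in as context

print(tokenizer.decode(input_ids[0]))
```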
Absence of Explicit Input Encoding: Since these models focus on generation, they don’t need a separate mechanism to encode the input. The input prompt is fed directly into the decoder stack, which processes it with causal self-attention and then continues generating output from it. This contrasts with encoder-decoder models, where the input first goes through a separate encoding step before being used to generate the output.
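The difference is visible in the generated output itself: a decoder-only model’s output begins with the prompt, because the prompt is simply the prefix of the sequence being continued, whereas an encoder-decoder model emits a new sequence that does not contain its input. A sketch under the same assumptions as above (Hugging Face transformers, with gpt2 and t5-small as illustrative checkpoints):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Decoder-only (GPT-2): the prompt is the prefix of the output sequence.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
ids = gpt_tok("The sky is", return_tensors="pt").input_ids
print(gpt_tok.decode(gpt.generate(ids, max_new_tokens=5)[0]))
# -> "The sky is ..." (the output starts with the prompt)

# Encoder-decoder (T5): the input is encoded separately; the decoder emits
# a brand-new sequence that does not contain the input tokens.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
ids = t5_tok("translate English to German: The sky is blue.", return_tensors="pt").input_ids
print(t5_tok.decode(t5.generate(ids, max_new_tokens=10)[0], skip_special_tokens=True))
# -> German text only, with no copy of the English input
```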
See Also: Encoder-only models, Encoder-Decoder models