Self-Attention
Self-attention is a form of the attention mechanism in which the query, key, and value vectors are all derived from the same input sequence, so every position in the sequence can attend to every other position. It is the core operation of the transformer architecture and is what allows large language models to capture long-range dependencies without processing the sequence step by step.
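A minimal sketch of this idea, assuming the standard scaled dot-product formulation: the same input X is projected into queries, keys, and values, each query is scored against every key, and the resulting weights mix the values. The projection matrices W_q, W_k, W_v and the toy dimensions below are illustrative assumptions, not details from the text above.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # X: (seq_len, d_model). Queries, keys, and values all come
    # from the SAME input sequence; that is what makes it *self*-attention.
    Q = X @ W_q                          # (seq_len, d_k)
    K = X @ W_k                          # (seq_len, d_k)
    V = X @ W_v                          # (seq_len, d_v)
    d_k = Q.shape[-1]
    # Score every position's query against every position's key,
    # so each output row is a weighted mix over the whole sequence.
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    return weights @ V                   # (seq_len, d_v)

# Toy usage with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)

Note that the (seq_len, seq_len) score matrix makes the all-pairs interaction explicit: every row of the output depends on every position of the input, with no recurrence over time steps.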