← Back to modules
πŸ”¦

Module 3

Attention

When you read a sentence, your brain automatically focuses on the most relevant words to understand its meaning. AI language models do something similar, using a mechanism called attention to decide which words matter most in context.


What is the attention mechanism?

Consider the sentence: “I sat by the river bank.” Does “bank” mean a financial institution or the side of a river? To know, the AI must look at the other words, especially “river”, and decide which ones matter most.

This is exactly what attention does. It lets each word in a sentence look at all the other words and give each one a weight: a score representing how relevant that word is to understanding the first word's meaning in this particular context.
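To make the idea of weights concrete, here is a minimal Python sketch. The relevance scores are made-up numbers chosen for illustration, not output from a real model. The `softmax` helper turns them into positive weights that add up to 1, which is the usual way attention scores are converted into weights.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    biggest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

words = ["I", "sat", "by", "the", "river", "bank"]

# Hypothetical relevance of each word to "bank" (invented for this example).
raw_scores = [0.1, 0.5, 0.2, 0.1, 3.0, 1.0]

weights = softmax(raw_scores)

for word, weight in zip(words, weights):
    print(f"{word:>6}: {weight:.2f}")

most_relevant = words[weights.index(max(weights))]
print("Most relevant word for 'bank':", most_relevant)
```

Because “river” was given the highest raw score, it also receives the largest weight, so it has the most say in how “bank” is interpreted.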

Queries, Keys, and Values

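Under the hood, each word is transformed into three vectors: a Query (what the word is looking for), a Key (what the word offers for others to match against), and a Value (the information the word passes along).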

The attention score between two words is calculated by comparing the Query of one word against the Key of each word in the sentence. A higher score means that word has more influence on the result.
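The comparison described above can be sketched in a few lines of Python. This is a simplified illustration, not a real model: the two-number vectors are invented, and the comparison used is the scaled dot product that Transformer models commonly use. The function name `attend` is a hypothetical helper. For simplicity, the Query is compared against every word's Key, including its own.

```python
import math

def dot(a, b):
    """Dot product: large when two vectors point the same way."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Compare one Query with every Key, then blend the Values."""
    scale = math.sqrt(len(query))  # scaling keeps scores from growing too large
    scores = [dot(query, key) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is a weighted average of the Value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy vectors, invented for illustration only.
words = ["river", "bank", "money"]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
values = [[10.0, 0.0], [5.0, 5.0], [0.0, 10.0]]

bank_query = [2.0, 0.0]  # hypothetical Query for "bank" that lines up with "river"

weights, output = attend(bank_query, keys, values)

for word, weight in zip(words, weights):
    print(f"{word:>6}: {weight:.2f}")
print("Blended output for 'bank':", [round(x, 2) for x in output])
```

The Query for “bank” lines up best with the Key for “river”, so “river” gets the largest weight and its Value contributes the most to the output.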

Explore attention patterns

[Interactive element: Attention Visualiser (Yr 7–8). Click any word in the example sentence to see what it pays attention to. Word types in the sentence are labelled DET, ADJ, NOUN, VERB, and PREP.]

🔦 Did you know? The largest GPT-3 model uses 96 attention heads in each of its 96 transformer layers. Each head can learn to notice different linguistic patterns: some track pronouns, some track verbs and their objects, and some pick up on sentiment. Together they build a rich, multi-dimensional understanding of context.
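The idea of several heads looking for different things can be sketched as follows. This is a toy under stated assumptions: real models learn a separate set of weights for each head, whereas here each head simply reads a different half of a made-up four-number vector. The sentence, the vectors, and the helper names `split_heads` and `head_attention` are all invented for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_attention(query, keys):
    """Attention weights for one head, using scaled dot products."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

def split_heads(vector, num_heads):
    """Cut a vector into equal slices, one slice per head."""
    size = len(vector) // num_heads
    return [vector[i * size:(i + 1) * size] for i in range(num_heads)]

words = ["Maria", "kicked", "the", "ball"]

# Hypothetical Keys: the first two numbers hint at "who", the last two at "what".
keys = [
    [2.0, 0.0, 0.0, 0.0],  # Maria
    [0.0, 1.0, 1.0, 0.0],  # kicked
    [0.0, 0.0, 0.0, 0.0],  # the
    [0.0, 0.0, 0.0, 2.0],  # ball
]
query = [2.0, 0.0, 0.0, 2.0]  # hypothetical Query for "kicked"

num_heads = 2
query_parts = split_heads(query, num_heads)
key_parts = [split_heads(key, num_heads) for key in keys]

top_words = []
for h in range(num_heads):
    head_keys = [parts[h] for parts in key_parts]
    weights = head_attention(query_parts[h], head_keys)
    top = words[weights.index(max(weights))]
    top_words.append(top)
    print(f"Head {h} attends most to: {top}")
```

With these invented numbers, one head focuses on who did the kicking and the other on what was kicked, even though both start from the same word.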

What you've learned