{"id":196717,"date":"2025-08-07T06:18:00","date_gmt":"2025-08-07T05:18:00","guid":{"rendered":"https:\/\/liora.io\/en\/?p=196717"},"modified":"2026-02-06T07:43:00","modified_gmt":"2026-02-06T06:43:00","slug":"all-about-how-the-gpt-model-works","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/all-about-how-the-gpt-model-works","title":{"rendered":"How does the GPT model work?"},"content":{"rendered":"<p><b>Here we are not talking about <a href=\"https:\/\/liora.io\/en\/chatgpt-how-does-this-nlp-algorithm-work\">ChatGPT<\/a>, but that&#8217;s indeed where it got its name.<\/b><\/p>\n<p><b>If you\u2019ve never heard of GPT, you have at least used text-generating AIs, or &#8216;<a href=\"https:\/\/liora.io\/en\/large-language-models-llm-everything-you-need-to-know\">LLM<\/a>&#8216; (Large Language Model). GPT is a type of LLM, on which most of the generative AIs that everyone is talking about right now are based!<\/b><\/p>\n<p><i>Chat<\/i><b><i>GPT<\/i><\/b><i>, now you see why?<\/i><\/p>\n<p>GPT stands for \u2018<a href=\"https:\/\/liora.io\/en\/all-about-generated-pre-trained-transformers\">Generative Pre-trained Transformer<\/a>\u2019 (for the French variant), and it\u2019s a very clear name that perfectly summarizes its operation!<\/p>\n<p><i>How come you&#8217;re not convinced?<\/i><\/p>\n<p>Imagine GPT as a super-expert in word prediction, its main talent is guessing which word would most likely complete the beginning of a sentence. By repeating this prediction over and over, word by word, it builds entire sentences, paragraphs, and even complete articles!<\/p>\n<p><i>It helped a bit in writing this one, but there&#8217;s still a human behind it\u2026<\/i><\/p>\n<p>Let\u2019s detail together the different steps of operation to see how GPT learns to understand how human language works.<\/p>\n<p><a href=\"\/en\/courses\/data-ai\/\"><br \/>\nGoing deeper into GPT<br \/>\n<\/a><\/p>\n<style>\n.elementor-heading-title{padding:0;margin:0;line-height:1}.elementor-widget-heading .elementor-heading-title[class*=elementor-size-]>a{color:inherit;font-size:inherit;line-height:inherit}.elementor-widget-heading .elementor-heading-title.elementor-size-small{font-size:15px}.elementor-widget-heading .elementor-heading-title.elementor-size-medium{font-size:19px}.elementor-widget-heading .elementor-heading-title.elementor-size-large{font-size:29px}.elementor-widget-heading .elementor-heading-title.elementor-size-xl{font-size:39px}.elementor-widget-heading .elementor-heading-title.elementor-size-xxl{font-size:59px}<\/style>\n<h2>Step 1: Transforming Words into Numbers<\/h2>\n<p>Computers don\u2019t understand words like we do. For them, \u201ccat\u201d or \u201chouse\u201d are just sequences of letters; for a machine to work with words, they need to be <b>transformed into numbers<\/b>.<\/p>\n<p>This is the role of Embeddings. Imagine that each word in the language (or almost) receives a unique secret code, which is a list of numbers.<\/p>\n<p><i>Think of a huge library. Each book (each word for GPT) gets a special label with a unique barcode (the <\/i><b><i>embedding vector<\/i><\/b><i>). This barcode isn\u2019t just an ID number; it contains hidden information about the book.<\/i><\/p>\n<p><i>For example, the barcodes of science fiction books might look similar, as do those of cookbooks, and if they are science fiction books that also talk about robots, their barcodes would be even closer!<\/i><\/p>\n<p>The more two words have a similar meaning, the \u201ccloser\u201d their list of numbers (their <b>embedding vector<\/b>) will be in an imaginary space. For example, the vectors for \u2018king\u2019 and \u2018queen\u2019 will be very close, and the difference between the \u2018king\u2019 vector and \u2018man\u2019 will be similar to the difference between \u2018queen\u2019 and \u2018woman\u2019! Pretty handy, right?<\/p>\n<p>The first thing GPT does when you give it text is to look at each word and find its corresponding list of numbers (embedding vector) in its \u201csecret code table.\u201d<\/p>\n<style>\n.elementor-widget-image{text-align:center}.elementor-widget-image a{display:inline-block}.elementor-widget-image a img[src$=\".svg\"]{width:48px}.elementor-widget-image img{vertical-align:middle;display:inline-block}<\/style>\n<p>\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1562\" height=\"702\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/KnowledgeGraphEmbedding.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/KnowledgeGraphEmbedding.png 1562w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/KnowledgeGraphEmbedding-300x135.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/KnowledgeGraphEmbedding-1024x460.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/KnowledgeGraphEmbedding-768x345.png 768w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/KnowledgeGraphEmbedding-1536x690.png 1536w\" sizes=\"(max-width: 1562px) 100vw, 1562px\"><\/p>\n<h2>Step 2: The Order of Words (Positional Encoding)<\/h2>\n<p>We now have a list of numbers for each word. However, if we mix up the words, the sentence loses its meaning: \u201cThe cat sleeps\u201d doesn\u2019t mean the same as \u201cSleeps the cat.\u201d I tell you nothing new; word order is crucial!<\/p>\n<p>The issue is that our <b>GPT model<\/b> processes words in parallel (for obvious execution time reasons), thus without considering their order. To solve this, we add extra information to each word\u2019s number list: a <b>position marker<\/b>.<\/p>\n<p>These position markers (<b>Positional Encoding<\/b>) are also lists of numbers, calculated with special mathematical formulas.&nbsp;<\/p>\n<p><i>We won\u2019t go into details here, let\u2019s just note that it works really well!<\/i><\/p>\n<p>We simply add them to the word&#8217;s Embedding vectors (their own barcode).<\/p>\n<p>Now the model has a list of embedding vectors for each word, which contains information on our whole sentence; each element of this list contains both the information about the word itself and about where it is in the sentence.<\/p>\n<p><a href=\"\/en\/courses\/data-ai\/\"><br \/>\nLearn all about the GPT model<br \/>\n<\/a><\/p>\n<h2>Step 3: The Transformer<\/h2>\n<p><i>Nothing to do with Optimus Prime, though\u2026<\/i><\/p>\n<p>The smartest part of GPT is the Transformer architecture. Think of it as the brain that analyzes the list of numbers of the sentence to understand the context and predict the next word.<\/p>\n<p>GPT models use a simplified version of the original Transformer, called the <b>Decoder<\/b>. Why? Because their job is to <b>generate<\/b> text, and that\u2019s the role of the Decoder!<\/p>\n<p>This Transformer is built by stacking several identical &#8220;blocks&#8221; on top of each other. The more blocks there are (the more complex it becomes), the more powerful it is.<\/p>\n<p>Each block has several steps to process the embedding vectors of our words:<\/p>\n<h3>1. The Principle of Attention<\/h3>\n<p>This is the brilliant idea behind the Transformer. When you read a sentence, you don\u2019t give the same importance to all words to understand the meaning. For example, in \u201cThe Liora student who had studied well passed their exam,\u201d to understand \u201cpassed,\u201d you focus on \u201cstudent\u201d and \u201cexam.\u201d<\/p>\n<p><img decoding=\"async\" width=\"850\" height=\"765\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/The-Transformer-model-architecture.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/The-Transformer-model-architecture.png 850w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/The-Transformer-model-architecture-300x270.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/The-Transformer-model-architecture-768x691.png 768w\" sizes=\"(max-width: 850px) 100vw, 850px\"><\/p>\n<p>The <b>Attention<\/b> mechanism allows our model to do the same: for each word in the sentence, it looks at all previous words and decides which ones are the most important to understand the current word and predict the next!<\/p>\n<p>Often, the Transformer uses several \u201c<b>attention heads<\/b>\u201d in parallel. It\u2019s like having several people reading the sentence at the same time, each focusing on a different type of relation (one on grammar, one on meaning&#8230;), to then combine their analyses.<\/p>\n<h3>2. Reflection (Or Feed-Forward)<\/h3>\n<p>After the Attention mechanism allows each word to integrate the context of the previous words, each embedding vector independently passes through several layers of mathematical functions that rely on numbers, called <b>weights<\/b>. This ensemble is called a <a href=\"https:\/\/liora.io\/en\/all-about-artificial-neural-networks\">neural network<\/a>.<\/p>\n<p>This layer allows the model to perform more complex transformations on the information that <b>Attention<\/b> has extracted.<\/p>\n<p><a href=\"\/en\/courses\/data-ai\/\"><br \/>\nTraining for AI development<br \/>\n<\/a><\/p>\n<h2>Step 4: Rinse and Repeat!<\/h2>\n<p>The real power of GPT comes from the fact that it\u2019s not a single Transformer block but several (dozens, even hundreds!) stacked on top of each other.<\/p>\n<p>Imagine a multi-story factory, on each floor, our embedding vectors are processed by the <b>Attention and Reflection mechanisms<\/b>. The information that comes out of one floor then becomes the input for the floor below.<\/p>\n<p>The first layers learn to handle simple relationships between words.<\/p>\n<p>The intermediate layers combine this information to understand more complex relationships and the structure of sentences. The last layers understand the overall meaning, tone, and style. The information becomes richer as it goes down the floors!<\/p>\n<p><i>And more and more abstract and incomprehensible to us, poor humans.<\/i><\/p>\n<p><img decoding=\"async\" width=\"2560\" height=\"1707\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/aerps-com-5e4Zlblkvks-unsplash-scaled-1.jpg\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/aerps-com-5e4Zlblkvks-unsplash-scaled-1.jpg 2560w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/aerps-com-5e4Zlblkvks-unsplash-scaled-1-300x200.jpg 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/aerps-com-5e4Zlblkvks-unsplash-scaled-1-1024x683.jpg 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/aerps-com-5e4Zlblkvks-unsplash-scaled-1-768x512.jpg 768w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/aerps-com-5e4Zlblkvks-unsplash-scaled-1-1536x1024.jpg 1536w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/aerps-com-5e4Zlblkvks-unsplash-scaled-1-2048x1366.jpg 2048w\" sizes=\"(max-width: 2560px) 100vw, 2560px\"><\/p>\n<h2>Step 5: Training<\/h2>\n<p>Imagine we give the machine billions of texts (books, articles, web pages&#8230;). We hide the next word in each sentence from it and say: \u201cGuess!\u201d<\/p>\n<p>This is <b>Training<\/b>.<\/p>\n<p>The model tries to predict the next word based on the previous words.<\/p>\n<p>At first, it makes lots of mistakes, but each time it does, we tell it: \u201cNo, the real word was this one,\u201d and the model then adjusts its internal parameters (<b>the weights!<\/b>) so that the next time it sees a similar situation, it has a better chance of guessing the right word.<\/p>\n<p>This error-based adjustment process is called Gradient Descent<b>.<\/b><\/p>\n<p>By playing this prediction game billions of times on billions of texts, the model learns not only which words often go together but also grammar, syntax, and even different writing styles!<\/p>\n<p><a href=\"\/en\/courses\/data-ai\/\"><br \/>\nMastering AI model training<br \/>\n<\/a><\/p>\n<h2>Final Step: Text Generation<\/h2>\n<p>Once the model is trained, it is ready to generate text, based on an initial prompt to predict associated words, <b>the prompt!<\/b><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The model takes your prompt, transforms it into lists of numbers, and sends them through all its Transformer blocks.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">It outputs a list of probabilities for <b>each possible word<\/b> in the vocabulary. For example, after \u2018<i>Once upon a time\u2026<\/i>,&#8217; the word \u2018<i>there was<\/i>&#8216; has an 80% chance, \u2018<i>once<\/i>&#8216; has 10%, \u2018<i>the<\/i>&#8216; has 5%, \u2018<i>in<\/i>&#8216; has 3%\u2026<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The model then chooses a word from this list of possibilities. It doesn\u2019t always pick the most probable to ensure the text isn\u2019t too repetitive. This word is added to the sequence: \u201c<i>Once upon a time there was<\/i>.\u201d<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The model takes this new sequence as input and repeats the process: it predicts the next word (\u201c<i>knight<\/i>\u201d? \u201c<i>cat<\/i>\u201d? \u201c<i>day<\/i>\u201d?), chooses a word, adds it to the sequence\u2026<\/li>\n<\/ol>\n<p>And it continues, word by word, until it generates a special word that means \u201cend of sentence\u201d or \u201cend of text,\u201d or until it reaches a maximum length!<\/p>\n<p><img decoding=\"async\" width=\"2560\" height=\"1922\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/shantanu-kumar-xvdkNBaja90-unsplash-1-scaled-1.jpg\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/shantanu-kumar-xvdkNBaja90-unsplash-1-scaled-1.jpg 2560w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/shantanu-kumar-xvdkNBaja90-unsplash-1-scaled-1-300x225.jpg 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/shantanu-kumar-xvdkNBaja90-unsplash-1-scaled-1-1024x769.jpg 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/shantanu-kumar-xvdkNBaja90-unsplash-1-scaled-1-768x577.jpg 768w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/shantanu-kumar-xvdkNBaja90-unsplash-1-scaled-1-1536x1153.jpg 1536w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2025\/06\/shantanu-kumar-xvdkNBaja90-unsplash-1-scaled-1-2048x1538.jpg 2048w\" sizes=\"(max-width: 2560px) 100vw, 2560px\"><\/p>\n<h2>Conclusion: This Works Really Well!<\/h2>\n<p>The power of GPT comes from the combination of several elements:<\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The <b>Transformer<\/b> architecture and the <b>Attention<\/b> mechanism, which allow it to understand context over very long sentences.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The <b>layer stacking<\/b> that allows it to learn increasingly complex representations of language.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The <b>massive training<\/b> on gigantic amounts of text, giving it a very broad knowledge of language and the world.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The <b>word-by-word generation<\/b> process based on prediction, allowing it to create fluid and varied text.<\/li>\n<\/ul>\n<p>GPT doesn\u2019t \u201cthink\u201d: It is extremely skilled at identifying <b>complex statistical patterns in language<\/b> and uses them to predict the most likely continuation of a sequence of words.<\/p>\n<p>But the result of this prediction, thanks to the scale of the model and the data used for training, is often text that seems intelligent, relevant, and creative to us!<\/p>\n<p><a href=\"\/en\/courses\/data-ai\/\"><br \/>\nDiscover our AI training courses<br \/>\n<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here we are not talking about ChatGPT, but that&#8217;s indeed where it got its name. If you\u2019ve never heard of GPT, you have at least used text-generating AIs, or &#8216;LLM&#8216; (Large Language Model). GPT is a type of LLM, on which most of the generative AIs that everyone is talking about right now are based! [&hellip;]<\/p>\n","protected":false},"author":74,"featured_media":196719,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-196717","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/196717","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/74"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=196717"}],"version-history":[{"count":5,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/196717\/revisions"}],"predecessor-version":[{"id":205527,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/196717\/revisions\/205527"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/196719"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=196717"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=196717"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}