{"id":180678,"date":"2025-08-26T13:06:31","date_gmt":"2025-08-26T12:06:31","guid":{"rendered":"https:\/\/liora.io\/en\/?p=180678"},"modified":"2026-02-12T10:18:13","modified_gmt":"2026-02-12T09:18:13","slug":"jensen-shannon-divergence-everything-you-need-to-know-about-this-ml-model","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/jensen-shannon-divergence-everything-you-need-to-know-about-this-ml-model","title":{"rendered":"Jensen-Shannon divergence: Everything you need to know about this ML model"},"content":{"rendered":"\n<p><strong>The Jensen-Shannon divergence is used to measure the similarity between two probability distributions, particularly in the field of Machine Learning. Find out everything you need to know about this measure, from its history to its modern applications!<\/strong><\/p>\n\n\n\n<p>During the 20th century, the<strong> Danish mathematician Johan Jensen and the American mathematician and engineer Claude Shannon<\/strong> made major contributions to <a href=\"https:\/\/liora.io\/en\/understanding-kurtosis-calculating-outlier-frequency-in-statistical-distributions\">information theory and statistics.<\/a> Born in 1859, Johan Jensen devoted much of his career to the study of convex functions and inequalities. In 1906 he published an article entitled &#8220;On convex functions and inequalities between mean values&#8221;. In it, he formalised the notion of convexity and established the inequality that now bears his name: Jensen&#8217;s inequality, a fundamental result describing the properties of convex functions.<\/p>\n\n\n\n<p><strong>Claude Shannon<\/strong> was born in 1916 and devoted much of his career to the mathematical study of communication and information. In 1948, several decades after <strong>Jensen&#8217;s work,<\/strong> the American published &#8220;A Mathematical Theory of Communication&#8221;, the founding paper of information theory, in which he introduced entropy as a measure of the uncertainty of a probability distribution. 
Shannon&#8217;s entropy underpins the Kullback-Leibler divergence: a measure introduced in 1951 by Solomon Kullback and Richard Leibler, widely used to quantify the <strong>difference between two distributions.<\/strong> It measures the dissimilarity between two probability distributions based on the logarithms of the probability ratios.<\/p>\n\n\n\n<p>Later, in the 1990s, researchers began to explore possible extensions and variations of the <a href=\"https:\/\/liora.io\/en\/navigating-data-drift-strategies-to-tackle-shifting-data-landscapes\">Kullback-Leibler divergence.<\/a> Their aim was to take better account of symmetry and dissimilarity between distributions. They drew on the pioneering work of Johan Jensen and Claude Shannon to create the <strong>Jensen-Shannon divergence.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-is-the-jensen-shannon-divergence\">What is the Jensen-Shannon divergence?<\/h2>\n\n\n\n<p>Jensen-Shannon divergence was first introduced in an article by Jianhua Lin in 1991, entitled <strong>&#8220;Divergence measures based on the Shannon entropy&#8221;.<\/strong> He developed this metric as a measure of symmetric divergence between two probability distributions. Its main difference from the Kullback-Leibler divergence on which it is based is its symmetry.<\/p>\n\n\n\n<p>It takes the <strong>weighted average of two KL divergences.<\/strong> One is calculated from the first distribution and the other from the second, each with respect to their average. 
The<strong> Jensen-Shannon divergence<\/strong> can therefore be defined as the weighted average of the Kullback-Leibler divergences between each distribution and an average distribution.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center wp-container-core-buttons-is-layout-a89b3969\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/liora.io\/en\/courses\/data-ai\/machine-learning-engineer\">Learn more about JS divergence<\/a><\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-do-you-calculate-the-jensen-shannon-divergence\">How do you calculate the Jensen-Shannon divergence?<\/h2>\n\n\n\n<p>The first step in <strong>calculating the Jensen-Shannon divergence<\/strong> is to pre-process the data to obtain the probability distributions P and Q. The probability distributions can then be estimated from the data. For example, it is possible to count the occurrences of each element in the sample.<\/p>\n\n\n\n<p>When the distributions are available, the divergence can be calculated using the formula:<\/p>\n\n\n\n<p><strong>JS(P || Q) = (KL(P || M) + KL(Q || M)) \/ 2<\/strong><\/p>\n\n\n\n<p>where M = (P + Q) \/ 2 is the average (mixture) of the two distributions, and KL(P || M) denotes the Kullback-Leibler divergence of P from M.<\/p>\n\n\n\n<p>A higher value of JS divergence indicates greater dissimilarity between distributions, while a value closer to zero indicates greater similarity. To illustrate with a concrete example, suppose we have two texts and we want to measure their similarity. 
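The steps above can be sketched in a few lines of Python. This is a minimal standard-library illustration, not a production implementation; the function names (`word_distribution`, `kl_divergence`, `js_divergence`) and the two example sentences are our own.

```python
from collections import Counter
from math import log2

def word_distribution(text):
    """Turn a text into a word distribution: counts normalised by total words."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def kl_divergence(p, q):
    """KL(P || Q) in bits; terms where P(x) = 0 contribute nothing by convention."""
    return sum(p[x] * log2(p[x] / q[x]) for x in p if p[x] > 0)

def js_divergence(p, q):
    """JS(P || Q) = (KL(P || M) + KL(Q || M)) / 2, with M the average distribution."""
    support = set(p) | set(q)
    m = {x: (p.get(x, 0.0) + q.get(x, 0.0)) / 2 for x in support}
    return (kl_divergence(p, m) + kl_divergence(q, m)) / 2

p = word_distribution("the cat sat on the mat")
q = word_distribution("the cat lay on the rug")
print(round(js_divergence(p, q), 3))  # prints 0.333
```

With base-2 logarithms the result always falls between 0 (identical distributions) and 1 (distributions with no words in common), which matches the properties discussed below.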
Each text can be represented by a distribution of words, where each word is an element of the alphabet.<\/p>\n\n\n\n<p>By counting the occurrences of words in each text and normalising these counts by the total number of words, we obtain the <a href=\"https:\/\/liora.io\/en\/understanding-kurtosis-calculating-outlier-frequency-in-statistical-distributions\">probability distributions P and Q.<\/a> We then use the Jensen-Shannon divergence formula to calculate a value that indicates how similar the two texts are!<\/p>\n\n\n\n<p>There are several important properties of this measure. Firstly, it is always positive and reaches zero if and only if the distributions P and Q are identical. In addition, it is bounded above by log 2, i.e. by 1 when base-2 logarithms are used, whatever the size of the alphabet. Finally, its square root, known as the Jensen-Shannon distance, is a true metric, and the divergence can be used in hypothesis testing and confidence intervals.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-advantages-and-disadvantages\">Advantages and disadvantages<\/h2>\n\n\n\n<p>The strength of the <strong>Jensen-Shannon divergence<\/strong> is that it takes into account the overall structure of the distributions. It is therefore more resistant to local variations than other divergence measures. It is relatively efficient to calculate, which also makes it applicable to large quantities of data. These are its main advantages.<\/p>\n\n\n\n<p>On the other hand, it can be sensitive to sample size. Estimates of probability distributions can be unreliable when sample sizes are small, and this can affect the similarity measure. It may also be less suitable when the distributions are very different. 
This is because it does not capture the fine details of local differences.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-jensen-shannon-divergence-and-machine-learning\">Jensen-Shannon divergence and Machine Learning<\/h2>\n\n\n\n<p>JS divergence plays a crucial role in <a href=\"https:\/\/liora.io\/en\/boosting-business-with-3-essential-machine-learning-algorithms\">Machine Learning.<\/a> It measures the similarity between the probability distributions associated with different samples or clusters. It can be used to group similar data together or to classify new <strong>samples by comparing them with reference distributions.<\/strong><\/p>\n\n\n\n<p>In <a href=\"https:\/\/liora.io\/en\/natural-language-processing-definition-and-principles\">natural language processing,<\/a> it can be used to compare word distributions in different texts. This can be used to identify similar documents, detect duplicate content or find semantic relationships between texts. It is also a tool for evaluating <a href=\"https:\/\/liora.io\/en\/large-language-models-llm-everything-you-need-to-know\">language models.<\/a> In particular, it can be used to assess the diversity and quality of the texts generated.<\/p>\n\n\n\n<p>By comparing the probability distributions of the generated texts with those of the reference texts, it is possible to measure the extent to which the generations are similar to or different from the reference corpus. In cases where the training and test data come from different distributions, the<strong> Jensen-Shannon divergence<\/strong> can be used to guide domain adaptation strategies. This can help to adjust a model trained on a source distribution to better fit new data from a target distribution.<\/p>\n\n\n\n<p>Finally, for sentiment analysis, JS divergence can be used to compare profiles between different documents or sample classes. 
This allows similarities and differences in expression to be identified, for example for opinion detection or emotion classification.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\" style=\"margin-left:auto;margin-right:auto\"><img decoding=\"async\" width=\"800\" height=\"450\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1.png\" alt=\"\" class=\"wp-image-203170\" style=\"width:1000px;height:auto;display:block;margin-left:auto;margin-right:auto\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1.png 800w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1-300x169.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1-768x432.png 768w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1-440x248.png 440w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1-782x440.png 782w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1-785x442.png 785w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1-210x118.png 210w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/03\/divergence-jensen-shannon-1-114x64.png 114w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center wp-container-core-buttons-is-layout-a89b3969\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/liora.io\/en\/courses\/data-ai\/machine-learning-engineer\">Mastering JS and ML divergence<\/a><\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-jensen-shannon-divergence-and-data-science\">Jensen-Shannon divergence and Data Science<\/h2>\n\n\n\n<p>For <a 
href=\"https:\/\/liora.io\/en\/synergies-unveiled-the-dynamic-intersection-of-data-science-and-finance\">data science,<\/a> JS divergence is used to compare the similarity between distributions of variables or characteristics in a data set. It can be used to measure the difference between observed data distributions and expected or reference distributions. This allows variations and discrepancies between different distributions to be identified, which can be valuable for detecting anomalies or validating hypotheses.<\/p>\n\n\n\n<p>For textual data analysis, this measure can be used to estimate the similarity between distributions of words, phrases or themes in documents. This can help to group similar documents, extract common topics or detect significant differences between sets of documents. For example, it can be used to classify documents based on their content or for sentiment analysis by comparing sentiment distributions between different texts.<\/p>\n\n\n\n<p>When there is high dimensionality in the data, Jensen-Shannon divergence can be used to select the most discriminating features or to reduce the dimensionality of the data. By calculating the divergence between the distributions of different characteristics, it is possible to identify those that contribute most to the differentiation between classes or groups of data.<\/p>\n\n\n\n<p>In the process of developing and evaluating predictive models, JS divergence can be used as a metric to compare the probability distributions of predictions and actual values. This makes it possible to assess the quality of the model by measuring how closely the predictions match the actual observations. For example, it can be used to evaluate classification, regression or recommendation models.<\/p>\n\n\n\n<p>Finally, it can be used to measure the similarity between observations or instances in a dataset. 
By comparing the <strong>distributions of features between different instances<\/strong>, it is possible to determine the proximity or distance between them. This can be used in clustering tasks to group similar observations or to perform <a href=\"https:\/\/liora.io\/en\/database-what-is-it\">similarity searches in large databases.<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion-jensen-shannon-divergence-a-key-tool-for-data-analysis-and-machine-learning\">Conclusion: Jensen-Shannon divergence, a key tool for data analysis and machine learning<\/h2>\n\n\n\n<p>Since its creation, the Jensen-Shannon divergence has been widely used in many fields, including computer science, statistics, natural language processing, bioinformatics and machine learning. It is still an essential tool for measuring similarity between probability distributions, and has opened up new perspectives in data analysis and statistical modelling. Researchers around the world use it to solve classification and clustering problems. It is a key element in the toolbox of scientists and practitioners in many fields, starting with Data Science.<\/p>\n\n\n\n<p>To learn how to master all the techniques of analysis and Machine Learning, you can choose Liora. Our courses will give you all the skills you need to become a data engineer, analyst, data scientist, data product manager or  ML engineer. You&#8217;ll learn about the Python language and its libraries, DataViz tools, Business Intelligence solutions and databases. All our courses are entirely distance learning, lead to professional certification and are eligible for funding options. 
Discover Liora!<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe title=\"Discover our Data Scientist training - DataScientest\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/kNPe_pgbuHg?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center wp-container-core-buttons-is-layout-a89b3969\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/liora.io\/en\/courses\/data-ai\/machine-learning-engineer\">Start a Machine Learning course<\/a><\/div>\n<\/div>\n\n\n\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"FAQPage\",\n  \"mainEntity\": [\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What is the Jensen\u2011Shannon divergence?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"The Jensen\u2011Shannon divergence is a symmetric measure of similarity between two probability distributions that takes the average of Kullback\u2011Leibler divergences between each distribution and their mean distribution, and is used to quantify how similar or different two distributions are.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How do you calculate the Jensen\u2011Shannon divergence?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"To calculate the Jensen\u2011Shannon divergence, you first obtain the probability distributions P and Q, compute their average distribution 
M=(P+Q)\/2, and then take the average of KL(P || M) and KL(Q || M) using JS(P||Q) = (KL(P||M) + KL(Q||M))\/2.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What are the advantages and disadvantages of Jensen\u2011Shannon divergence?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Its main advantages are symmetry and effectiveness in capturing overall distribution similarity, with efficient calculation even on large data sets; its disadvantages include sensitivity to sample size and less usefulness when distributions are highly different.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How is Jensen\u2011Shannon divergence used in Machine Learning?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"In Machine Learning, it is used to compare probability distributions of different samples or clusters, assess similarity between text distributions, evaluate language models, and aid domain adaptation when training and test data differ.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How is Jensen\u2011Shannon divergence relevant in Data Science?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"In Data Science, it helps measure differences between variable distributions, detect anomalies, select discriminative features, evaluate model predictions against actual values, and assist in clustering or similarity search.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What is the conclusion about Jensen\u2011Shannon divergence?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"The Jensen\u2011Shannon divergence remains an essential tool for measuring 
similarity between probability distributions, widely used in fields like statistics, NLP, bioinformatics, and Machine Learning.\"\n      }\n    }\n  ]\n}\n<\/script>\n\n","protected":false},"excerpt":{"rendered":"<p>The Jensen-Shannon divergence is used to measure the similarity between two probability distributions, particularly in the field of Machine Learning. Find out everything you need to know about this measure, from its history to its modern applications!<\/p>\n","protected":false},"author":47,"featured_media":180679,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-180678","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180678","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/47"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=180678"}],"version-history":[{"count":5,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180678\/revisions"}],"predecessor-version":[{"id":206543,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180678\/revisions\/206543"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/180679"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=180678"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=180678"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}