{"id":180793,"date":"2026-02-18T06:45:37","date_gmt":"2026-02-18T05:45:37","guid":{"rendered":"https:\/\/liora.io\/en\/?p=180793"},"modified":"2026-02-18T06:45:38","modified_gmt":"2026-02-18T05:45:38","slug":"kolmogorov-smirnov-test-understanding-this-statistical-method","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/kolmogorov-smirnov-test-understanding-this-statistical-method","title":{"rendered":"Kolmogorov-Smirnov Test: Understanding this Statistical Method"},"content":{"rendered":"<p><strong>The Kolmogorov-Smirnov test is a widely used method for comparing data. Discover the amazing story of its invention, and how it is used today in Data Science!<\/strong><\/p>\n<!-- \/wp:post-content -->\n\n<!-- wp:paragraph -->\n<p>In 1933, Andrei Kolmogorov published an article entitled &#8220;Sulla determinazione empirica di una legge di distribuzione&#8221; (On the empirical determination of a distribution law). In it, the mathematician presented the notion of <strong>empirical cumulative distribution (ECD)<\/strong> and the corresponding test statistic. He was interested in how data could be compared with a theoretical distribution without assuming a specific form for the distribution.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>His method was based on the maximum difference between the <strong>ECD and the TCD (theoretical cumulative distribution)<\/strong>, and he proposed a test statistic to quantify this difference. A few years later, in 1939, Nikolai Smirnov developed a similar approach independently, in his article <strong>&#8220;Estimation of the difference between empirical distribution functions in two independent samples&#8221;.<\/strong><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>His aim was also to propose a non-parametric method for comparing two independent samples of data. 
For his part, he proposed defining a test statistic based on the <a href=\"https:\/\/liora.io\/en\/data-quality-10-mistakes-not-to-make\">maximum difference between the two ECDs of the data samples.<\/a> It was only at a mathematics conference that Kolmogorov and Smirnov met by chance. As they began to discuss their respective research, they realised that they were working on similar problems independently.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>As they exchanged ideas and results, they were astonished to realise that their methods and formulas were extremely similar. Surprised by this coincidence, the two mathematicians decided to work together to develop a common approach. They combined their ideas and expertise to create the <strong>&#8220;Kolmogorov-Smirnov Test&#8221;.<\/strong><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:heading -->\n<h2 id=\"h-what-is-the-kolmogorov-smirnov-test\" class=\"wp-block-heading\">What is the Kolmogorov-Smirnov test?<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>Used in many fields, the <em>Kolmogorov-Smirnov<\/em> test is a powerful statistical tool. It is used to assess the similarity between an empirical distribution and a theoretical distribution, or to compare two distributions. It is based on two key concepts: the ECD and the TCD. The ECD is the empirical cumulative distribution. It is constructed from observed data, and represents the proportion of observations less than or equal to a given value.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>The TCD, on the other hand, is the theoretical cumulative distribution. It is based on a theoretical distribution specified by the user. The aim of the test is to measure the maximum distance (test statistic D) between the ECD and the TCD. D is calculated as the largest absolute difference between the two cumulative distributions. 
The higher its value, the <strong>greater the difference between the empirical distribution and the theoretical distribution.<\/strong><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:image {\"width\":\"auto\",\"height\":\"500px\",\"align\":\"center\"} -->\n\n<!-- \/wp:image -->\n\n<!-- wp:buttons {\"className\":\"is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\",\"style\":{\"spacing\":{\"margin\":{\"top\":\"var:preset|spacing|columns\",\"bottom\":\"var:preset|spacing|columns\"}}},\"layout\":{\"type\":\"flex\",\"justifyContent\":\"center\"}} -->\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\" style=\"margin-top:var(--wp--preset--spacing--columns);margin-bottom:var(--wp--preset--spacing--columns)\"><!-- wp:button -->\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/liora.io\/en\/courses\/\">Learn how to use the Kolmogorov-Smirnov test<\/a><\/div>\n<!-- \/wp:button --><\/div>\n<!-- \/wp:buttons -->\n\n<!-- wp:paragraph -->\n<p>To assess the <strong>significance of the test<\/strong>, a P value is calculated. It represents the probability of obtaining a value of D at least as extreme as the one observed, assuming the null hypothesis is true. The null hypothesis states that the two distributions are identical; the alternative hypothesis states that they differ.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>This test can be used with a single sample to check whether the data follow a specific distribution, or with two independent samples to compare two different distributions. If the <strong>P value is above a predefined significance level,<\/strong> the null hypothesis cannot be rejected, although this does not prove that the distributions are identical. 
If it is lower, the null hypothesis is rejected: the data provide evidence that the two distributions differ.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:heading -->\n<h3 id=\"h-what-is-the-purpose-of-the-kolmogorov-smirnov-test\" class=\"wp-block-heading\">What is the purpose of the Kolmogorov-Smirnov test?<\/h3>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>The <strong>Kolmogorov-Smirnov<\/strong> test is used in many fields, including the social sciences, economics, biology, physics, engineering and many others. One of the most common applications is to assess the normality of a distribution. An <strong>empirical distribution<\/strong> is compared with a theoretical normal distribution to check whether the data show any significant deviations.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>This method can also be used to determine whether two independent samples come from the same population, or whether they differ significantly. This is very useful in comparative studies, controlled experiments or group analyses. It is also used to check the adequacy of a statistical model. The aim is to check whether the model fitted to the data faithfully reproduces the distribution observed. If it does not, potential gaps or errors can be identified.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>It is therefore a highly versatile tool for <a href=\"https:\/\/liora.io\/en\/data-science-bootcamp-definition-training-benefits\">Data Science<\/a>, <a href=\"https:\/\/liora.io\/en\/unraveling-machine-learning-vs-deep-learning-key-differences-explained\">Machine Learning<\/a> and <a href=\"https:\/\/liora.io\/en\/artificial-intelligence-definition\">Artificial Intelligence<\/a>. 
It is used not only to compare model performance, but also for feature selection and anomaly detection.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:heading -->\n<h2 id=\"h-conclusion\" class=\"wp-block-heading\">Conclusion<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>As well as highlighting the importance of encounters and chance in major scientific discoveries, this anecdote gave rise to a tool that is still widely used today to analyse data reliably. To learn all the methods and tools of Data Science, Liora is the place to be. Our various training courses enable you to acquire all the skills needed to become a Data Analyst, Data Engineer, Data Scientist, Machine Learning Engineer or Data Product Manager.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>In particular, you will learn about the Python language and its libraries, DataViz, Business Intelligence, data analysis and machine learning. All our programmes can be completed entirely by distance learning, and our state-recognised organisation is eligible for funding options. Thanks to a partnership with <strong>MINES ParisTech,<\/strong> learners receive certification at the end of the course. 
Discover Liora!<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:embed {\"url\":\"https:\/\/youtu.be\/pVNzVvHV1No\",\"type\":\"video\",\"providerNameSlug\":\"youtube\",\"responsive\":true,\"className\":\"wp-embed-aspect-16-9 wp-has-aspect-ratio\"} -->\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\">\n  <div class=\"wp-block-embed__wrapper\">\n<iframe title=\"Les diff\u00e9rents m\u00e9tiers en Data Science - DataScientest\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/pVNzVvHV1No?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n  <\/div>\n<\/figure>\n<!-- \/wp:embed -->\n\n<!-- wp:buttons {\"className\":\"is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\",\"layout\":{\"type\":\"flex\",\"justifyContent\":\"center\"}} -->\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><!-- wp:button -->\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/liora.io\/en\/courses\/\">Find out more about our training courses<\/a><\/div>\n<!-- \/wp:button --><\/div>\n<!-- \/wp:buttons -->\n\n<!-- wp:html -->\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"FAQPage\",\n  \"mainEntity\": [\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What is the Kolmogorov\u2011Smirnov test?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"The Kolmogorov\u2011Smirnov test is a powerful statistical tool used to assess the similarity between an empirical distribution and a theoretical distribution, or to compare two distributions by measuring the maximum distance between their cumulative 
distribution functions.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What does the Kolmogorov\u2011Smirnov test measure?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"It measures the maximum distance (statistic D) between the empirical cumulative distribution function and the theoretical cumulative distribution function or between two empirical distributions, with higher D indicating greater difference.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How is statistical significance assessed in the Kolmogorov\u2011Smirnov test?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"A p\u2011value is calculated representing the probability of obtaining a D value as extreme as observed under the null hypothesis that the distributions are identical; a low p\u2011value suggests a significant difference.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"When can the Kolmogorov\u2011Smirnov test be used?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"The test can be used with a single sample to check if it follows a specific distribution, or with two independent samples to compare whether their distributions differ significantly.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What is the purpose of the Kolmogorov\u2011Smirnov test?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"It\u2019s used across many fields \u2014 like social sciences, biology, economics and machine learning \u2014 to assess normality, compare samples, check model fit, and help with tasks such as feature selection and anomaly detection.\"\n      }\n    }\n  ]\n}\n<\/script>\n\n<!-- \/wp:html 
-->","protected":false},"excerpt":{"rendered":"<p><strong>The Kolmogorov-Smirnov test is a widely used method for comparing data. Discover the amazing story of its invention, and how it is used today in Data Science!<\/strong><\/p>\n","protected":false},"author":93,"featured_media":207106,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-180793","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/93"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=180793"}],"version-history":[{"count":5,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180793\/revisions"}],"predecessor-version":[{"id":207107,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180793\/revisions\/207107"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/207106"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=180793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=180793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}