{"id":81805,"date":"2026-01-28T11:27:03","date_gmt":"2026-01-28T10:27:03","guid":{"rendered":"https:\/\/multi.liora.io\/?p=81805"},"modified":"2026-02-06T07:34:28","modified_gmt":"2026-02-06T06:34:28","slug":"hello-daniel-what-is-data-normalization","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/hello-daniel-what-is-data-normalization","title":{"rendered":"Data normalization: how is this concept related to Data Science?"},"content":{"rendered":"<b>Daniel provides technical support for Liora\u2019s training courses. He is the expert on every subject related to data science. Today, we managed to get a quick interview with him so that he could answer a few of our questions about data normalization.<\/b>\n<h2 class=\"wp-block-heading\" id=\"h-what-is-data-normalization\">What is Data Normalization?<\/h2>\n<b>Normalization<\/b>, as the term is used in data science, is a <b>very important concept<\/b> in data pre-processing whenever you work on a <a href=\"https:\/\/liora.io\/en\/machine-learning-what-is-it-and-why-does-it-change-the-world\"><b>Machine Learning<\/b><\/a> project.\n\nTwo main processes are involved when we talk about normalization: <b>normalization and standard normalization<\/b>, the latter being more commonly known as standardization. 
Both processes share the <b>same purpose<\/b>: to rescale numerical variables so that they are <b>comparable on a common scale<\/b>.&nbsp;\n<h2 class=\"wp-block-heading\" id=\"h-in-mathematics-terms-what-do-we-have\">In mathematical terms, what do we have?<\/h2>\nLet\u2019s consider a <b>numerical variable<\/b> with n observations, which can be written as follows:\n\n<img decoding=\"async\" width=\"800\" height=\"145\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/With-1-1-1024x186.jpg\" alt=\"data-normalization\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/With-1-1-1024x186.jpg 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/With-1-1-300x55.jpg 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/With-1-1-768x140.jpg 768w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/With-1-1.jpg 1150w\" sizes=\"(max-width: 800px) 100vw, 800px\">\n\nSince we have a finite number of real values, we can compute various statistics, including the <b>min<\/b>, <b>max<\/b>, <b>mean<\/b>, and <b>standard deviation<\/b>. Normalization only needs the min and the max.\n\nThe goal here is to <b>bring all the values of the variable into the range [0, 1]<\/b> while preserving the relative distances between them.&nbsp;\n\nTo do that, you\u2019ll use a simple formula:\n\n<img decoding=\"async\" width=\"429\" height=\"109\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/formula-2.jpg\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/formula-2.jpg 429w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/formula-2-300x76.jpg 300w\" sizes=\"(max-width: 429px) 100vw, 429px\">\n\nStandardization is slightly more involved than simply bringing the values into the range [0, 1]. 
It aims to bring <b>the mean \u03bc to 0<\/b> and the <b>standard deviation to 1<\/b>.\n\nAgain, the process is not very complicated: if you already know the <b>mean \u03bc<\/b> and the <b>standard deviation \u03c3<\/b> of a variable X =&nbsp;(x<sub>1<\/sub>,&nbsp;x<sub>2<\/sub>,&nbsp;\u2026,&nbsp;x<sub>n<\/sub>), you can write the <b>standardized variable<\/b> as follows:\n\n<img decoding=\"async\" width=\"434\" height=\"122\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/formula-3.jpg\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/formula-3.jpg 434w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/07\/formula-3-300x84.jpg 300w\" sizes=\"(max-width: 434px) 100vw, 434px\">\n<h2 class=\"wp-block-heading\" id=\"h-what-is-the-link-between-data-normalization-and-data-science\">What is the link between Data Normalization and Data Science?<\/h2>\nIn <a href=\"https:\/\/liora.io\/en\/data-science-definition-issues-and-use-cases\"><b>Data Science<\/b><\/a>, you\u2019re often dealing with <b>numerical data<\/b>, and you can rarely compare these data in their <b>original state<\/b>.\n\nWorking with variables on different scales <b>can be a problem<\/b> in analysis: a numerical variable with a <b>range of values between 0 and 10,000<\/b> will weigh more heavily in the analysis (for example in distance computations) than a variable with values between <b>0 and 1<\/b>, which introduces a <b>bias problem<\/b> later on.\n\nHowever, be careful not to treat normalization as a mandatory processing step: it entails a <b>loss of information<\/b>, such as the original units and scale of the values, and can be <b>detrimental<\/b> in certain cases!\n<h2 class=\"wp-block-heading\" id=\"h-how-do-you-normalize-data-concretely\">How do you normalize data concretely?<\/h2>\nWith <strong><a href=\"https:\/\/liora.io\/en\/python-the-most-popular-programming-language\">Python<\/a><\/strong>, it is very simple: many libraries can do it. 
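Before turning to a library, the two formulas above can be sketched directly with NumPy. This is a minimal example assuming a one-dimensional array of numerical values; min_max_normalize and standardize are hypothetical helper names chosen for this illustration, not functions of any library, and the standard deviation used is the population one (NumPy's default, ddof=0).

```python
import numpy as np


def min_max_normalize(x):
    # Hypothetical helper: rescale values into [0, 1] with (x - min) / (max - min)
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A constant variable has no range, so the formula would divide by zero
        raise ValueError('cannot min-max normalize a constant variable')
    return (x - x_min) / (x_max - x_min)


def standardize(x):
    # Hypothetical helper: subtract the mean mu, then divide by the standard deviation sigma
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0, NumPy's default)
    if sigma == 0:
        raise ValueError('cannot standardize a constant variable')
    return (x - mu) / sigma


values = [2.0, 4.0, 6.0, 8.0, 10.0]
print(min_max_normalize(values))  # values 0.0, 0.25, 0.5, 0.75 and 1.0
z = standardize(values)
print(z.mean(), z.std())  # approximately 0.0 and 1.0
```

Both helpers compute their statistics from whatever array they receive, which is precisely the behavior that becomes a problem for test data, as discussed below.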
I will only mention <b>Scikit-learn<\/b> because it is the <b>most widely used library in Data Science<\/b>. It offers functions that perform the desired normalizations in just a few lines of code.\n\nHowever, it is important to put the <b>use cases<\/b> in context: in practice, once our <b>training data<\/b> has been normalized, it is not enough to naively re-run the same normalization on every other dataset we have.\n\nWhy not? The reason is simple: a <b>test sample, or new data<\/b>, must go through exactly the same transformation as the training data.\n\nIt is of course possible to <b>center and scale<\/b> any sample in the same way, but the mean and standard deviation used would be <b>different from those computed on the training set<\/b>.\n\nThe results obtained would then not be a fair representation of the model\u2019s <b>performance<\/b> on new data.\n\nSo, rather than applying a normalization function directly, it is better to use Scikit-learn\u2019s <b>transformer API<\/b>, which lets you <b><i>fit<\/i><\/b> a preprocessing step on the training data.\n\nThat way, when the <b>normalization<\/b> is applied to other samples, it reuses the mean and standard deviation saved during fitting.&nbsp;\n\nTo create this \u2018<b>fitted<\/b>\u2019 preprocessing step, simply create a \u2018<b><i>StandardScaler<\/i><\/b>\u2019 object and fit it on the training data with <b><i>scaler.fit()<\/i><\/b>. Then, to apply it to any array of data afterward, call <b><i>scaler.transform()<\/i><\/b> (<b><i>MinMaxScaler<\/i><\/b> works the same way for min-max normalization).\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/\">Discover our Data Science courses<\/a><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Daniel provides technical support for Liora\u2019s training courses. 
He is the expert on every subject related to data science. Today, we managed to get a quick interview with him so that he could answer a few of our questions about data normalization.<\/p>\n","protected":false},"author":85,"featured_media":30548,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-81805","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/81805","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/85"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=81805"}],"version-history":[{"count":3,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/81805\/revisions"}],"predecessor-version":[{"id":205420,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/81805\/revisions\/205420"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/30548"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=81805"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=81805"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}