{"id":168210,"date":"2026-02-18T06:38:06","date_gmt":"2026-02-18T05:38:06","guid":{"rendered":"https:\/\/liora.io\/en\/?p=168210"},"modified":"2026-02-18T06:38:06","modified_gmt":"2026-02-18T05:38:06","slug":"pca-principal-component-analysis-what-is-it","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/pca-principal-component-analysis-what-is-it","title":{"rendered":"PCA (Principal Component Analysis): What is it?"},"content":{"rendered":"<p><strong>Do you know the PCA? A very useful method used in dimension reduction, discover how it works in this article.<\/strong><\/p>\n<h2>What is the Principal Component Analysis?<\/h2>\nWho has never had in his hands a dataset containing a very large number of variables without knowing which are the most important?&nbsp; How to reduce this dataset to represent it simply on 2 or 3 axes? Here is the PCA!\n\nPrincipal Component Analysis answers these questions. PCA is a well-known <b>method of dimension reduction<\/b> that will allow the transformation of <b>highly correlated variables<\/b> into new variables that are decorrelated from each other.&nbsp;\n\nThe principle is simple: It is a matter of <b>summarizing the information <\/b>contained in a large <a href=\"https:\/\/liora.io\/en\/database-what-is-it\">database<\/a> into a certain number of synthetic variables called: Principal Components.&nbsp;\n\nThe idea is then to be able to project these data on the <b>nearest hyperplane<\/b> to have a simple representation of our data.\n\nOf course, <b>dimension reduction means a loss of information<\/b>. This is the challenge of a Principal Component Analysis. We must be able to reduce the dimension of our data while keeping a maximum of information.\n<h3>How does a Principal Component Analysis work?<\/h3>\nTo illustrate the principle of PCA, we will<b> take for example a dataset <\/b><img decoding=\"async\" width=\"800\" height=\"246\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2023\/05\/ACP_1.webp\" alt=\"ACP_1\" loading=\"lazy\">\n\nThereafter it is important to <b>center and reduce our variables<\/b> to mitigate the scale effect because they are not calculated on the same basis.\n\nOnce this step has been completed, we must see our data as a <b>matrix<\/b> from which we will calculate from which we will <b>calculate eigenvalues<\/b> and <b>eigenvectors<\/b>.\n\nIn linear algebra, the notion of eigenvector corresponds to the <b>study of privileged axes<\/b>, according to which an application of a space in itself behaves like a dilation, multiplying the vectors by a constant called an eigenvalue. The vectors to which it applies are called eigenvectors,<b> combined in an eigenspace.<\/b>\n\nAfter importing the PCA module from sklearn.decomposition, the eigenvalues returned are&nbsp;\n\nThe eigenvalues are: [3.48753851 1.47902877 1.15061758 0.93557048 0.65529084 0.15140052]\n\nThese eigenvalues will allow us to <b>determine the optimal number of factors\/principal components<\/b> for our PCA. For example, if the optimal number of components is 2, then our data will be represented on<b> two axes<\/b>, and so on.\n\n<img decoding=\"async\" width=\"800\" height=\"572\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2023\/05\/ACP_3-3-1024x732.jpg\" alt=\"\" loading=\"lazy\">\n\nOn this graph which represents the number of factors to choose from according to the eigenvalues, we indicate that the <b>optimal choice of factor <\/b>is 2 (thanks to the elbow method). 
<p>Thus, we go from dimension 9 down to dimension 2, which considerably <b>reduces the dimension of the data</b>. As said before, this reduction necessarily entails a <b>loss of information</b>. However, we still retain almost 70% of the information, which gives us a representation close to the original 9-dimensional one.</p>
<p>Once the PCA module has <b>computed the coordinates of our data</b>, all that remains is to plot them. Before doing so, let us <b>look at a tool</b> that is used very often when performing a Principal Component Analysis: the <b>correlation circle</b>.</p>
<img decoding="async" width="512" height="474" src="https://liora.io/app/uploads/sites/9/2023/05/ACP_4.jpg" alt="ACP_4" loading="lazy">
<p>As our representation uses 2 axes, the correlation circle is a <b>practical tool</b> for <b>visualizing the importance of each explanatory variable</b> on each axis (a code sketch reproducing this kind of plot appears at the end of the article). The direction of each arrow shows which axis the variable contributes to most, and its sign shows whether the correlation is positive or negative.</p>
<p>We notice that <b>variables</b> such as 'income', 'gdpp', and 'health' are positively correlated with the first axis, while 'child_mort' and 'total_fer' are <b>negatively correlated</b> with it. We can then look at the representation of the countries on the two axes chosen by the PCA and see the influence of the variable 'life_expec' on their positions.</p>
<img decoding="async" width="512" height="493" src="https://liora.io/app/uploads/sites/9/2023/05/ACP_5.jpg" alt="ACP_5" loading="lazy">
<p>Here is a <b>representation of each country</b> (167 in total) on 2 axes. To judge the quality of the representation, we colored each country according to its life expectancy, split into 3 groups, and a clear trend emerges: the countries with a high life expectancy are concentrated in the <b>lower right part</b> of the graph. According to the <b>correlation circle</b>, the individuals in this part are largely explained by the variables 'health', 'income', and 'gdpp'. It can be concluded that <b>countries spending more on health have a higher life expectancy</b>. The reverse reading applies to the countries in the upper left part of the graph: according to the correlation circle, this part is mostly explained by the variables 'child_mort' and 'total_fer'.</p>
<p>If you want to learn more about <b>Principal Component Analysis</b> or other <b>dimension reduction methods</b>, several modules are dedicated to them in <a href="/en/courses/data-ai/data-analyst">our Data Analyst training</a>.</p>
<div class="wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center"><div class="wp-block-button"><a class="wp-block-button__link wp-element-button" href="/en/courses/data-ai/">Discover our Data Science courses</a></div></div>
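<p>To close, here is how the two-axis projection and the correlation circle shown above could be reproduced. This continues the earlier sketch (same assumed placeholder <code>df</code> and standardized matrix <code>X</code>); the loading formula used for the arrows is valid for standardized data:</p>
<pre><code>
# Project onto 2 components and draw a correlation circle
# (continuation of the previous sketch; df and X are as defined there).
pca2 = PCA(n_components=2)
coords = pca2.fit_transform(X)          # one (x, y) point per country

# Share of the original information kept by the 2-axis representation.
print(f"Information kept: {pca2.explained_variance_ratio_.sum():.1%}")

# For standardized data, the correlation between a variable and an axis
# is its loading: eigenvector component times sqrt(eigenvalue).
loadings = pca2.components_.T * np.sqrt(pca2.explained_variance_)

fig, ax = plt.subplots(figsize=(6, 6))
for name, (x, y) in zip(df.columns, loadings):
    ax.arrow(0, 0, x, y, head_width=0.02, color="tab:blue")
    ax.annotate(name, (x, y))
ax.add_patch(plt.Circle((0, 0), 1, fill=False))
ax.set_xlabel("Axis 1")
ax.set_ylabel("Axis 2")
ax.set_aspect("equal")
plt.show()

# Countries projected on the two chosen axes (the article colors them
# by life-expectancy group; the placeholder data has no such column).
plt.scatter(coords[:, 0], coords[:, 1], s=10)
plt.xlabel("Axis 1")
plt.ylabel("Axis 2")
plt.show()
</code></pre>
<p>On real data, the printed ratio is the "information kept" the article refers to, and arrows pointing in opposite directions on the circle, such as 'gdpp' versus 'child_mort' in the article's figure, indicate variables that are negatively correlated with each other.</p>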