{"id":178163,"date":"2024-03-01T15:12:31","date_gmt":"2024-03-01T14:12:31","guid":{"rendered":"https:\/\/liora.io\/en\/?p=178163"},"modified":"2026-02-06T08:28:15","modified_gmt":"2026-02-06T07:28:15","slug":"bagging-machine-learning-what-is-it-about","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/bagging-machine-learning-what-is-it-about","title":{"rendered":"Bagging Machine Learning: What is it about?"},"content":{"rendered":"<style>\n.elementor-heading-title{padding:0;margin:0;line-height:1}.elementor-widget-heading .elementor-heading-title[class*=elementor-size-]>a{color:inherit;font-size:inherit;line-height:inherit}.elementor-widget-heading .elementor-heading-title.elementor-size-small{font-size:15px}.elementor-widget-heading .elementor-heading-title.elementor-size-medium{font-size:19px}.elementor-widget-heading .elementor-heading-title.elementor-size-large{font-size:29px}.elementor-widget-heading .elementor-heading-title.elementor-size-xl{font-size:39px}.elementor-widget-heading .elementor-heading-title.elementor-size-xxl{font-size:59px}<\/style><p><strong>Bagging Machine Learning: &#8220;Together we&#8217;re stronger&#8221; &#8211; bagging could be symbolised by this quote. In fact, this technique is one of the ensemble methods, which consists of considering a set of models in order to make the final decision. Let&#8217;s take a closer look at bagging.<\/strong><\/p>\t\t\n\t\t<p>First of all, we need to work on the data. We&#8217;re not going to provide all our models with the same data, because we want our models to be independent. The problem is that we can&#8217;t <a href=\"https:\/\/liora.io\/en\/what-is-a-dataset-how-do-i-work-with-it\">partition the dataset<\/a>. For a large number of models, our models would not be sufficiently trained and would <strong>produce mediocre results.<\/strong><\/p>\t\t\n\t\t\t<h3>The solution to our problems: bootstrapping<\/h3>\t\t\n\t\t<p>To overcome this problem, we &nbsp;<a href=\"https:\/\/liora.io\/en\/what-is-a-dataset-how-do-i-work-with-it\">create a bootstrap dataset.<\/a> With this method, we create a new dataset from the initial dataset. The new dataset is the same size as the initial one. Let n be the sample size and let E_1 be the initial dataset and E_2 the <a href=\"https:\/\/liora.io\/en\/excel-to-power-bi-how-to-transform-a-pivot-table-in-excel-into-a-dataset-that-can-be-used-by-power-bi\">bootstrapped dataset<\/a>.<\/p><p>From E_1, we will randomly select an individual and place it in E_2. We will repeat this step until E_2 is of size n. It&#8217;s important to note that the elements chosen are always in E_1, so we can select the same element several times. It is this feature that allows us to <strong>produce different data.<\/strong><\/p><p>Let&#8217;s take a look at a concrete example, using the same notation as above.<\/p>\t\t\n\t\t\t<style>\n.elementor-widget-image{text-align:center}.elementor-widget-image a{display:inline-block}.elementor-widget-image a img[src$=\".svg\"]{width:48px}.elementor-widget-image img{vertical-align:middle;display:inline-block}<\/style>\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"800\" height=\"439\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/05\/illu_bagging-schema_2-1.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/05\/illu_bagging-schema_2-1.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/05\/illu_bagging-schema_2-1-300x165.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/05\/illu_bagging-schema_2-1-768x422.png 768w\" sizes=\"(max-width: 800px) 100vw, 800px\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/\">Find out more about Data Scientist training<\/a><\/div><\/div>\n\n\t\t<ul><li style=\"font-weight: 400;\" aria-level=\"1\"><p>Here E_1 is made up of 2 elements called A and 2 elements called B, E_2 is empty and n=4.<\/p><\/li><li style=\"font-weight: 400;\" aria-level=\"1\"><p>On the first iteration, we randomly select an element from E_1 and place B in E_2.<\/p><\/li><li style=\"font-weight: 400;\" aria-level=\"1\"><p>This process is repeated until E_2 has 4 elements.<br>Finally, E_2 is made up of 3 elements B and one element A. An element B has been selected 3 times.<\/p><p>Now that we&#8217;ve <strong>seen what the bootstrap<\/strong> is, we decide to make 3 bootstrap samples*. For each sample, we&#8217;ll assign a different model. Decision trees are commonly used. We train each model and make our predictions.<\/p><p>*the choice of 3 is arbitrary<\/p><\/li><\/ul>\t\t\n\t\t\t<h3>What can we do now with 3 predictions? <\/h3>\t\t\n\t\t<p>We need a single value for our problem, <a href=\"https:\/\/liora.io\/en\/distributed-architecture-definition-and-relationship-to-big-data\">and that&#8217;s where we do data aggregation.<\/a> For<a href=\"https:\/\/liora.io\/en\/classification-algorithms-definition-and-main-models\"> classification models<\/a>, we will apply a voting system and for regression models, we will average the predicted value. Referring to the initial quote, we build a final model from several models.<\/p><p>We can see an example of a voting system where we have to <a href=\"https:\/\/liora.io\/en\/predictive-elevator-maintenance-challenges-and-technology\">predict A or B,<\/a> with the example below:<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"800\" height=\"439\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/05\/illu_bagging-schema3.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/05\/illu_bagging-schema3.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/05\/illu_bagging-schema3-300x165.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/05\/illu_bagging-schema3-768x422.png 768w\" sizes=\"(max-width: 800px) 100vw, 800px\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p>To understand what bagging is, we can remember that it is a combination of bootstrapping and aggregating. A famous bagging algorithm is the random forest, which you can find here. You can also find out more about another <a href=\"https:\/\/liora.io\/en\/catboost-an-essential-machine-learning-tool\">well-known aggregation technique, boosting, in this article.<\/a><\/p><p><strong>Bagging = Bootstrap + aggregating<\/strong><\/p><p>One of the advantages of this method is that it can reduce variance. Even if the models are not trained on the same data set, the bootstrap samples <strong>share common tuples,<\/strong> which produces bias. This bias reduces the variance.<\/p><p>A second advantage is to correct prediction errors. Let&#8217;s look at the image below. Each model makes a <a href=\"https:\/\/liora.io\/en\/management-of-unbalanced-classification-problems-i\">classification prediction error,<\/a> but each error is corrected by the voting system.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"800\" height=\"439\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/06\/illu_bagging_schema3-24.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/06\/illu_bagging_schema3-24.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/06\/illu_bagging_schema3-24-300x165.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/06\/illu_bagging_schema3-24-768x422.png 768w\" sizes=\"(max-width: 800px) 100vw, 800px\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p>We have seen how&nbsp;<strong>Bagging in Machine Learning<\/strong> fits in with ensemble techniques. As stated in the introduction, we use several models whose predictions we aggregate to obtain a final prediction. To calibrate our different models, we create different data sets by bootstrapping on the initial sample.<\/p><p>If you want to put this into practice, don&#8217;t hesitate to sign up for our <a href=\"\/en\/courses\/data-ai\/data-scientist\">Data Scientist training course.<\/a><\/p>\t\t\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/data-scientist\">Discover the Data Scientist course<\/a><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Bagging Machine Learning: &#8220;Together we&#8217;re stronger&#8221; &#8211; bagging could be symbolised by this quote. In fact, this technique is one of the ensemble methods, which consists of considering a set of models in order to make the final decision. Let&#8217;s take a closer look at bagging. First of all, we need to work on the [&hellip;]<\/p>\n","protected":false},"author":76,"featured_media":178166,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-178163","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/178163","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/76"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=178163"}],"version-history":[{"count":1,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/178163\/revisions"}],"predecessor-version":[{"id":206030,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/178163\/revisions\/206030"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/178166"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=178163"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=178163"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}