{"id":177360,"date":"2024-01-21T18:09:43","date_gmt":"2024-01-21T17:09:43","guid":{"rendered":"https:\/\/liora.io\/en\/?p=177360"},"modified":"2026-02-06T08:33:25","modified_gmt":"2026-02-06T07:33:25","slug":"amazon-emr-a-cluster-management-tool-managed-by-aws","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/amazon-emr-a-cluster-management-tool-managed-by-aws","title":{"rendered":"Amazon EMR: A cluster management tool managed by AWS"},"content":{"rendered":"<p><strong>Amazon EMR (Elastic MapReduce) is a data processing service managed by Amazon Web Services (AWS). It enables the management of large amounts of data, in the petabyte range, using popular tools such as Apache Hadoop, Hive, Spark and HBase, to name but a few.<\/strong><\/p>\t\t\n\t\t<p><strong>Amazon EMR<\/strong> has been designed to offer great flexibility and scalability, enabling users to get results quickly <strong>using powerful, highly configurable compute clusters.<\/strong><\/p>\t\t\n\t\t\t<h3>Understanding how Amazon EMR works<\/h3>\t\t\n\t\t<p><strong>Amazon EMR<\/strong> works by creating data processing clusters that are configured to meet the specific needs of each task. These clusters are sized according to the computing and storage resources required.<\/p><p>A cluster is made up of nodes of different types:<\/p><ul><li><strong>Master node:<\/strong> manages the cluster and its resources. As the primary node, it orchestrates <a href=\"https:\/\/liora.io\/en\/apache-oozie-simplify-the-management-of-your-big-data-workflows\">data processing tasks<\/a>. It also <strong>stores cluster metadata<\/strong> and provides a command-line interface (CLI) and a web interface for interacting with the cluster.<\/li><li><strong>Core nodes:<\/strong> managed by the primary node, they coordinate data storage in a file system such as HDFS. 
In addition, they execute parallel processing tasks.<\/li><li><strong>Task nodes:<\/strong> these are optional and are used to add capacity for data-parallel processing tasks, such as MapReduce or Spark jobs. However, they do not <a href=\"https:\/\/liora.io\/en\/decoding-hdfs-unveiling-the-core-of-hadoop-distributed-file-system\">store data in HDFS.<\/a><\/li><\/ul>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2023\/05\/image3.png\" title=\"\" alt=\"\" loading=\"lazy\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p>To <strong>provide processing and storage capacity,<\/strong> <a href=\"https:\/\/liora.io\/en\/aws-ec2-how-to-use-amazons-tool\">EMR uses EC2 (Elastic Compute Cloud) instances<\/a>. These instances are virtual machines that can be configured and adapted as required.<\/p><p>When the <strong>EMR cluster is created,<\/strong> the selected tools, such as Hadoop, Spark or Hive, are automatically installed on each node of the cluster. Scheduling and execution of processing tasks are handled by a resource manager, which on EMR is <a href=\"https:\/\/liora.io\/en\/exploring-yarn-a-robust-alternative-to-npm-for-package-management\"><strong>YARN<\/strong><\/a> by default.<\/p><p>As AWS services integrate particularly well with each other, Amazon S3, <a href=\"https:\/\/liora.io\/en\/amazon-relational-database-service-rds-what-is-it\">RDS<\/a> or DynamoDB can serve as data sources for processing by EMR. In the same spirit of integration, Amazon CloudWatch is used to monitor cluster performance and availability.<\/p>\t\t\n\t\t\t<h3>Is Amazon EMR complicated to implement?<\/h3>\t\t\n\t\t<p>Setting up Amazon EMR is a relatively straightforward process that can be completed in just a few steps. 
The prerequisite is, of course, an AWS account.<\/p>\t\t\n\t\t<p>Once logged in to your account, simply select the EMR service.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2023\/05\/image4.png\" title=\"\" alt=\"\" loading=\"lazy\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p>Choose the highlighted &#8220;Create a cluster&#8221; button.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2023\/05\/image5.png\" title=\"\" alt=\"\" loading=\"lazy\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p>Then follow the steps to create a cluster according to your needs. 
Here&#8217;s a summary of the main EC2 instance types and their recommended uses:<\/p>\t\t\n\t\t\t<style type=\"text\/css\">\n.tg  {border-collapse:collapse;border-spacing:0;}\n.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;\n  overflow:hidden;padding:10px 5px;word-break:normal;}\n.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;\n  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}\n.tg .tg-62vj{background-color:#9b9b9b;color:#efefef;font-family:Tahoma, Geneva, sans-serif !important;font-size:16px;\n  font-weight:bold;text-align:center;vertical-align:top}\n.tg .tg-o36d{font-family:Tahoma, Geneva, sans-serif !important;text-align:center;vertical-align:top}\n<\/style>\n<table style=\"table-layout: fixed; width: 600px\">\n<colgroup>\n<col style=\"width: 200px\">\n<col style=\"width: 200px\">\n<col style=\"width: 200px\">\n<\/colgroup>\n<thead>\n  <tr>\n    <th>Instance Class<\/th>\n    <th>Instance Family<\/th>\n    <th>Recommended Use<\/th>\n  <\/tr>\n<\/thead>\n<tbody>\n  <tr>\n    <td>General Purpose<\/td>\n    <td>M4, M5<\/td>\n    <td>Batch Processing<\/td>\n  <\/tr>\n  <tr>\n    <td>Compute Optimized<\/td>\n    <td>C5, C4<\/td>\n    <td>Machine Learning<\/td>\n  <\/tr>\n  <tr>\n    <td>Memory Optimized<\/td>\n    <td>X1, R4<\/td>\n    <td>Interactive Analysis<\/td>\n  <\/tr>\n  <tr>\n    <td>Storage Optimized<\/td>\n    <td>D2, I3<\/td>\n    <td>Large-Scale HDFS<\/td>\n  <\/tr>\n<\/tbody>\n<\/table>\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2023\/05\/image7.png\" title=\"\" alt=\"\" loading=\"lazy\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p>Once the cluster has been created, all that&#8217;s left to do is deploy and run data processing applications. 
Keep an eye on pricing, however.<\/p>\t\t\n\t\t\t<h3>Pricing<\/h3>\t\t\n\t\t<p>The cost of using <strong>Amazon EMR<\/strong> varies from region to region. In addition, you pay an Amazon EMR charge on top of the price of the underlying EC2 instances. Billing is per second, with a minimum charge of one minute. See Amazon&#8217;s pricing policy for this service for details.<\/p>\t\t\n\t\t\t<h3>Case studies<\/h3>\t\t\n\t\t<p>Let&#8217;s take a look at two case studies where Amazon EMR provides the answer to data processing problems.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2023\/05\/image6.png\" title=\"\" alt=\"\" loading=\"lazy\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t<p>Yelp, an online platform for reviews of restaurants and other businesses, turned to EMR for large-scale, real-time processing and analysis of user comments. With EMR, Yelp can obtain detailed analyses of trends. As the company&#8217;s needs fluctuate greatly, Yelp can also adapt its processing capacity to meet them.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2023\/05\/image2.png\" title=\"\" alt=\"\" loading=\"lazy\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t<p>The second company, a real estate business based in the USA, uses EMR to deploy real estate forecasting algorithms on a very large scale. This lets it process real estate data quickly and efficiently, give its customers more accurate price trends, and monitor variations in this highly volatile market in real time.<\/p>\t\t\n\t\t\t<h3>Conclusion<\/h3>\t\t\n\t\t<p>Amazon EMR is a powerful and flexible cloud solution for large-scale data processing. 
Thanks to its ease of use and its ability to integrate with other AWS services, it&#8217;s a first-choice solution for companies that need high-performance data analysis to make the right decisions and adapt to changing market needs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Amazon EMR (Elastic MapReduce) is a data processing service managed by Amazon Web Services (AWS). It enables the management of large amounts of data, in the petabyte range, using popular tools such as Apache Hadoop, Hive, Spark and HBase, to name but a few. 
Amazon EMR has been designed to offer great flexibility and scalability, [&hellip;]<\/p>\n","protected":false},"author":76,"featured_media":177365,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-177360","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/177360","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/76"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=177360"}],"version-history":[{"count":1,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/177360\/revisions"}],"predecessor-version":[{"id":206085,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/177360\/revisions\/206085"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/177365"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=177360"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=177360"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}