{"id":180152,"date":"2024-06-05T23:25:04","date_gmt":"2024-06-05T22:25:04","guid":{"rendered":"https:\/\/liora.io\/en\/?p=180152"},"modified":"2026-02-06T07:58:55","modified_gmt":"2026-02-06T06:58:55","slug":"apache-hive-hadoop-sql-for-decision-making","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/apache-hive-hadoop-sql-for-decision-making","title":{"rendered":"Apache Hive Hadoop: SQL for decision-making"},"content":{"rendered":"<style>\n.elementor-heading-title{padding:0;margin:0;line-height:1}.elementor-widget-heading .elementor-heading-title[class*=elementor-size-]>a{color:inherit;font-size:inherit;line-height:inherit}.elementor-widget-heading .elementor-heading-title.elementor-size-small{font-size:15px}.elementor-widget-heading .elementor-heading-title.elementor-size-medium{font-size:19px}.elementor-widget-heading .elementor-heading-title.elementor-size-large{font-size:29px}.elementor-widget-heading .elementor-heading-title.elementor-size-xl{font-size:39px}.elementor-widget-heading .elementor-heading-title.elementor-size-xxl{font-size:59px}<\/style><p><strong>Hadoop, the leading open-source Big Data framework, is ideal for storing and processing massive quantities of data. However, extracting data from it is often complex, time-consuming and costly. 
That&#8217;s where Apache Hive comes in: it adds a SQL-like query layer on top of Hadoop so that data can be extracted without hand-writing low-level processing jobs.<\/strong><\/p>\t\t\n\t\t<p>As a reminder, in computer programming, a framework is a coherent set of software components that provides the foundations and architecture on which a program is built.<\/p>\t\t\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/data-engineer\">Learn how to use Apache Hive<\/a><\/div><\/div>\n\n\t\t\t<h3>Apache Hive &#8211; what is it?<\/h3>\t\t\n\t\t<p><strong>Apache Hive<\/strong> is an open-source data warehouse system built on top of Hadoop. A <a href=\"https:\/\/liora.io\/en\/data-warehouse-2\">data warehouse<\/a> is a central repository that collects data from one or more heterogeneous sources, with the main aim of supporting analysis and facilitating decision-making. Hive lets users query that data in HiveQL, a language <a href=\"https:\/\/liora.io\/en\/sql-queries-the-5-most-important-commands-to-know\">syntactically close to SQL<\/a>.<\/p><p>&nbsp;<\/p><p>Related articles:<\/p><table dir=\"ltr\" border=\"1\" cellspacing=\"0\" cellpadding=\"0\" data-sheets-root=\"1\"><colgroup><col width=\"656\"><\/colgroup><tbody><tr><td data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Apache Ambari: A tool to simplify Hadoop cluster management&quot;}\" data-sheets-hyperlink=\"https:\/\/liora.io\/en\/apache-ambari-a-tool-to-simplify-hadoop-cluster-management\"><a href=\"https:\/\/liora.io\/en\/apache-ambari-a-tool-to-simplify-hadoop-cluster-management\" target=\"_blank\" rel=\"noopener\">Apache Ambari: A tool to simplify Hadoop cluster management<\/a><\/td><\/tr><tr><td data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Apache Storm: Explanations and Use cases &quot;}\" data-sheets-hyperlink=\"https:\/\/liora.io\/en\/apache-storm-explanations-and-use-cases\"><a 
href=\"https:\/\/liora.io\/en\/apache-storm-explanations-and-use-cases\" target=\"_blank\" rel=\"noopener\">Apache Storm: Explanations and Use cases <\/a><\/td><\/tr><tr><td data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Understanding Apache Flume: Its Purpose and Applications&quot;}\" data-sheets-hyperlink=\"https:\/\/liora.io\/en\/understanding-apache-flume-its-purpose-and-applications\"><a href=\"https:\/\/liora.io\/en\/understanding-apache-flume-its-purpose-and-applications\" target=\"_blank\" rel=\"noopener\">Understanding Apache Flume: Its Purpose and Applications<\/a><\/td><\/tr><tr><td data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;DevOps training: how to master GitHub, Docker or Apache Airflow?&quot;}\" data-sheets-hyperlink=\"https:\/\/liora.io\/en\/devops-training-how-to-master-github-docker-or-apache-airflow\"><a href=\"https:\/\/liora.io\/en\/devops-training-how-to-master-github-docker-or-apache-airflow\" target=\"_blank\" rel=\"noopener\">DevOps training: how to master GitHub, Docker or Apache Airflow?<\/a><\/td><\/tr><tr><td data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Apache Ant: The Basics&quot;}\" data-sheets-hyperlink=\"https:\/\/liora.io\/en\/apache-ant-the-basics\"><a href=\"https:\/\/liora.io\/en\/apache-ant-the-basics\" target=\"_blank\" rel=\"noopener\">Apache Ant: The Basics<\/a><\/td><\/tr><tr><td data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Apache Airflow: A Comprehensive Guide to Workflow Orchestration&quot;}\" data-sheets-hyperlink=\"https:\/\/liora.io\/en\/apache-airflow-a-comprehensive-guide-to-workflow-orchestration\"><a href=\"https:\/\/liora.io\/en\/apache-airflow-a-comprehensive-guide-to-workflow-orchestration\" target=\"_blank\" rel=\"noopener\">Apache Airflow: A Comprehensive Guide to Workflow Orchestration<\/a><\/td><\/tr><tr><td data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Apache Spark: Understanding its Functions and Benefits&quot;}\" 
data-sheets-hyperlink=\"https:\/\/liora.io\/en\/apache-spark-its-functions-and-benefits\"><a href=\"https:\/\/liora.io\/en\/apache-spark-its-functions-and-benefits\" target=\"_blank\" rel=\"noopener\">Apache Spark: Understanding its Functions and Benefits<\/a><\/td><\/tr><\/tbody><\/table>\t\t\n\t\t\t<style>\n.elementor-widget-image{text-align:center}.elementor-widget-image a{display:inline-block}.elementor-widget-image a img[src$=\".svg\"]{width:48px}.elementor-widget-image img{vertical-align:middle;display:inline-block}<\/style>\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"800\" height=\"439\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/04\/schema_hadoop-116-1024x562.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/04\/schema_hadoop-116-1024x562.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/04\/schema_hadoop-116-300x165.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/04\/schema_hadoop-116-768x422.png 768w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/04\/schema_hadoop-116-1536x843.png 1536w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2021\/04\/schema_hadoop-116-2048x1124.png 2048w\" sizes=\"(max-width: 800px) 100vw, 800px\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t<h3>How does Apache Hive work?<\/h3>\t\t\n\t\t<p><strong>Apache Hive<\/strong> translates queries written in HiveQL (a language close to SQL) into one or more Java MapReduce, Apache Tez or Apache Spark jobs, the three execution engines Hive can use to run work on Hadoop. 
The data itself is organized into tables whose files are stored in the <a href=\"https:\/\/liora.io\/en\/decoding-hdfs-unveiling-the-core-of-hadoop-distributed-file-system\">Hadoop Distributed File System (HDFS)<\/a>, and Hive runs the generated jobs on the cluster to produce the query result.<\/p><p>Apache Hive tables are similar to those of a relational database, with data units organized from the largest to the most granular: databases contain tables, tables can be divided into partitions, and partitions can in turn be broken down into &#8220;buckets&#8221;.<\/p><p>Data is accessed via HiveQL. Within each database, data is serialized to files, and each table corresponds to an <strong>HDFS directory<\/strong>.<\/p><p>The Apache Hive architecture offers several interfaces, including a web interface, a command-line interface (CLI) and external client interfaces. For example, the &#8220;Apache Hive Thrift&#8221; server lets remote clients submit commands and queries to Hive from a variety of programming languages. Hive&#8217;s central catalog is the &#8220;metastore&#8221;, which stores the metadata describing databases, tables, partitions and their schemas.<\/p><p>The engine that makes <strong>Hive<\/strong> work is called the &#8220;driver&#8221;. It includes a compiler, an optimizer that determines the best execution plan, and an executor.<\/p><p>Finally, security is handled by <a href=\"https:\/\/liora.io\/en\/hadoop-spark-training-how-to-learn-how-to-handle-big-data-tools\">Hadoop<\/a>, which relies on Kerberos for mutual authentication between client and server. 
Permissions for newly created files in Apache Hive are dictated by HDFS, which grants read, write and execute permissions separately to the owner (user), the group and all other users.<\/p>\t\t\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/data-engineer\">Training to become a Data Engineer<\/a><\/div><\/div>\n\n\t\t\t<h3>What are the advantages of using Apache Hive?<\/h3>\t\t\n\t\t<p><strong>Apache Hive<\/strong> is well suited to querying and analyzing data. It helps you extract actionable information (&#8220;insights&#8221;) from your data, which can give you a competitive edge and help you respond quickly to market demand.<\/p><p>One of Apache Hive&#8217;s main advantages is its ease of use, thanks to its <strong>SQL-friendly language<\/strong>. What&#8217;s more, it speeds up initial data loading: data does not need to be read, parsed or serialized to disk in an internal database format, because Hive applies the table schema only when the data is read (&#8220;schema on read&#8221;).<\/p><p>Because the data is stored in HDFS, Hive can hold very large datasets, up to hundreds of petabytes, and it scales far more easily than a traditional database. When run as a managed cloud service, Hive also lets users rapidly provision virtual servers as workloads fluctuate.<\/p><p>Data protection is also a priority: critical workloads can be replicated so that they can be recovered if a problem occurs. 
Finally, Hive is designed for heavy workloads, with a reported capacity of up to 100,000 queries per hour.<\/p>\t\t\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/\">Find out more about our training courses<\/a><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>The open-source framework of the leading Big Data platform, Hadoop, is ideal for storing and processing massive quantities of data. However, when it comes to data extraction, this platform is often complex, time-consuming and costly. That&#8217;s why the Apache Foundation has developed a new alternative. It&#8217;s called Apache Hive. As a reminder, in computer programming, [&hellip;]<\/p>\n","protected":false},"author":76,"featured_media":180157,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-180152","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180152","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/76"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=180152"}],"version-history":[{"count":1,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180152\/revisions"}],"predecessor-version":[{"id":205705,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/180152\/revisions\/205705"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\
/media\/180157"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=180152"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=180152"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}