Apache Hive Hadoop Tutorial

This Hive tutorial is easier to follow if you already know SQL; otherwise, go through this SQL Tutorial first.

If you want to become a Big Data or Hadoop developer, Hive programming is an important skill to learn.

What is Apache Hive?

  • Apache Hive is a data warehouse system for Hadoop with its own SQL-like query language, HiveQL.
  • It works on data stored in the Hadoop Distributed File System (HDFS).
  • HiveQL is similar to SQL, and queries can be saved in files with the .hql extension.

What is the use of Hive in Hadoop?

In the real world, it is mainly used for

  • Big Data analysis on huge data sets,
  • Ad-hoc querying from the command line interface to generate reports for Big Data analytics,
  • Creating dashboards for visual representation of Big Data sets.

If you have worked with SQL on an RDBMS before, Hive will be easy to understand because

  • Big Data is stored as tables in a Hive database, and
  • its table schema details are stored in the metastore.

Big Data dashboards and reports can be generated from Hive data warehouse tables using tools such as Tableau and SAS.

How to use Hive on Hadoop?

Let us get started with the Command Line Interface (CLI).

If you are using a distribution VM such as Cloudera, Hortonworks, or MapR, it is very easy: just type hive at the terminal prompt and press Enter.

Using Editors (Vim/gedit)

You can open the Vim or gedit editor and type a simple query into it.

Let us create a simple query, then execute it and see the output.

Launching the editor opens a text window where you can type your queries; after saving the file, you can issue the following command.

Adding & at the end of the editor command runs the editor in the background, so you can keep using the same terminal session for other work.
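The editor workflow above can also be scripted. The following is a minimal Python sketch, assuming the hive launcher is on your PATH. The file name sample.hql, the SHOW DATABASES query, and the helper names write_hql and hive_file_command are illustrative choices, not part of this tutorial. It saves a query to an .hql file and builds a hive -f command, which runs a script file non-interactively.

```python
import shutil
import subprocess
from pathlib import Path


def write_hql(path: Path, query: str) -> Path:
    """Save a HiveQL query to an .hql file (hypothetical helper)."""
    text = query.strip()
    # Hive statements in a script file are terminated with a semicolon.
    if not text.endswith(";"):
        text += ";"
    path.write_text(text + "\n", encoding="utf-8")
    return path


def hive_file_command(path: Path) -> list[str]:
    """Build the argument list for `hive -f <file>` (hypothetical helper)."""
    return ["hive", "-f", str(path)]


if __name__ == "__main__":
    script = write_hql(Path("sample.hql"), "SHOW DATABASES")
    cmd = hive_file_command(script)
    if shutil.which("hive"):
        # Run the script and print whatever Hive writes to stdout.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
    else:
        print("hive not found on PATH; would run:", " ".join(cmd))
```

Building the command as a list (rather than one string) lets subprocess pass the file name safely even if it contains spaces.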

To exit the Hive CLI shell, type quit;

Tip: pressing Shift + Ctrl + N in an existing terminal window opens a new terminal window.

Using Tez

  • Its name comes from the Hindi word "tez", which means "fast".
  • It is an execution engine that runs on YARN and improves the performance of query execution.
  • Tez minimizes map operations and thereby decreases I/O overhead.
  • Tez jobs are executed based on the Directed Acyclic Graph (DAG) model.
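To make the DAG idea concrete, here is a small Python sketch (Python 3.9+ for the standard graphlib module). The vertex names Map 1, Map 2, Reducer 3, and Reducer 4 describe a hypothetical plan for a join followed by an aggregation; they are illustrative, not output from a real Tez job. Each entry lists the vertices a vertex depends on, and the sketch groups vertices into "waves" where every vertex only needs results from earlier waves. Tez's real scheduler on YARN is more sophisticated; this only shows the dependency structure a DAG expresses.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: two map vertices read input tables, a reducer joins
# their outputs, and a final reducer aggregates the joined rows.
# Each key maps a vertex to the set of vertices it depends on.
plan = {
    "Map 1": set(),
    "Map 2": set(),
    "Reducer 3": {"Map 1", "Map 2"},
    "Reducer 4": {"Reducer 3"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group vertices so each wave depends only on earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # vertices whose inputs are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(plan), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

Because the graph must be acyclic, a plan with a circular dependency is rejected before anything runs, which is the property the "acyclic" in DAG guarantees.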

How to enable the Tez Execution Engine?

Type the following statement in the Hive Command Line Interface (CLI):

set hive.execution.engine=tez;

It is recommended to use the Tez execution engine for executing queries.
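If you launch Hive from a script rather than typing in the CLI, the execution engine can be chosen through the hive.execution.engine property when the session starts. Below is a minimal Python sketch, assuming the classic hive launcher (which accepts --hiveconf property=value and -e "query") is on your PATH. The helper name hive_command and the table name my_table are hypothetical.

```python
import shlex


def hive_command(query: str, engine: str = "tez") -> list[str]:
    """Build a `hive` invocation that runs one query on the chosen engine.

    Hypothetical helper: `--hiveconf` sets a configuration property for this
    session only, and `-e` runs the given query string and exits.
    """
    # Values accepted by hive.execution.engine.
    if engine not in ("mr", "tez", "spark"):
        raise ValueError(f"unsupported engine: {engine}")
    return ["hive", "--hiveconf", f"hive.execution.engine={engine}", "-e", query]


if __name__ == "__main__":
    cmd = hive_command("SELECT COUNT(*) FROM my_table;")
    # Print a shell-ready version; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the property at startup affects only that one session, so other users and sessions keep whatever engine the cluster configuration sets by default.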
