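- ORC stands for Optimized Row Columnar, a file format supported by Hive.
- It stores data column by column, which typically gives better compression and faster queries than plain text files, because Hive can read only the columns a query needs.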
It is very easy to create an ORC table from an existing non-ORC table that already contains data. The steps below walk through the process.
Creating a Non-ORC Table
First, let us create a non-ORC table named STUDENT. Nothing special is needed to mark the table as non-ORC:
unless hive.default.fileformat has been changed, Hive stores every new table as a text file by default.
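-- Column names and types are illustrative; only STD_GRADE and COUNTRY are named in this tutorial.
CREATE TABLE STUDENT (
  STD_ID INT,
  STD_NAME STRING,
  STD_GRADE STRING
)
PARTITIONED BY (COUNTRY STRING)
CLUSTERED BY (STD_GRADE) INTO 3 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;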
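To confirm that the table really is stored as text, its metadata can be inspected. The statement below is a minimal sketch that assumes STUDENT was created in the current database; for a TEXTFILE table the InputFormat line in the output should name org.apache.hadoop.mapred.TextInputFormat.

```sql
-- Show the storage details (InputFormat, OutputFormat, SerDe, location)
-- together with the partition and bucket columns of the table.
DESCRIBE FORMATTED STUDENT;
```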
Loading Data from the Local File System into the Non-ORC Table
Using the LOAD DATA LOCAL INPATH statement, we can load a file from the local file system into the non-ORC table.
LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/student.txt' OVERWRITE INTO TABLE STUDENT;
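Because STUDENT is partitioned by COUNTRY, Hive releases before 3.0 generally expect the load to name a target partition. The sketch below shows that variant and two quick checks on the result. The partition value 'USA' is a hypothetical example, not something taken from the data file.

```sql
-- Load the local file into one partition of STUDENT.
-- 'USA' is an assumed, illustrative partition value.
LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/student.txt'
OVERWRITE INTO TABLE STUDENT
PARTITION (COUNTRY = 'USA');

-- List the partitions that now exist for the table.
SHOW PARTITIONS STUDENT;

-- Peek at a few rows to confirm the fields were split on ','.
SELECT * FROM STUDENT LIMIT 5;
```

If the sample rows show NULL in most columns, the delimiter in the file probably does not match the FIELDS TERMINATED BY setting of the table.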
Creating an ORC Table from the Non-ORC Table
Using a CREATE TABLE ... AS SELECT (CTAS) statement, we can create an ORC table from the existing non-ORC table, as shown below.
create table studentORC
stored as orc
as select * from student;
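The compression codec of the new table can also be chosen explicitly through the orc.compress table property, which accepts NONE, ZLIB, or SNAPPY (ZLIB is the default). The following is a sketch only; the table name studentorc_snappy is a hypothetical name used for illustration.

```sql
-- Same CTAS as above, but with an explicit ORC compression codec.
-- studentorc_snappy is an assumed, illustrative table name.
CREATE TABLE studentorc_snappy
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY")
AS SELECT * FROM student;
```

SNAPPY usually trades a somewhat larger file for faster compression and decompression, while ZLIB tends to produce smaller files.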
Running the query above produces output similar to the following:
hive> CREATE TABLE STUDENTORC
> STORED AS ORC
> AS SELECT * FROM STUDENT;
Query ID = cloudera_20150712113535_010445b0-5069-492e-85fc-b02188a0b7d0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1436716181772_0002, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1436716181772_0002/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1436716181772_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-12 11:35:46,808 Stage-1 map = 0%, reduce = 0%
2015-07-12 11:35:54,654 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.35 sec
MapReduce Total cumulative CPU time: 1 seconds 350 msec
Ended Job = job_1436716181772_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/.hive-staging_hive_2015-07-12_11-35-35_913_6677605876738776781-1/-ext-10001
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/studentorc
Table default.studentorc stats: [numFiles=1, numRows=5, totalSize=575, rawDataSize=1345]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.35 sec HDFS Read: 3581 HDFS Write: 651 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 350 msec
Time taken: 20.101 seconds
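Once the job finishes, the new table can be checked against the source. The statements below are a minimal sketch. The row count should agree with the numRows value reported in the table stats line of the log, and the InputFormat shown by DESCRIBE FORMATTED should be org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.

```sql
-- Confirm that the table is stored as ORC.
DESCRIBE FORMATTED studentorc;

-- Compare row counts between the source table and the ORC copy.
SELECT COUNT(*) FROM student;
SELECT COUNT(*) FROM studentorc;

-- A table created with CTAS here is not partitioned, so COUNTRY
-- appears in studentorc as an ordinary column.
SELECT country, COUNT(*) FROM studentorc GROUP BY country;
```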