Let us look at a pragmatic approach to using Hadoop (HDFS, the Hadoop Distributed File System) as a solution for storing Big Data in an Enterprise Data Warehouse.
In the video below, I demonstrate a sample use case scenario showing how to use the Hadoop Distributed File System in the context of an Enterprise Data Warehouse.
The diagram below depicts a simple Enterprise Data Warehouse, where you will find:
- Data sources
- An Enterprise Data Warehouse acting as a data hub
- Data Visualization
Here I have shown how to use the Hadoop Distributed File System for storing Big Data. We can store any type of Big Data in HDFS.
Big Data can be classified as:
- Structured Data: data from relational databases (for example: Oracle, MySQL, or mainframe DB2).
- Semi-Structured Data: data in XML or JSON form, where the content is wrapped in special characters or markup tags. I will explain this further when I write a post about NoSQL databases.
- Unstructured Data: free-form content, such as text typed with arbitrary special characters, or junk data coming from users.
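Whatever its classification, a file lands in HDFS the same way. As a minimal sketch (the directory path and file name here are placeholders, not from the post):

```shell
# Create a directory in HDFS, copy a local file into it,
# and list the directory to confirm the upload.
# The same commands work for structured, semi-structured,
# or unstructured files.
hdfs dfs -mkdir -p /user/demo/raw
hdfs dfs -put ./server.log /user/demo/raw/
hdfs dfs -ls /user/demo/raw
```

These are standard HDFS shell commands and require a running Hadoop cluster.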
In this post, the source is structured data from a MySQL database.
Data Source: tables of the retail_db database, which hold structured data in MySQL.
Use Case Scenario: my manager asks me to find the top 10 products by revenue.
- Sqoop (SQL-to-Hadoop): used to bulk-load data from the source database into HDFS.
- Hive: originally developed at Facebook, used to run Online Analytical Processing (OLAP) queries over data in HDFS.
- Hue: a web UI for browsing HDFS and working with the Hadoop ecosystem tools.
- BI tools: you may use any BI tool you like to build dashboards (for example, Tableau), but here I have used Hue's built-in charts.
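As a sketch, the Sqoop bulk load of retail_db into HDFS might look like this; the host, credentials, parallelism, and target directory are placeholders you would adjust for your own cluster:

```shell
# Bulk-load every table of the MySQL retail_db database into HDFS.
# --warehouse-dir sets the parent HDFS directory (one subdirectory
# per table); --num-mappers controls the parallel import tasks.
sqoop import-all-tables \
  --connect jdbc:mysql://localhost:3306/retail_db \
  --username retail_user \
  --password-file /user/retail_user/.password \
  --warehouse-dir /user/hive/warehouse/retail_db \
  --num-mappers 4
```

Using `--password-file` (a file readable only by you on HDFS) avoids putting the password on the command line.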
For ETL: we can use Pig or Hive scripts to perform the ETL work.
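For the use case above, the top 10 revenue products can be computed with a single Hive query. The sketch below assumes the standard retail_db schema, in which order_items carries a subtotal per line item and joins to products on the product id:

```shell
# Top 10 products by revenue, run through the Hive CLI.
# Table and column names assume the standard retail_db schema.
hive -e "
USE retail_db;
SELECT   p.product_name,
         ROUND(SUM(oi.order_item_subtotal), 2) AS revenue
FROM     order_items oi
JOIN     products p
  ON     oi.order_item_product_id = p.product_id
GROUP BY p.product_name
ORDER BY revenue DESC
LIMIT 10;"
```

The result set is what you would chart in Hue or point a BI tool such as Tableau at.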
I hope this helps you understand how we can use Hadoop in the context of an Enterprise Data Warehouse.