Good Java Books to Buy from Amazon to Learn Java Programming

Here are good books you can buy from Amazon to learn Java. Please click the links below to order them.

I would recommend these three books if you want to learn Java and become a master of Java programming.

  • Effective Java
  • Head First Java
  • Java: The Complete Reference


This week I taught my kids about the Fibonacci sequence. My pedagogy with my kids follows a different approach.

It reminds me of when I was a kid. My illiterate dad used to laugh at me whenever I tried to tie up the cow; I usually used a spring-roll-type knot, while my dad tied the cow with a special knot (&) which I could not do. Later I became a computer engineer, and I remember my dad whenever I use & on my keyboard.

I showed them a flower with a Fibonacci sequence of petals. Every time I teach them math, I make them understand with a real-life example. My little daughter found the pattern (n-1) + (n-2); she said she understood it, and I told her that every mathematical problem has a pattern, and you need to use it to solve the problem.

Since I am a computer engineer, I showed her how to solve this type of problem with recursion. Every problem can have the same solution, but you can arrive at it in different ways.
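
The recursive solution can be sketched in Java like this; it is a minimal illustration of the pattern my daughter spotted, not a production implementation:

```java
public class Fibonacci {
    // fib(n) follows the pattern: fib(n) = fib(n - 1) + fib(n - 2)
    static long fib(int n) {
        if (n < 2) {
            return n; // base cases: fib(0) = 0, fib(1) = 1
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        // print the first 10 numbers of the sequence
        for (int i = 0; i < 10; i++) {
            System.out.print(fib(i) + " ");
        }
        // prints: 0 1 1 2 3 5 8 13 21 34
    }
}
```

Plain recursion recomputes the same values many times; it is fine for teaching, though an iterative loop would be faster for large n.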

Next I would like to teach them compound interest.
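
Compound interest also reduces to a simple pattern, the standard formula A = P(1 + r/n)^(nt). A minimal Java sketch (the method and variable names here are my own, just for illustration):

```java
public class CompoundInterest {
    // amount after t years for principal p, annual rate r,
    // compounded n times per year: A = P * (1 + r/n)^(n*t)
    static double futureValue(double p, double r, int n, double t) {
        return p * Math.pow(1.0 + r / n, n * t);
    }

    public static void main(String[] args) {
        // $1,000 at 5% compounded yearly for 10 years
        System.out.printf("%.2f%n", futureValue(1000, 0.05, 1, 10));
        // prints: 1628.89
    }
}
```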


Big Data Pipeline

We live in a Big Data world, where we generate huge amounts of data (giga-, tera-, peta-, exabytes and beyond) through various IoT channels using devices on different platforms. It is essential for industry to pipe this Big Data from different sources into distributed data storage in order to perform:

  • data analytics, and then visualize the data as dashboards, or

  • pattern detection on the data using machine learning / AI algorithms in the fields of data science and predictive analytics.

A Big Data pipeline is architected using many tools, such as Flume, syslog-ng, and messaging systems such as ActiveMQ, Kafka, etc. The data that gets ingested through the pipeline passes through parsers and is finally stored in distributed or NoSQL storage.

Since I consult with many organizations, the real-world data pipelines I have set up for different clients look similar to the picture I have depicted below.

Building the Data pipeline

As a Sr. Big Data engineer, I have built many Big Data pipelines for many customers. For each of them, I have used the tools and technologies below to set up the Big Data pipeline for their data lakes.

syslog-ng: Using this tool, I collect delta data from different data sources; it bundles all of the data from the different sources into a single package and transports those packages to other channels such as Flume or Kafka. A syslog-ng configuration follows this pattern:

  • Input: the source information

  • Filter: filters the logs or extracts the necessary information using regex patterns

  • Output: the destination where we want to send the data
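
In syslog-ng's own configuration syntax these three parts are declared as source, filter, and destination blocks, wired together with a log statement. A minimal sketch (the file paths, host, and block names here are made up for illustration):

```
# source (input): where the data comes from
source s_app { file("/var/log/app.log"); };

# filter: keep only the lines we care about, via a regex match
filter f_errors { match("ERROR" value("MESSAGE")); };

# destination (output): where to send the data, e.g. a collector host
destination d_remote { network("collector.example.com" port(514)); };

# wire source -> filter -> destination together
log { source(s_app); filter(f_errors); destination(d_remote); };
```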

Flume: Another data pipeline channel, which streams data from a source and sinks it to a destination; the destination can be a distributed file system or a NoSQL database.

Kafka: A messaging system that stores data as partitions of a topic and works on the publisher/subscriber model.
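
To illustrate the partition idea: each message key is mapped to one of the topic's partitions, so all messages with the same key land on the same partition. A simplified Java sketch of that mapping (Kafka's real default partitioner uses a murmur2 hash rather than hashCode, and the keys and partition count here are made up):

```java
public class PartitionSketch {
    // map a message key to one of numPartitions partitions;
    // the same key always lands on the same partition
    static int partitionFor(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 3; // e.g., a topic created with 3 partitions
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
        // "user-1" maps to the same partition both times
    }
}
```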

Spark: A widely used tool to parse and transform Big Data; it can also be used to store the data into the Hadoop Distributed File System.

Elasticsearch, Logstash & Kibana: Elasticsearch is a NoSQL store where we store data as indices; this data can be indexed quickly, and it can be visualized in Kibana as dashboards.
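
As a rough sketch of that indexing flow, a log event can be stored and searched over Elasticsearch's REST API like this (the index name and field names are made up for illustration):

```
# index one log event as a JSON document (the index name "logs" is hypothetical)
PUT /logs/_doc/1
{ "host": "web-01", "level": "ERROR", "message": "disk full" }

# full-text search over the message field
GET /logs/_search
{ "query": { "match": { "message": "disk" } } }
```

In Kibana, the same index can then be added as an index pattern and charted on a dashboard.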

HDFS: Used for long-term archival and batch analytics.

Real use of Big Data Storage 

When it comes to Big Data storage, data can be stored in either a distributed file system (Hadoop Distributed File System, HDFS) or NoSQL and graph databases (Elasticsearch, Cassandra, MongoDB, Neo4j, etc.) as unstructured or semi-structured data.

This collected data is mainly used by data scientists for machine learning or predictive analytics; it can also be visualized as dashboards.

This long-term archival data can also be used for machine learning, deep learning, and AI:

  • Model Building

  • Model Scoring

I have also seen some companies store data in a graph DB to solve discrete-mathematics-related problems. One example is storing operational data in a graph DB and then visualizing the network of nodes (machines) with edges as the links between them. This way, people can easily distinguish failed nodes from active machines.
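
In Neo4j, for example, that kind of question can be asked with a Cypher query. A hedged sketch, assuming nodes labeled Machine with a status property and LINK relationships between them (all of these names are my own, for illustration):

```
// find failed machines and the active machines directly linked to them
MATCH (down:Machine {status: 'failed'})-[:LINK]-(up:Machine {status: 'active'})
RETURN down.name, up.name
```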

Every weekend (Saturday and Sunday) I like to improve my skills by learning new stuff. This time I found that Docker makes my work easy and makes me smart.
Usually I use virtual machines for learning new technologies and end up with a lot of issues on my laptop.

Issues could be that installation takes a lot of time, or a lack of resources (especially RAM and storage), and I need an OS, and so on. But Docker is so quick: it downloads the image that I want. For example, if I want to install Elasticsearch, the Java JDK, or Cloudera, I can quickly pull it from Docker Hub, start building my own images, push my applications into Docker containers, and run them inside containers.

Soon I will be blogging about Docker security, Docker networking, and so on.


  • Install Docker on your machine. You can download Docker from here :
  • Once Docker is installed, we can pull the Elasticsearch image from Docker Hub.

How to pull a Docker image from Docker Hub?

  • docker pull [image name from Docker Hub]
Once we pull the image from Docker Hub to our local machine, we can list it using the command below.

How to list all the Docker images on your local system?

  • docker images
How to run a Docker image as a container?

  • docker run -ti --name containername imageId bash
  • docker run -d --name containername imageId

For an interactive terminal, use -ti (or -it).

If you want to run the Docker image in the background (detached mode), use the -d flag as in the second command above.

How to check whether a Docker container is running?

  • docker ps
How to stop a Docker container?

  • docker container rm -f elasticsearch: this command stops the container named elasticsearch and removes it forcibly.
How to view the Docker container running stats?

  • docker container stats imagename/containername


During the long weekend, I had some time to play around with Elasticsearch and try out its machine learning capability, maybe doing some anomaly detection.
I am taking a little bit of a deviation from Spark machine learning and Scala, and I will come back to them once I am done with this.

Elasticsearch Use Cases

  • Text Search
  • Suggestions
  • Log Analysis

Log analysis can help us do the following:

  • Security Analysis
  • Performance Analysis
  • Predictive Analysis

Elasticsearch Cluster

Elasticsearch Cluster Tuning


Coming soon… I may come up with a small, general use case with machine learning (anomaly detection) and show how to effectively use DSL queries on nested and parent-child documents.

Logstash

Multi-Pipeline Data Ingest Using Logstash

Coming soon… I am thinking of ingesting some data from Kafka into Elasticsearch using Logstash, and some data into HDFS using the web module.

Elasticsearch APIs

  • Cluster APIs
  • Indices APIs
  • Document APIs
  • CAT APIs
  • Search APIs

Coming soon….

Here I will be blogging about tools related to DevOps automation.

  • Ansible
  • Jinja 2
  • Chef
  • Puppet
  • Unix OS System level Programming

I will be blogging about in-depth concepts such as

  • Process
  • Threads
  • Inter-process communication
  • Socket Communication
  • Unix OS-level system calls, etc.

Learn Big Data

Big Data is a buzzword in the current technology world. Big Data helps us process huge amounts of data on a distributed platform called Hadoop. Nowadays, all of the top IT companies have started using Big Data to explore:

  • Data Science or Machine learning
  • Big Data Analytics 
  • Big Data Reporting.

Big Data Job Opportunities

If you learn Big Data, you will find many job opportunities to work as a:

  • Big Data Architect
  • Big Data Analyst
  • Big Data Engineer
  • Big Data Developer

Big Data Technologies

The most popular tools are:

  • Hadoop
  • Map Reduce
  • Spark
  • Kafka
  • Hive
  • Pig
  • Sqoop
  • Oozie
  • Unix shell Scripting

Big Data Programming Languages

Programming languages are useful when we do data preparation and data transformation.

  • Java
  • Scala
  • Python

Spark Tutorial

How to use Apache Spark

How to use Apache Spark with HIVE

In this section you will learn how to use Apache Spark with Hive.

How to Create Spark Data Frame & Data Set

How to use Spark with Relational Data Bases

The following tutorials will help you avoid Sqoop, so you can work directly with Oracle data using Spark.

How Apache spark is used in Real Time Scenario

HBase Spark and Java

Kafka Consumer using Scala Spark Streaming

Kafka Producer using Java

Calling Machine Learning APIs Using Scala

Java Code & Python Code

Apache Hive

Hive Window, Over, Analytic Function

Real Time Working Scenarios

Apache Pig Programming

Real Time Working Scenarios

My Personal Books Library

Interview preparation and questions for Big Data Engineer, Hadoop Developer, and Data Scientist jobs.
