Pig Foreach Generate

If we want to generate data from only a specific set of columns, we should use the FOREACH…GENERATE operator in Pig. It is similar to SELECT in SQL, and it is the basic way to apply a transformation to a Pig relation.


Pig FOREACH can be used in two ways:

  • Simple FOREACH…GENERATE
  • Nested FOREACH {…GENERATE};

Simple FOREACH…GENERATE:  

The simple form uses FOREACH…GENERATE in a single Pig statement to generate the columns that we need.

Syntax:
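The general shape of the statement can be sketched as follows. This is a template rather than a runnable script: `output_alias`, `input_alias`, `expression` and `schema` are placeholders, and square brackets mark optional parts.

```pig
-- Template only: replace the placeholders before running.
-- Each expression can be a column name, a positional reference ($0, $1, ...),
-- a function call, or * for all columns.
output_alias = FOREACH input_alias GENERATE expression [AS schema] [, expression [AS schema] ...];
```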

Let us take an example:

Suppose Employee_Details is a relation (an outer bag) loaded from the file
EmployeeDetails.txt, which holds the sample data.
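A minimal sketch of loading such a file might look like this. The comma delimiter, the relative path, and the four-column schema (Id, Name, Salary, Dept) are assumptions made for illustration; the real file may differ.

```pig
-- Hypothetical delimiter, path and schema; adjust them to match the real file.
Employee_Details = LOAD 'EmployeeDetails.txt' USING PigStorage(',')
                   AS (Id:int, Name:chararray, Salary:int, Dept:chararray);

-- Print the schema to confirm the column names and types.
DESCRIBE Employee_Details;
```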

If we want to project (select) all the columns in the above relation, Employee_Details, then we need to use *.
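As a sketch, projecting every column could be written as below. The LOAD line repeats an assumed schema (Id, Name, Salary, Dept) and an assumed comma delimiter, and `All_Columns` is a name chosen here for illustration.

```pig
-- Assumed delimiter and schema, for illustration only.
Employee_Details = LOAD 'EmployeeDetails.txt' USING PigStorage(',')
                   AS (Id:int, Name:chararray, Salary:int, Dept:chararray);

-- * projects every column of each tuple, like SELECT * in SQL.
All_Columns = FOREACH Employee_Details GENERATE *;

DUMP All_Columns;
```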

Output:

If we want to generate only the Name and Salary columns, we can refer to them either by name or by position. Positional references start at $0 for the first column and continue with $1, $2 and so on.
Here we can use either $1, $2 or Name, Salary.
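The two styles can be compared side by side in a short sketch. It assumes that Name and Salary are the second and third columns of a comma-delimited file, as the positions $1 and $2 suggest; the rest of the schema and the alias names are hypothetical.

```pig
-- Assumed delimiter and schema, for illustration only.
Employee_Details = LOAD 'EmployeeDetails.txt' USING PigStorage(',')
                   AS (Id:int, Name:chararray, Salary:int, Dept:chararray);

-- Projection by column name.
Name_Salary = FOREACH Employee_Details GENERATE Name, Salary;

-- The same projection by position: $1 is Name and $2 is Salary.
Name_Salary_Pos = FOREACH Employee_Details GENERATE $1, $2;

DUMP Name_Salary;
DUMP Name_Salary_Pos;
```

Names are usually easier to read, while positions still work when the relation was loaded without a schema.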


Nested FOREACH:

Inside a nested FOREACH block, only a limited set of relational operators is allowed. The Apache Pig reference lists CROSS, DISTINCT, FILTER, FOREACH, LIMIT and ORDER BY.

Syntax:
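The nested form can be sketched as the template below. It is not runnable as written: `output_alias`, `grouped_alias`, `inner_alias`, `nested_op` and `expression` are placeholders, and square brackets mark optional parts.

```pig
-- Template only: replace the placeholders before running.
output_alias = FOREACH grouped_alias {
    inner_alias = nested_op;            -- e.g. a FILTER, ORDER, DISTINCT or LIMIT on a bag
    [inner_alias = nested_op; ...]      -- further nested statements, if needed
    GENERATE expression [, expression ...];
};
```

The GENERATE clause must be the last statement inside the braces.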

Let us take an Example:

Download the Employee_Details.txt file to the Unix box where Hadoop is installed, and upload it to HDFS.
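A minimal sketch of a nested FOREACH over this data might look like the following. It assumes a comma-delimited file and a hypothetical schema with a Dept column to group on; the alias names (`By_Dept`, `sorted`, `top2`, `Top_Paid`) are chosen here for illustration.

```pig
-- Assumed delimiter and schema, for illustration only.
Employee_Details = LOAD 'Employee_Details.txt' USING PigStorage(',')
                   AS (Id:int, Name:chararray, Salary:int, Dept:chararray);

-- A nested FOREACH operates on the inner bags produced by GROUP.
By_Dept = GROUP Employee_Details BY Dept;

Top_Paid = FOREACH By_Dept {
    -- Sort each department's employees by salary, highest first.
    sorted = ORDER Employee_Details BY Salary DESC;
    -- Keep at most two employees per department.
    top2 = LIMIT sorted 2;
    -- Emit the department together with the names and salaries kept.
    GENERATE group AS Dept, top2.(Name, Salary);
};

DUMP Top_Paid;
```

The nested block runs once per group, so the ORDER and LIMIT apply within each department rather than across the whole relation.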

Output: