PIG Aggregate Functions

The following aggregate functions can be used for ad-hoc analysis in Pig:

  • MAX(Column_Name)
  • MIN(Column_Name)
  • COUNT(Column_Name)
  • SUM(Column_Name)
  • AVG(Column_Name)

Note:

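  • Pig function names are case sensitive, so the aggregate functions must be written in capital letters (MAX, not max).
  • Aggregate functions operate on a bag of records, so we first GROUP the relation (for example, GROUP ... BY Store_ID) and then apply the aggregate function inside a FOREACH ... GENERATE statement.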

Use the following .csv file to practice the use cases given below for these aggregate functions.

Place the Products.csv file, which contains the data below, in your HDFS home directory (for example: /user/cloudera/Products.csv).

Product_Name,Store_ID,Year,NoofProducts
BathSoap,101,2001,5
cupcake,102,2001,30
peanuts,101,2001,100
computer,103,2011,40
tablet,103,2011,100
bread,102,2004,80
oil,101,2011,2

MAX(Column_Name)

To find the maximum number of products sold by each store, group the records by Store_ID and apply MAX to NoofProducts.
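A minimal sketch of such a script is given here. The relation names (products, sales, by_store, max_per_store) and the field types are assumptions, not taken from the original post. NoofProducts is declared as double so that results print with a decimal point, and the FILTER step assumes the header line is present in the file and has to be dropped.

```pig
-- Load the CSV; field names follow the header line, types are assumed
products = LOAD '/user/cloudera/Products.csv' USING PigStorage(',')
           AS (Product_Name:chararray, Store_ID:int, Year:int, NoofProducts:double);

-- Drop the header row (its first field is the literal text 'Product_Name')
sales = FILTER products BY Product_Name != 'Product_Name';

-- Collect each store's records into one bag
by_store = GROUP sales BY Store_ID;

-- One row per store: the store ID and its largest NoofProducts value
max_per_store = FOREACH by_store GENERATE group AS Store_ID, MAX(sales.NoofProducts) AS max_products;

DUMP max_per_store;
```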

Output:

(101,100.0)
(102,80.0)
(103,100.0)


MIN(Column_Name)

To find the minimum number of products sold by each store, group the records by Store_ID and apply MIN to NoofProducts.
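One way to write this, as a sketch: the relation names and field types are illustrative assumptions, and the FILTER step assumes the header line is still in the file.

```pig
-- Load the CSV; field names follow the header line, types are assumed
products = LOAD '/user/cloudera/Products.csv' USING PigStorage(',')
           AS (Product_Name:chararray, Store_ID:int, Year:int, NoofProducts:double);

-- Drop the header row
sales = FILTER products BY Product_Name != 'Product_Name';

-- Collect each store's records into one bag
by_store = GROUP sales BY Store_ID;

-- One row per store: the store ID and its smallest NoofProducts value
min_per_store = FOREACH by_store GENERATE group AS Store_ID, MIN(sales.NoofProducts) AS min_products;

DUMP min_per_store;
```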

Output:

(101,2.0)
(102,30.0)
(103,40.0)


COUNT(Column_Name)

To count the number of product records (rows) for each store, group the records by Store_ID and apply COUNT to the grouped bag.
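A possible script is sketched here, with assumed relation names and field types, and assuming the header line has to be filtered out. Note that COUNT counts rows in each bag, not the quantities in NoofProducts, and it returns a long, so the result has no decimal point.

```pig
-- Load the CSV; field names follow the header line, types are assumed
products = LOAD '/user/cloudera/Products.csv' USING PigStorage(',')
           AS (Product_Name:chararray, Store_ID:int, Year:int, NoofProducts:double);

-- Drop the header row
sales = FILTER products BY Product_Name != 'Product_Name';

-- Collect each store's records into one bag
by_store = GROUP sales BY Store_ID;

-- One row per store: the store ID and the number of records in its bag
count_per_store = FOREACH by_store GENERATE group AS Store_ID, COUNT(sales) AS product_count;

DUMP count_per_store;
```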

Output:

(101,3)
(102,2)
(103,2)


SUM(Column_Name)

To find the total number of products sold by each store, group the records by Store_ID and apply SUM to NoofProducts.
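This could look like the following sketch. The relation names and field types are assumptions, as is the need to filter out the header line.

```pig
-- Load the CSV; field names follow the header line, types are assumed
products = LOAD '/user/cloudera/Products.csv' USING PigStorage(',')
           AS (Product_Name:chararray, Store_ID:int, Year:int, NoofProducts:double);

-- Drop the header row
sales = FILTER products BY Product_Name != 'Product_Name';

-- Collect each store's records into one bag
by_store = GROUP sales BY Store_ID;

-- One row per store: the store ID and the total of its NoofProducts values
sum_per_store = FOREACH by_store GENERATE group AS Store_ID, SUM(sales.NoofProducts) AS total_products;

DUMP sum_per_store;
```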

Output:

(101,107.0)
(102,110.0)
(103,140.0)


AVG(Column_Name)

To find the average number of products sold by each store, group the records by Store_ID and apply AVG to NoofProducts.
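A sketch of one way to do this, again with assumed relation names and field types and an assumed header-filtering step. AVG returns a double, so a store whose total does not divide evenly yields a long decimal fraction.

```pig
-- Load the CSV; field names follow the header line, types are assumed
products = LOAD '/user/cloudera/Products.csv' USING PigStorage(',')
           AS (Product_Name:chararray, Store_ID:int, Year:int, NoofProducts:double);

-- Drop the header row
sales = FILTER products BY Product_Name != 'Product_Name';

-- Collect each store's records into one bag
by_store = GROUP sales BY Store_ID;

-- One row per store: the store ID and the mean of its NoofProducts values
avg_per_store = FOREACH by_store GENERATE group AS Store_ID, AVG(sales.NoofProducts) AS avg_products;

DUMP avg_per_store;
```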

Output:

(101,35.666666666666664)
(102,55.0)
(103,70.0)
