Aggregation

Description

Aggregation performs a selected calculation on a set of values and returns a single group value.

If grouping by a tuple type field, the only supported aggregation method is count.

Using the aggregation function in a Flow may cause the additional partitions resulting in multiple output files. 

Use

  • Select an argument to group by. This creates groups from identical data values from the selected field.
  • Enter a name for the output field name.
  • Additional arguments can be added by clicking the Add Group by button. (Optional)

(Optional)

  • Select an aggregation method.
    • Any - Returns a random value from the selected field in the group.
    • Concat - Returns a string of the concatenated values in a group.
    • Count - Returns the count of the number of field values of a group.
    • First - Returns the first value in the field from the group.
    • Last - Returns the last value in the field from the group.
    • List - Returns a list of the concatenated values in a group.
    • Sum - Returns the sum value of a field in a group.
  • Fill in the configuration fields for the selected aggregation method.
  • Additional aggregations can be added by clicking the Add Aggregation button. (Optional)
  • Click OK.

Example

Input Data

customer_idfirst_namelast_nameitem_
1001AnneSwiftbook
2002MarkusGurwoodtoy
3003KellyEpplejacket
4004AnneMaindsphone
4004AnneMaindsbike
5005JackHenzejacket
2002MarkusGurwoodbook
6006VictorBennikebike
1001AnneSwifttelevision
3003KellyEpplebook
5005JackHenzebook
5005JackHenzeshoes
1001AnneSwiftphone
5005JackHenzebike
1001AnneSwiftshoes

In this example, you want to use the group by unique customers and have a field containing the items the purchased in list format.

Connect the input fields to the aggregation function.

Select the field customer_id to group by and enter a output field name.

Before clicking OK:

The values in the customer_id field are grouped together and the new output field would display the following:

grouped_customer_id
1001
2002
3003
4004
5005
6006

Now that the unique customers have been grouped, the objective is to add a field that contains all their purchases in a list value.

Select List for the aggregation type.

Select the item_ field for the input values.

Enter an output field name.

Select the sorting order. In this example, the sorting order field item_ was selected so that the list of products each customer purchased are arranged alphabetically. 

Click OK to save the function.

The results for this function are as follows:

grouped_customer_idproduct_list
1001[book,phone,shoes,television] 
2002[book,toy_car] 
3003[book,jacket] 
4004[bike,phone] 
5005[bike,book,jacket,shoes] 
6006[bike] 

Error Handling

There are two types of error handling for the Aggregation function:

  • When the error is in the Groupby (key) field.
  • When the error is in the set of values (data) field. 

Errors in the groupby field

If the Aggregation functions finds an error in the groupby field, each error becomes a seperate group.

Example:

You are grouping on the department field and compute the average age:

NameDepartmentAge
Jeff<error>52
AaronIT40
Clair<error>35
RobertIT50

the results are:

DepartmentAverage_Age
IT45
<error>52
<error>35

Errors in the set of values field

Example:

You are grouping on the department field and compute the average age:

NameDepartmentAge
JeffMarketing52
AaronIT<error>
ClairSales35
RobertIT50

the results are:

DepartmentAverage_Age
IT<error>
Marketing52
Sales35

Type

Aggregation 

Nested Fields

Nested fields are supported for both the group by and aggregation features of this function.