Aggregation Function

Description

GroupBy

The Group by feature groups rows that have the same values into summary rows. The unique values from the selected field is the group key. 

If grouping by a "Tuple" data type field, the only supported aggregation method is count.

Aggregation

Aggregation performs a selected operation on the values in the rows based on the group key. It returns a combined values in the rows for the group.


The aggregation in a Flow is global and processes all the input data as a whole.

If it is needed to perform group by aggregation on input data per partition, see Unflatten.

Use

  • Select a field to group by. This creates groups from identical data values from the selected field.
  • Enter a name for the output field name.
  • Additional fields can be added by clicking the Add Group by button. (Optional)

(Optional)

  • Select an aggregation method.
    • Any - Returns one value from the selected field in the group.
    • Concat - Returns a string of the joined values in a group
      • Separated by a the delimiter character(s) entered in the configuration.
    • Count - Returns the number of rows of a group.
    • First - Returns the first value in the field from the group.
    • Last - Returns the last value in the field from the group.
    • List - Returns all the values for the group in a list as separate elements. 
    • Sum - Returns the sum of all values in the group added together.
  • Fill in the configuration fields for the selected aggregation method.
    • Example: If the aggregation method asks for a Sort Order, a common field used for sorting is a field containing a time stamp.
  • Additional aggregations can be added by clicking the Add Aggregation button. (Optional)
  • Click OK.

Example

Input Data

customer_idfirst_namelast_nameitem_
1001AnneSwiftbook
2002MarkusGurwoodtoy
3003KellyEpplejacket
4004AnneMaindsphone
4004AnneMaindsbike
5005JackHenzejacket
2002MarkusGurwoodbook
6006VictorBennikebike
1001AnneSwifttelevision
3003KellyEpplebook
5005JackHenzebook
5005JackHenzeshoes
1001AnneSwiftphone
5005JackHenzebike
1001AnneSwiftshoes

In this example, you want to use the group by unique customers and have a field containing the items the purchased in list format.

Connect the input fields to the aggregation function.

Select the field customer_id to group by and enter a output field name.

Before clicking OK:

The values in the customer_id field are grouped together and the new output field would display the following:

grouped_customer_id
1001
2002
3003
4004
5005
6006

Now that the unique customers have been grouped, the objective is to add a field that contains all their purchases in a list value.

Select List for the aggregation type.

Select the item_ field for the input values.

Enter an output field name.

Select the sorting order. In this example, the sorting order field item_ was selected so that the list of products each customer purchased are arranged alphabetically. 

Click OK to save the function.

The results for this function are as follows:

grouped_customer_idproduct_list
1001[book,phone,shoes,television] 
2002[book,toy_car] 
3003[book,jacket] 
4004[bike,phone] 
5005[bike,book,jacket,shoes] 
6006[bike] 

Error Handling

There are two types of error handling for the Aggregation function:

  • When the error is in the Groupby (key) field.
  • When the error is in the set of values (data) field. 

Errors in the groupby field

If the Aggregation functions finds an error in the groupby field, each error becomes a seperate group.

Example:

You are grouping on the department field and compute the average age:

NameDepartmentAge
Jeff<error>52
AaronIT40
Clair<error>35
RobertIT50

the results are:

DepartmentAverage_Age
IT45
<error>52
<error>35

Errors in the set of values field

Example:

You are grouping on the department field and compute the average age:

NameDepartmentAge
JeffMarketing52
AaronIT<error>
ClairSales35
RobertIT50

the results are:

DepartmentAverage_Age
IT<error>
Marketing52
Sales35

Type

Aggregation Function 

Nested Fields

Nested fields are supported for both the group by and aggregation features of this function.