Aggregation Function
Description
GroupBy
The Group by feature groups rows that have the same values into summary rows. The unique values from the selected field is the group key.
If grouping by a "Tuple" data type field, the only supported aggregation method is count.
Aggregation
Aggregation performs a selected operation on the values in the rows based on the group key. It returns a combined values in the rows for the group.
The aggregation in a Flow is global and processes all the input data as a whole.
If it is needed to perform group by aggregation on input data per partition, see Unflatten.
Use
- Select a field to group by. This creates groups from identical data values from the selected field.
- Enter a name for the output field name.
- Additional fields can be added by clicking the Add Group by button. (Optional)
(Optional)
- Select an aggregation method.
- Any - Returns one value from the selected field in the group.
- Concat - Returns a string of the joined values in a group
- Separated by a the delimiter character(s) entered in the configuration.
- Count - Returns the number of rows of a group.
- First - Returns the first value in the field from the group.
- Last - Returns the last value in the field from the group.
- List - Returns all the values for the group in a list as separate elements.
- Sum - Returns the sum of all values in the group added together.
- The group must be a numeric data type.
- Fill in the configuration fields for the selected aggregation method.
- Example: If the aggregation method asks for a Sort Order, a common field used for sorting is a field containing a time stamp.
- Additional aggregations can be added by clicking the Add Aggregation button. (Optional)
- Click OK.
Example
Input Data
customer_id | first_name | last_name | item_ |
---|---|---|---|
1001 | Anne | Swift | book |
2002 | Markus | Gurwood | toy |
3003 | Kelly | Epple | jacket |
4004 | Anne | Mainds | phone |
4004 | Anne | Mainds | bike |
5005 | Jack | Henze | jacket |
2002 | Markus | Gurwood | book |
6006 | Victor | Bennike | bike |
1001 | Anne | Swift | television |
3003 | Kelly | Epple | book |
5005 | Jack | Henze | book |
5005 | Jack | Henze | shoes |
1001 | Anne | Swift | phone |
5005 | Jack | Henze | bike |
1001 | Anne | Swift | shoes |
In this example, you want to use the group by unique customers and have a field containing the items the purchased in list format.
Connect the input fields to the aggregation function.
Select the field customer_id to group by and enter a output field name.
Before clicking OK:
The values in the customer_id field are grouped together and the new output field would display the following:
grouped_customer_id |
---|
1001 |
2002 |
3003 |
4004 |
5005 |
6006 |
Now that the unique customers have been grouped, the objective is to add a field that contains all their purchases in a list value.
Select List for the aggregation type.
Select the item_ field for the input values.
Enter an output field name.
Select the sorting order. In this example, the sorting order field item_ was selected so that the list of products each customer purchased are arranged alphabetically.
Click OK to save the function.
The results for this function are as follows:
grouped_customer_id | product_list |
---|---|
1001 | [book,phone,shoes,television] |
2002 | [book,toy_car] |
3003 | [book,jacket] |
4004 | [bike,phone] |
5005 | [bike,book,jacket,shoes] |
6006 | [bike] |
Error Handling
There are two types of error handling for the Aggregation function:
- When the error is in the Groupby (key) field.
- When the error is in the set of values (data) field.
Errors in the groupby field
If the Aggregation functions finds an error in the groupby field, each error becomes a seperate group.
Example:
You are grouping on the department field and compute the average age:
Name | Department | Age |
---|---|---|
Jeff | <error> | 52 |
Aaron | IT | 40 |
Clair | <error> | 35 |
Robert | IT | 50 |
the results are:
Department | Average_Age |
---|---|
IT | 45 |
<error> | 52 |
<error> | 35 |
Errors in the set of values field
Example:
You are grouping on the department field and compute the average age:
Name | Department | Age |
---|---|---|
Jeff | Marketing | 52 |
Aaron | IT | <error> |
Clair | Sales | 35 |
Robert | IT | 50 |
the results are:
Department | Average_Age |
---|---|
IT | <error> |
Marketing | 52 |
Sales | 35 |
Type
Nested Fields
Nested fields are supported for both the group by and aggregation features of this function.