Unflatten groups tuples by the values in a selected field and can then perform a selected calculation on the values in each group.
It is similar to the Aggregation function, but Unflatten works on individual partitions of the input data rather than on the input data as a whole.
Using the Unflatten function in a Flow may create additional partitions, resulting in multiple output files.
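The difference between grouping over the whole input and grouping within each partition can be sketched in plain Python. This is only an illustration of the concept, not the product's implementation. The partition names, the field names (`region`, `amount`) and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: two partitions (files), each a list of tuples (rows).
partitions = {
    "file_1": [{"region": "north", "amount": 10}, {"region": "south", "amount": 5}],
    "file_2": [{"region": "north", "amount": 7}, {"region": "north", "amount": 3}],
}

def sum_over_entire_input(partitions, key, value):
    """Aggregation-style: one group per key value across all partitions."""
    totals = defaultdict(int)
    for rows in partitions.values():
        for row in rows:
            totals[row[key]] += row[value]
    return dict(totals)

def sum_per_partition(partitions, key, value):
    """Unflatten-style: groups are formed separately inside each partition."""
    result = {}
    for name, rows in partitions.items():
        totals = defaultdict(int)
        for row in rows:
            totals[row[key]] += row[value]
        result[name] = dict(totals)
    return result

whole = sum_over_entire_input(partitions, "region", "amount")
per_part = sum_per_partition(partitions, "region", "amount")
```

Here `whole` holds a single total per region, while `per_part` holds a separate set of totals for each partition. This is why partition-level grouping can lead to more than one output.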
- Select an argument to group by. This creates groups from identical data values from the selected field of each partition.
- Enter a name for the output field.
- Additional arguments can be added by clicking the Add Group by button. (Optional)
- Select an aggregation method.
  - Any - Returns a random value from the selected field in the group.
  - Concat - Returns a string of the concatenated values in a group.
  - Count - Returns the number of field values in a group.
  - First - Returns the first value in the field from the group.
  - Last - Returns the last value in the field from the group.
  - List - Returns a list of the values in a group.
  - Sum - Returns the sum of the field's values in a group.
- Fill in the configuration fields for the selected aggregation method.
- Additional aggregations can be added by clicking the Add Aggregation button. (Optional)
- Click OK.
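As a rough guide to what each aggregation method returns, the methods listed above can be approximated in Python as follows. This is a sketch based only on the descriptions in this section. The `aggregate` helper and its `separator` parameter are hypothetical, and the product's actual configuration fields and output formats may differ.

```python
import random

def aggregate(values, method, separator=","):
    """Apply one of the listed aggregation methods to the values of one group."""
    if not values and method in {"Any", "First", "Last"}:
        raise ValueError("group is empty")
    if method == "Any":
        return random.choice(values)  # an arbitrary value from the group
    if method == "Concat":
        # The separator is an assumption; the section does not specify one.
        return separator.join(str(v) for v in values)
    if method == "Count":
        return len(values)
    if method == "First":
        return values[0]
    if method == "Last":
        return values[-1]
    if method == "List":
        return list(values)
    if method == "Sum":
        return sum(values)
    raise ValueError(f"unknown aggregation method: {method}")

group = [4, 1, 7]  # hypothetical field values of a single group
results = {m: aggregate(group, m)
           for m in ["Concat", "Count", "First", "Last", "List", "Sum"]}
```

In this sketch, Concat produces a single string whereas List keeps the individual values, which is the practical difference between the two methods.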
In this example, an Input contains multiple files (partitions).
When the Input is added to Flow Studio, the same field from each partition is combined. The Unflatten function can be used to split the groups by partition and, in this example, count the number of tuples in each partition.
Input file 1:
Input file 2:
Select and configure the Unflatten function.
The preview data for the Unflatten function shows that the groups have been split by partition and that the tuples in each partition have been counted.
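The effect of this example can be mimicked in a few lines of Python. The file contents below are placeholders invented for illustration, since the real example data is not reproduced here. The variable names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical contents of the two input files (placeholder values).
input_file_1 = ["alpha", "beta", "gamma"]
input_file_2 = ["delta", "epsilon"]

# Combined view: the same field from every partition is presented together,
# here tagged with the partition each tuple came from.
combined = (
    [("input_file_1", value) for value in input_file_1]
    + [("input_file_2", value) for value in input_file_2]
)

# Unflatten-style count: one count per partition rather than one overall count.
tuples_per_partition = Counter(partition for partition, _ in combined)
```

With these placeholder values, `tuples_per_partition` contains one entry per input file, each holding the number of tuples found in that file.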