Configure a Flow
You can configure a Flow from the Flow overview page.
To configure a Flow:
- From the Flow overview page, click the gear icon in the Settings column next to the Flow you want to configure.
- The configuration settings allow you to set:
- Flow name - Edit the name that identifies the Flow.
- Flow template - Select a template from the drop-down list.
- Number of partitions - Enter the number of partitions associated with the Flow. A larger number of partitions lets the Flow process a larger volume of records.
- Partitions per node - Enter the number of partitions to run on each node associated with the Flow. The number of partitions per node affects parallel processing performance.
- Maximum retries - Enter the number of times a node tries to process a record before the record fails.
- Speculative cushion - Enter a value to allocate a slightly larger percentage of nodes than you need for a larger job, in case of node malfunction.
- Working folder - Enter the folder where records are stored while they are being processed.
- Execution mode - Select Strict if every record must be processed. Select Forgiving if incomplete records can be skipped.
- When complete, click SAVE.
Configure your Flow based on a previously built template.
Number of Partitions
Partitions logically or physically group data into segments that can be distributed for processing. When you allocate a Flow for processing, increase this number if you are processing a large volume of records. For example, to process 1000 records, you could create 2 partitions of 500 records each or 10 partitions of 100.
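The partitioning arithmetic above can be sketched as follows. This is an illustrative example, not Hero_Flow code; the `partition` helper is hypothetical.

```python
import math

def partition(records, num_partitions):
    """Split records into num_partitions roughly equal segments (illustrative)."""
    size = math.ceil(len(records) / num_partitions)
    return [records[i:i + size] for i in range(0, len(records), size)]

records = list(range(1000))
parts = partition(records, 10)
print(len(parts), len(parts[0]))  # 10 partitions of 100 records each
```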
Partitions per Node
The number of partitions per node affects parallel processing performance. A node has one or more partitions. When you increase the number of partitions allocated per node, more items can be processed at the same time.
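As a rough sketch of the relationship, the number of nodes a job needs falls as the partitions per node rises. The `nodes_needed` helper below is hypothetical, not part of the product:

```python
import math

def nodes_needed(num_partitions, partitions_per_node):
    """Each node works on partitions_per_node partitions concurrently."""
    return math.ceil(num_partitions / partitions_per_node)

print(nodes_needed(10, 1))  # 10 nodes, one partition each
print(nodes_needed(10, 2))  # 5 nodes, two partitions each in parallel
```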
Maximum Retries
Maximum retries is the number of times a node tries to process a record before the record fails. The default is 3 retries. In strict execution mode, a failed record causes the Flow to fail. In forgiving execution mode, the failed record is written to a separate file.
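The retry behavior described above can be sketched like this. All names here are hypothetical, and a list stands in for the separate file used in forgiving mode:

```python
failed_records = []  # stands in for the separate file used in forgiving mode

def process_with_retries(record, process, max_retries=3, strict=True):
    """Try a record up to max_retries times; then fail or skip it (illustrative)."""
    for _ in range(max_retries):
        try:
            return process(record)
        except ValueError:
            continue  # retry the record
    if strict:
        raise RuntimeError(f"Flow failed: record {record!r} exhausted retries")
    failed_records.append(record)  # forgiving: skip and save the record
    return None

def always_fails(record):
    raise ValueError("bad record")

process_with_retries("invoice-42", always_fails, strict=False)
print(failed_records)  # ['invoice-42']
```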
Speculative Cushion
Use this setting to allocate a slightly larger percentage of nodes than you need for a larger job. The cushion is used if one or more nodes has functional issues. It is not needed for smaller jobs.
If you want to break 1000 invoices into 10 batches of 100 and use 1 partition per node, you need 10 nodes available.
If there are 20 nodes in the cluster but you only need 10, you can set a cushion percentage when you allocate nodes so that you get slightly more than needed, because some nodes in the cluster might not be functional. To process the 1000 invoices in groups of 100, you create 10 partitions. If you allocate 12 nodes and 1 or 2 malfunction, you still have 10 working nodes to process the job.
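The cushion arithmetic in the example above can be sketched as follows. The `nodes_to_allocate` helper is hypothetical, not a Hero_Flow API:

```python
import math

def nodes_to_allocate(num_partitions, partitions_per_node, cushion_percent):
    """Nodes needed for the job plus a speculative cushion (illustrative)."""
    needed = math.ceil(num_partitions / partitions_per_node)
    spare = math.ceil(needed * cushion_percent / 100)
    return needed + spare

# 1000 invoices in 10 partitions of 100, 1 partition per node,
# with a 20% cushion in case a node or two malfunctions:
print(nodes_to_allocate(10, 1, 20))  # 12: 10 working nodes plus 2 spares
```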
Working Folder
This folder is where records are stored while they are being processed.
Execution Mode
Strict vs. forgiving specifies how strict you are about accepting input data and about every record being processed. Examples of bad input data include values that are missing, out of range, or malformed.
Strict: Use with invoices or in any other case where it is critical that all records are processed.
Forgiving: Use this setting if records can be skipped. For example, if you can't generate a list of leads from every record in the input data, you can safely skip the incomplete records and process the complete ones. Hero_Flow writes the records that it can't process to a folder.
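The difference between the two modes can be sketched with the leads example. The `run_flow`, `validate`, and `handle` names are hypothetical, and a list stands in for the folder of skipped records:

```python
def run_flow(records, validate, handle, strict=True):
    """Process records in strict or forgiving mode (illustrative sketch)."""
    results, skipped = [], []
    for rec in records:
        if validate(rec):
            results.append(handle(rec))
        elif strict:
            raise ValueError(f"Strict mode: incomplete record {rec!r}")
        else:
            skipped.append(rec)  # forgiving: written to a folder in the product
    return results, skipped

leads = [{"email": "a@x.com"}, {"email": None}, {"email": "b@y.com"}]
ok, bad = run_flow(leads, validate=lambda r: r["email"] is not None,
                   handle=lambda r: r["email"], strict=False)
print(ok)   # ['a@x.com', 'b@y.com']
print(bad)  # [{'email': None}]
```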
Bleed corrupted data
If the forgiving execution mode is selected, a check box to bleed corrupted data becomes available. When corrupted data is found, it is bled out into the working folder and the Flow continues to run.