Create a File System Input
File System Connection types:
- Local File System
Create an Input for File System Data in Hero_Flow
- Open Hero_Flow.
- Click Inputs from the menu on the left.
- Click Create New Input.
- Enter a name for the File System Input and select a File System Connection from the drop-down list.
- Configure the File System Input data.
- Enter the file path to the data.
- Pattern mapping characters:
- "?" matches one character.
- "*" matches zero or more characters.
- "**" matches zero or more directories in a path.
- folder/t?st.txt - matches folder/test.txt, but also folder/tast.txt and folder/txst.txt.
- folder/*.txt - matches all .txt files in the folder directory.
- folder/**/test.txt - matches all test.txt files underneath the folder path.
- folder/subfolder/subfolder2/**/*.txt - matches all .txt files underneath the folder/subfolder/subfolder2 path.
- folder/**/subfolder/bla.txt - matches folder/f2/f3/subfolder/bla.txt, but also folder/f2/f3/f4/subfolder/bla.txt and folder/subfolder/bla.txt.
When using wildcard characters over SSH, both SCP and SFTP require an absolute path, e.g. /home/ubuntu/docs/**/*.pdf
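The wildcard rules above can be sketched as a small translation from a path pattern to a regular expression. This is only an illustration of the semantics described in this section, assuming "/" as the path separator; the function names ant_to_regex and matches are hypothetical, and Hero_Flow's own matcher may behave differently in edge cases.

```python
import re


def ant_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a wildcard path pattern into a compiled regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # "**" = zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")           # trailing "**" = anything below this point
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")        # "*" = zero or more characters, same directory
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")         # "?" = exactly one character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the pattern."""
    return ant_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    for pattern, path in [
        ("folder/t?st.txt", "folder/tast.txt"),
        ("folder/*.txt", "folder/sub/a.txt"),
        ("folder/**/test.txt", "folder/test.txt"),
        ("folder/**/subfolder/bla.txt", "folder/f2/f3/subfolder/bla.txt"),
    ]:
        print(pattern, path, matches(pattern, path))
```

Running the script prints each pattern and path together with whether they match, which is a quick way to check a pattern before using it as a file path.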
Be aware of AWS S3 costs when using wildcard patterns.
Example using the file path: folder/f2/**/subfolder/a.txt
- Hero_Flow will list the objects in folder/f2.
- The folder structure inside folder/f2 does not affect S3 costs. Hero_Flow issues a single listing query with as many page requests as needed.
- Hero_Flow will load the resulting object names and metadata in batches. Each batch contains multiple object names with their metadata.
- Hero_Flow will filter the results in memory against: **/subfolder/a.txt
- Hero_Flow downloads only the matching objects at a later time during the Flow execution.
The wasted AWS cost comes from the number of files that are filtered out.
If 0% of the files are filtered out, there is no waste.
If 50% of the files are filtered out on Hero_Flow's side, then in theory, with server-side filtering, half of the page requests could have been spared.
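The listing and filtering steps above can be simulated without any AWS access to see where the waste comes from. The sketch below is an assumption-laden model, not Hero_Flow's implementation: list_pages stands in for a paginated, prefix-scoped object listing, scan and the sample keys are hypothetical, and the page size is set to 2 only to keep the example small (a real S3 listing returns up to 1,000 keys per page).

```python
import re


def list_pages(bucket_keys, prefix, page_size):
    """Stand-in for a paginated object listing restricted to a prefix."""
    under_prefix = [k for k in bucket_keys if k.startswith(prefix)]
    for start in range(0, len(under_prefix), page_size):
        yield under_prefix[start:start + page_size]


def scan(bucket_keys, prefix, remainder_regex, page_size=1000):
    """List everything under the prefix, then filter client-side in memory."""
    pattern = re.compile(remainder_regex)
    pages = 0
    listed = 0
    matched = []
    for page in list_pages(bucket_keys, prefix, page_size):
        pages += 1
        listed += len(page)
        # Only the part of the key after the prefix is compared to the pattern.
        matched += [k for k in page if pattern.fullmatch(k[len(prefix):])]
    filtered_out = listed - len(matched)
    waste = filtered_out / listed if listed else 0.0
    return {"pages": pages, "listed": listed, "matched": matched, "waste": waste}


# Hypothetical bucket contents for the file path folder/f2/**/subfolder/a.txt
keys = [
    "folder/f2/subfolder/a.txt",
    "folder/f2/x/subfolder/a.txt",
    "folder/f2/x/other.log",
    "folder/f2/y/z/notes.md",
    "folder/f3/subfolder/a.txt",  # outside folder/f2, so it is never listed
]

# Regex equivalent of **/subfolder/a.txt (zero or more directories, then the file).
result = scan(keys, "folder/f2/", r"(?:[^/]+/)*subfolder/a\.txt", page_size=2)
print(result)
```

In this model, four objects are listed over two pages and two of them are discarded, so half of the listed objects count as waste. Narrowing the prefix or reorganizing the folders so that fewer non-matching objects sit under it reduces that fraction.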
Recommendations for using wildcard patterns with S3:
- Have a well-organized folder structure.
- Try and use wildcards close to the end of the expression when possible.
- If using the expression folder/*.txt, ensure that folder has no subfolders containing unneeded files.
- Create expressions that filter out as few files as possible.
- Enter a descriptive field name for the names of the files.
- If the Input file's name is needed in the Input tuple, name the Input field (optional), e.g. fileName, input-file.
- If the Input file's relative path (including the file name) is needed in the Input tuple, name the Input field (optional), e.g. path, filePath.
- Select whether data should be pulled from the last checkpoint.
- No - pulls all data.
- Yes - pulls data modified after the previous run of the Flow.
- Checkpointing is based on file modification dates and is stored on a per-Flow basis.
- Select whether to limit the number of files that are input into the Flow.
- Yes - enter the file limit as an integer. Only the first N files are processed.
- No - processes as many input files as are available.
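The checkpoint and file-limit options can be pictured as two filters applied in sequence. The following is a minimal sketch under stated assumptions: FileInfo, select_files, and the checkpoints dictionary keyed by a Flow identifier are all hypothetical, modification times are plain epoch seconds, and "the first N files" is taken to mean the first N in listing order, which this document does not specify.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class FileInfo:
    path: str
    modified: float  # modification time, epoch seconds (assumed representation)


def select_files(
    files: Iterable[FileInfo],
    flow_id: str,
    checkpoints: Dict[str, float],
    use_checkpoint: bool,
    limit: Optional[int] = None,
) -> List[FileInfo]:
    """Apply the checkpoint filter first, then the optional file limit."""
    # Checkpoints are looked up per Flow; a Flow without one gets all files.
    last_run = checkpoints.get(flow_id) if use_checkpoint else None
    candidates = (
        f for f in files if last_run is None or f.modified > last_run
    )
    # islice with a limit of None yields every remaining candidate.
    return list(islice(candidates, limit))


files = [
    FileInfo("a.csv", 100.0),
    FileInfo("b.csv", 200.0),
    FileInfo("c.csv", 300.0),
]
checkpoints = {"flow-1": 150.0}  # hypothetical time of flow-1's previous run

print(select_files(files, "flow-1", checkpoints, use_checkpoint=True))
print(select_files(files, "flow-1", checkpoints, use_checkpoint=False, limit=2))
```

With the checkpoint enabled, only files modified after the stored time for that Flow are returned; with it disabled, all files are candidates and the limit alone decides how many are processed.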
- Select the parser type and fill the parser's configuration. (See Work with Parsers)
- Click the Refresh icon to detect the field mapping for the Input.
- Field detection for some parsers reads from the input file as many bytes as needed.
- From the field mapping table:
- Confirm or change column names.
- Confirm or change column data types.
- Remove columns or confirm their arrangement.
- Click OK to save the Input.
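As background for the field detection step, the idea of reading only as many bytes as needed can be illustrated for a comma-separated file. This is a hedged sketch, not Hero_Flow's detection logic: detect_fields and infer_type are hypothetical names, the type labels are invented for the example, and the code assumes a header row followed by at least one data row with no line breaks inside quoted values.

```python
import csv
import io


def infer_type(value: str) -> str:
    """Guess a column type from a single sample value."""
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "string"


def detect_fields(stream, chunk_size=64, rows_needed=2):
    """Read chunks until a header row and one data row are available."""
    sample = ""
    while sample.count("\n") < rows_needed:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of input reached before enough rows were seen
            break
        sample += chunk
    rows = list(csv.reader(sample.splitlines()[:rows_needed]))
    if not rows:
        return []
    if len(rows) < 2:
        # Header only: no sample values, so fall back to string columns.
        return [(name, "string") for name in rows[0]]
    header, first_row = rows
    return [(name, infer_type(value)) for name, value in zip(header, first_row)]


data = io.StringIO("id,price,name\n1,9.99,apple\n2,5,pear\n")
print(detect_fields(data, chunk_size=8))
```

The detected names and types are only a starting point, which is why the field mapping table lets the column names, data types, and arrangement be confirmed or changed before saving.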