Create a File System Input

A File System Input can use any of the following File System Connection types:

  • Dropbox
  • FTP
  • Local File System
  • Microsoft OneDrive
  • S3
  • SSH 

Create an Input for File System Data in Hero Platform_

  1. Open Hero Platform_.
  2. Open Integration from the navigation menu and select Inputs.
  3. Click Create New Input.

  4. Enter a name for the File System Input and select a File System Connection from the drop-down list.
  5. Configure the File System Input data.
    • Enter the file path to the data.

      • Pattern mapping characters:
        • "?" matches one character.
        • "*" matches zero or more characters.
        • "**" matches zero or more directories in a path.

      Examples:

      • folder/t?st.txt - matches folder/test.txt but also folder/tast.txt or folder/txst.txt
      • folder/*.txt - matches all .txt files in the folder directory.
      • folder/**/test.txt - matches all test.txt files underneath the folder path.
      • folder/subfolder/subfolder2/**/*.txt - matches all .txt files underneath the folder/subfolder/subfolder2 path.
      • folder/**/subfolder/bla.txt - matches folder/f2/f3/subfolder/bla.txt but also folder/f2/f3/f4/subfolder/bla.txt and folder/subfolder/bla.txt
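
The pattern rules above can be sketched in Python by translating a pattern into a regular expression. This is only an illustration of the documented rules, not Hero Platform_'s actual matcher; the function names pattern_to_regex and matches are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a wildcard file path pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            # "**/" matches zero or more whole directories.
            parts.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern.startswith("**", i):
            # A bare "**" matches anything, including "/".
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            # "*" matches zero or more characters within one path segment.
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            # "?" matches exactly one character within a path segment.
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(path) is not None
```

For example, matches("folder/*.txt", "folder/report.txt") is True, while matches("folder/*.txt", "folder/sub/report.txt") is False because "*" does not cross a directory boundary in this sketch.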

      When using wildcard characters with SSH, use an absolute path: SCP and SFTP both require one. For example: /home/ubuntu/docs/**/*.pdf

      Be aware of AWS S3 costs when using wildcard patterns.

      For example, with the file path folder/f2/**/subfolder/a.txt:

      • Hero Platform_ lists the objects in folder/f2.
        • The folder structure inside folder/f2 has no real effect on S3 costs. Hero Platform_ issues a single listing query with as many page requests as needed.
      • Hero Platform_ loads the resulting object names (plus metadata) in batches. Each batch contains multiple object names and their metadata.
      • Hero Platform_ filters the results in memory against: **/subfolder/a.txt
      • Hero Platform_ downloads only the matching objects, later during the Flow execution.
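
The listing and filtering steps above can be sketched as follows. This is a simplified model under stated assumptions, not Hero Platform_'s implementation: list_pages is a hypothetical stand-in for a paginated object listing (no real S3 calls are made), and static_prefix, to_regex, and matching_keys are illustrative names.

```python
import re


def static_prefix(pattern: str) -> str:
    """Return the directory part of the pattern before its first wildcard."""
    positions = [pattern.find(ch) for ch in "*?" if ch in pattern]
    if not positions:
        return pattern
    head = pattern[: min(positions)]
    return head[: head.rfind("/") + 1]


def to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate the wildcard pattern into a regex (assumed semantics)."""
    mapping = {"**/": r"(?:[^/]+/)*", "**": r".*", "*": r"[^/]*", "?": r"[^/]"}
    tokens = re.split(r"(\*\*/|\*\*|\*|\?)", pattern)
    return re.compile("".join(mapping.get(t, re.escape(t)) for t in tokens))


def list_pages(all_keys, prefix, page_size):
    """Hypothetical paginated listing: yield batches of keys under prefix."""
    selected = sorted(k for k in all_keys if k.startswith(prefix))
    for start in range(0, len(selected), page_size):
        yield selected[start : start + page_size]


def matching_keys(all_keys, pattern, page_size=1000):
    """List everything under the static prefix, then filter in memory."""
    regex = to_regex(pattern)
    matched, pages = [], 0
    for page in list_pages(all_keys, static_prefix(pattern), page_size):
        pages += 1  # every page is requested, whether or not its keys match
        matched.extend(k for k in page if regex.fullmatch(k))
    return matched, pages


EXAMPLE_KEYS = [
    "folder/f2/subfolder/a.txt",
    "folder/f2/x/subfolder/a.txt",
    "folder/f2/x/b.txt",
    "folder/f2/y/z/subfolder/a.txt",
    "folder/f2/notes.md",
    "folder/other/subfolder/a.txt",
]
```

With EXAMPLE_KEYS and a page size of 2, matching_keys(EXAMPLE_KEYS, "folder/f2/**/subfolder/a.txt", page_size=2) lists five keys under folder/f2/ in three pages and keeps three of them; only those three would be downloaded later.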

      The wasted AWS cost comes from the number of files that are filtered out.

      • If 0% of the files are filtered out, there is no waste.
      • If 50% of the files are filtered out on Hero Platform_'s side, then in theory, with server-side filtering, half of the page requests could have been spared.
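
The arithmetic behind these two cases can be sketched as below. The function name listing_waste is hypothetical, and the default page size of 1,000 keys per list request is an assumption based on the S3 list API's maximum page size; actual billing depends on your AWS pricing.

```python
import math


def listing_waste(total_listed: int, matched: int, page_size: int = 1000):
    """Compare page requests actually made with the theoretical minimum.

    Returns (fraction of files filtered out, pages requested,
    pages that ideal server-side filtering would have needed).
    """
    if total_listed == 0:
        return 0.0, 0, 0
    filtered_fraction = (total_listed - matched) / total_listed
    pages_requested = math.ceil(total_listed / page_size)
    pages_ideal = math.ceil(matched / page_size)
    return filtered_fraction, pages_requested, pages_ideal
```

For 10,000 listed objects of which 5,000 match, this gives a filtered fraction of 0.5, with 10 pages requested against an ideal of 5.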

      Recommendations for using wildcard patterns with S3:

      • Keep a well-organized folder structure.
      • Place wildcards as close to the end of the expression as possible.
      • If using the expression folder/*.txt, ensure folder doesn't have subfolders with unneeded files.
      • Create expressions that filter out as few files as possible.
    • Enter descriptive field names for the file name and path.
      • (Optional) If the Input file's name is needed in the Input tuple, enter a name for that field (e.g. fileName, input-file).
      • (Optional) If the Input file's relative path (including the file name) is needed in the Input tuple, enter a name for that field (e.g. path, filePath).

  6. Select whether data should be pulled from the last checkpoint.
    • No - pulls all data.
    • Yes - pulls data modified after the previous run of the Flow.

    • Checkpointing is based on the file modification date and is stored on a per-Flow basis.
    • Exporting a Flow includes checkpoint information.
    • When importing a Flow that contains checkpoint information, the user can choose whether the import includes it.
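
The checkpoint behavior can be pictured with the following sketch. It assumes the checkpoint is a timestamp recorded per Flow at each run and compared with each file's modification time; how Hero Platform_ actually stores and compares checkpoints is not specified here, and FileInfo, select_files, and run_flow are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple, Optional


class FileInfo(NamedTuple):
    path: str
    modified: datetime


def select_files(files: List[FileInfo], checkpoint: Optional[datetime]) -> List[FileInfo]:
    """Return all files, or only those modified after the checkpoint."""
    if checkpoint is None:
        return list(files)
    return [f for f in files if f.modified > checkpoint]


def run_flow(
    flow_id: str,
    files: List[FileInfo],
    checkpoints: Dict[str, datetime],
    use_checkpoint: bool,
    now: datetime,
) -> List[FileInfo]:
    """Pick the files for one run and record this run as the new checkpoint."""
    checkpoint = checkpoints.get(flow_id) if use_checkpoint else None
    selected = select_files(files, checkpoint)
    checkpoints[flow_id] = now  # stored per Flow (assumption)
    return selected


def utc(day: int) -> datetime:
    """Helper for the example: a UTC timestamp in January 2024."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)
```

In this model, the first run of a Flow has no checkpoint and pulls every file; a later run with the checkpoint option set to Yes pulls only files modified since the previous run, and a run with No pulls everything again.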
  7. Select whether to limit the number of files that are input into the Flow.

    • Yes - enter the file limit as an integer. Only the first N files are processed.
    • No - processes as many files as are available.


  8. Select the parser type and fill in the parser's configuration (see Work with Parsers).
  9. Click the Refresh icon to detect the field mapping for the Input.
    • Field detection for some parsers reads as many bytes from the input file as needed.
  10. From the field mapping table:
    • Confirm or change column names.
    • Confirm or change column data types.
    • Remove columns or confirm their arrangement.
  11. Click OK to save the Input.