Work with the Text Extractor

The Text Extractor uses a pre-trained natural language understanding engine that can be applied to any text data. Users enter human-readable questions, and the model searches the provided text and displays the answers it finds.

Create a Text Extractor AI Model

  1. Open AI from the navigation menu and select AI Models.

  2. Click Create Text Extractor.

  3. Enter a name for the Text Extraction model.

    Select a Text Extraction engine.

    • English (SMALL) - Recommended for a smaller amount of data and fast processing.
      • Requires 1 GB or more of available RAM for Hero Platform_.
    • English (LARGE) - Recommended for a larger amount of data. More accurate, but slower to process.
      • Requires 4 GB or more of available RAM for Hero Platform_.
    • German (SMALL) - Recommended for a smaller amount of data and fast processing.
      • Requires 1 GB or more of available RAM for Hero Platform_.
    • German (LARGE) - Recommended for a larger amount of data. More accurate, but slower to process.
      • Requires 4 GB or more of available RAM for Hero Platform_.
    • Multilingual (LARGE) - Recommended for text in a language other than English or German.
      • Requires 4 GB or more of available RAM for Hero Platform_.

    The Text Extraction engines' container images are not preinstalled with Hero Platform_ and must be downloaded before first use.

    An engine whose name appears in gray begins downloading after you select it and click Save in step 4.

    Extraction engine status:

    • When the engine is downloading or loading, the extraction engine status is: WAITING.
    • When the engine is available to answer questions, the extraction engine status is: ACTIVE.

    The time needed to download an extraction engine depends on the engine size and the resources available in the Hero Platform_ environment.

    Once an extraction engine has been downloaded, it is available for all future text extraction as soon as Hero Platform_ can load it.

    Click Next.

  4. Select an Input for the source of the text data.
    Select the name of the field that contains the text data from which to extract answers.
    Click Save.

  5. The Text Extraction Studio is displayed.

    The page may need a few moments to load while the extraction engine becomes active.

  6. Users can enter human-readable questions after the preview text has been loaded.
    Click Add field in the toolbar.

    • Enter a name for the question.
    • Enter the question.
      • Answers (the output) from previous questions can be used as variables in later questions.
        • Variable syntax: ${Field Name}
      • Questions can be up to 64 characters long.
    • Mark the checkbox for the model to always attempt an answer. If the box is unchecked, the model may not return an answer when none is found.

    Click OK.

  7. The Text Extractor searches the text for the answer in the preview data. The results are displayed in the Results Preview column when the Preview button has been turned on.

    Click Preview in the toolbar to toggle the Results Preview on/off.

    When Preview is selected, new fields added to the text extraction model are processed and the results are displayed on the right under the Results Preview heading. Depending on the available resources, this action can cause a delay.

    When Preview is not selected, new fields added to the text extraction model are processed but the results are not available until the Preview button has been enabled. Enabling the results for multiple questions at the same time may reduce overall lag time. 

    The bar to the right of each result shows the predicted accuracy of the extracted answer.

  8. Use the left and right arrow buttons on the side of the text to navigate between preview text pages.
    All user-created question fields are applied to each of the preview text pages.
  9. When satisfied with the questions and the extracted text from the preview data, click Save in the toolbar to complete the Text Extraction model.

  10. Click Close to exit the Text Extraction Studio.
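The variable syntax and the 64-character limit from step 6 can be illustrated with a short Python sketch. This is not how Hero Platform_ implements substitution; the function names `expand_question` and `is_within_limit` are hypothetical, and the sketch assumes the limit applies to the question as typed, which the steps above do not specify.

```python
import re
from typing import Mapping

MAX_QUESTION_LENGTH = 64  # limit stated in step 6

# Matches ${Field Name}; field names may contain spaces.
VARIABLE_PATTERN = re.compile(r"\$\{([^}]+)\}")


def is_within_limit(question: str) -> bool:
    """Return True if the question, as typed, fits the 64-character limit.

    Assumption: the limit is counted before variables are substituted.
    """
    return len(question) <= MAX_QUESTION_LENGTH


def expand_question(question: str, answers: Mapping[str, str]) -> str:
    """Replace each ${Field Name} with the answer to an earlier question."""

    def lookup(match):
        name = match.group(1)
        if name not in answers:
            raise KeyError(f"No earlier answer for field {name!r}")
        return answers[name]

    return VARIABLE_PATTERN.sub(lookup, question)


# Hypothetical field name and answer, for illustration only.
earlier_answers = {"Invoice Number": "INV-1042"}
typed = "What is the due date of invoice ${Invoice Number}?"

print(is_within_limit(typed))
print(expand_question(typed, earlier_answers))
```

Because a later question depends on an earlier answer, the order of the fields matters: a question can only reference fields that have already produced a result.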

Configure a Text Extractor Model

To configure a Text Extractor model:

  1. Open the AI Models overview page.
  2. Click the settings icon and select Configuration.

  3. Adjust the CPU cores to use for this Text Extraction model and enter a time (in seconds) for the timeout settings when the model searches for answers.

Error when running Flow with Text Extractor

If processing a Flow with large text inputs and/or multi-page documents in your Text Extractor model fails with an error such as the following, Automation Hero recommends that you raise the timeout setting:

An unexpected error occured while calling <x>. Giving up after 0/2 attempts because no Docker container can be provided.

Use a Text Extraction Model in a Flow

After a Text Extraction model has been created, it can be used in a Flow.

To use a Text Extraction model in a Flow:

  1. Open and start creating a Flow in the Flow Studio.
  2. View the Text Extraction models in the element browser.
  3. Click and drag a Text Extraction model from the element browser onto the Flow Studio canvas.
  4. Connect the Text Extraction model using a cable from an element in the Flow.
  5. Select the version number for the Text Extraction model.
  6. Configure/review the fields for the Text Extraction model's container deployment.

    1. Capture logs - Select if the containerized function should capture logs.
    2. RAM - Adjust the sliding bar for memory (RAM) allocation for the function.
    3. vCPU - Adjust the sliding bar for CPU consumption (in cores).
    4. Attempt timeout(s) - Enter the timeout setting (in seconds).
    5. Initial Delay - Enter the initial delay (in seconds) between when the container starts and when the Flow begins to use it.
    6. Retry attempts - Enter the max retry attempts before failing.

    Automation Hero recommends leaving the container settings at the default levels unless problems arise. 

    For example, raising the default settings may help when the documents being processed are very large.

  7. Configure the fields for the Text Extraction model.
    • The input field is the text content to analyze. 
    • The output fields are the questions created in the Text Extraction Studio.
  8. Click OK to finish adding the Text Extraction model to the Flow.
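When tuning the container settings from step 6, it can help to estimate how long a Flow might wait before an element fails. The Python sketch below is a rough planning aid under stated assumptions, not a description of Hero Platform_ internals: the class `ContainerSettings` is hypothetical, the initial delay is assumed to be paid once, and "Retry attempts" is assumed to be the total number of attempts, each of which may run up to the attempt timeout.

```python
from dataclasses import dataclass


@dataclass
class ContainerSettings:
    """Hypothetical mirror of three container fields from step 6."""

    attempt_timeout_s: float  # Attempt timeout(s)
    initial_delay_s: float    # Initial Delay
    retry_attempts: int       # Retry attempts (assumed: total attempts)

    def worst_case_seconds(self) -> float:
        """Upper bound on waiting time if every attempt hits its timeout.

        Assumption: one initial delay, then `retry_attempts` attempts.
        """
        return self.initial_delay_s + self.retry_attempts * self.attempt_timeout_s


# Illustrative values only; these are not the platform defaults.
before = ContainerSettings(attempt_timeout_s=60, initial_delay_s=10, retry_attempts=2)
after = ContainerSettings(attempt_timeout_s=180, initial_delay_s=10, retry_attempts=2)

print(before.worst_case_seconds())
print(after.worst_case_seconds())
```

Under these assumptions, raising the timeout gives each attempt more time to finish on large documents, at the cost of a longer wait before a genuine failure is reported.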

Language Support for Multilingual Engine

Belarusian      | Hindi Romanized | Marathi          | Swedish
Breton          | Hungarian       | Burmese (Zawgyi) | Tamil Romanized
Catalan         | Indonesian      | Dutch            | Telugu Romanized
Welsh           | Icelandic       | Northern Sotho   | Tagalog
Spanish         | Khmer           | Portuguese       | Urdu Romanized
Frisian         | Limburgish      | Sardinian        | Hans Chinese (Simplified)
Irish           | Lingala         | Sindhi           | Hant Chinese (Traditional)
Scottish Gaelic | Lao             | Slovak           | Zulu
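The engine recommendations and RAM requirements listed in the model creation steps can be summarized as a small lookup. The Python sketch below only restates that guidance; the helper names `suggest_engine` and `has_enough_ram` are hypothetical and are not part of Hero Platform_.

```python
# Minimum available RAM (GB) per engine, as listed in the engine selection step.
ENGINE_MIN_RAM_GB = {
    "English (SMALL)": 1,
    "English (LARGE)": 4,
    "German (SMALL)": 1,
    "German (LARGE)": 4,
    "Multilingual (LARGE)": 4,
}


def suggest_engine(language: str, prefer_speed: bool = False) -> str:
    """Suggest an engine name for the language of the text.

    English and German have dedicated SMALL (faster) and LARGE (more
    accurate) engines; any other language maps to Multilingual (LARGE).
    """
    lang = language.strip().capitalize()
    if lang in ("English", "German"):
        size = "SMALL" if prefer_speed else "LARGE"
        return f"{lang} ({size})"
    return "Multilingual (LARGE)"


def has_enough_ram(engine: str, available_ram_gb: float) -> bool:
    """Check available RAM against the engine's stated minimum."""
    return available_ram_gb >= ENGINE_MIN_RAM_GB[engine]


engine = suggest_engine("Welsh")
print(engine, has_enough_ram(engine, available_ram_gb=2))
```

The sketch does not check whether a language is actually covered by the Multilingual engine; consult the language list above for that.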