Work with Invoice Extraction
The Invoice Document Extractor lets users extract pre-selected fields from semi-structured invoice documents (image and PDF files).
Other than saving and previewing the extraction, no additional configuration is needed. The model is ready to use.
The Invoice Extraction model is a quick way to pull data from images and PDFs for use in your business automations.
Open Documents
Open AI from the navigation menu and select Documents.
Document Overview Page
The Documents overview page displays all saved document extractions.
This includes Invoices, Fixed Forms, and Semi-structured models.
You can view the:
- Document extraction names - Name given to identify the document extraction.
- Number of versions - The number of different versions for the document extraction.
- Created - The date the document extraction was created.
- Settings:
- Edit - Open the extraction in Document Studio to edit the extraction.
- Clone - Copy the document extraction.
- Delete - Delete the document extraction.
- Configuration - Edit the name of the extraction.
Create a Pre-trained Invoice Extraction
To create a new Invoice Extraction:
- After opening Documents in Hero Platform_, click Create Document Model.
- Enter a name for the Invoice Extraction and click Next.
- Select Invoices and click Next.
Select one or more sample invoice documents from the file browser by clicking Choose file...
Click Save & Preview.
A sample invoice document is a reference image. It lets you view the type of data that will be extracted from future input invoice documents in the Flow Studio.
Supported file types:
- .jpg
- .png
The sample invoice image is displayed in the Document Studio, highlighting the data found for the default fields.
Hero Platform_ scans the documents and matches the values with pre-built fields.
Fields are displayed under the Available Fields tab:
- Name - The name of the field. See a list of all pre-built fields.
- Value - The value that Hero Platform_ found.
- Confidence - A score from 0 to 100 indicating how accurate Hero Platform_ believes the value to be.
- Type - All field types are Standard. To use custom fields, see Semi-Structure Document Extraction.
Show:
- All fields - All default fields are displayed with or without a value.
- Only extracted fields - Displays fields only where a value was found.
Filter fields from the header bar, or search for specific fields using the search bar.
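The Show options and the Confidence column can be mimicked in a few lines when reviewing results outside the UI. The sketch below is illustrative only: the `Field` class, its attribute names, the helper functions, and the sample values are hypothetical and are not part of Hero Platform_. Only the column names and the 0-100 confidence range come from this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for one row of the Available Fields tab."""
    name: str
    value: Optional[str]      # None when no value was found
    confidence: float         # score from 0 to 100
    type: str = "Standard"


def only_extracted(fields: List[Field]) -> List[Field]:
    """Mimic 'Only extracted fields': keep rows where a value was found."""
    return [f for f in fields if f.value is not None]


def below_confidence(fields: List[Field], threshold: float) -> List[Field]:
    """Return extracted fields whose confidence is under the threshold."""
    return [f for f in only_extracted(fields) if f.confidence < threshold]


# Made-up sample rows, for illustration only.
fields = [
    Field("Invoice Number", "INV-001", 98.0),
    Field("Due Date", "2024-01-31", 62.5),
    Field("IBAN", None, 0.0),
]

extracted_names = [f.name for f in only_extracted(fields)]
review_names = [f.name for f in below_confidence(fields, 80)]
print(extracted_names)
print(review_names)
```

A threshold such as 80 is an arbitrary choice here. A suitable cut-off depends on how costly a wrong value is in the downstream automation.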
There is nothing to configure at this point. Review the highlighted values to see if they match the fields displayed under Results Preview.
Adjust the UI size of the preview document or tab information by clicking and dragging the center line.
Documents with multiple pages can be viewed by scrolling. View additional invoice documents added to the extraction by clicking the arrows on either side of Document Studio.
View invoice documents by selecting the Sample Documents tab.
Add additional invoices (only PDF files are supported) by clicking Choose file...
- These files are for preview only and will not affect any Flows using this extractor.
Remove invoices by clicking the garbage icon next to the invoice's name.
Click Settings to view, add, or remove language dictionaries used to read the document.
Add one or more languages by selecting the language from the drop-down menu.
Languages supported (language - note):
- English - Supported
- French, German, Greek, Italian, Portuguese, Spanish - Supported (may take slightly longer to process)
If multiple languages are expected, select the dominant language before selecting additional languages. The model will give preference to language detection based on the list order.
- Click Save in the toolbar to add the Invoice Extraction.
Hero Platform_'s Invoice Field List
This is the list of Hero Platform_'s recognized fields in the pre-trained invoice extraction model.
Global values per-document
- ABA Routing Number
- Account Name
- Account Number
- Amount Due
- Amount Paid
- BIC
- Bill To
- Company ID
- Contact Person
- Customer Address
- Customer Company ID
- Customer ID
- Customer Name
- Customer Phone Number
- Customer Tax ID
- Due Date
- IBAN
- Invoice Date
- Invoice Number
- Order Date
- Order Number
- Payment Reference
- Payment Terms
- SWIFT
- Ship To Address
- Ship To Name
- Sort Code
- Tax ID
- Tax Rates
- Total Amount
- Total Amount Before Tax
- Total Tax
- Vendor Address
- Vendor Name
Line item values
Everything in a line item that does not match one of these values is combined into a single "Other" value for the line.
- Item Quantity
- Item Unit Price
- Item Cell
- Item Date
- Item Number
- Item Total Price
- Item Description
- Item Unit of Measure
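The "Other" rule for line items can be pictured with a short sketch. It assumes a line item arrives as a mapping from column label to text and that unmatched values are joined with a space. Both the input shape and the separator are assumptions made for illustration, and `split_line_item` is a hypothetical helper, not a Hero Platform_ function.

```python
# Line item field names taken from the list above.
LINE_ITEM_FIELDS = {
    "Item Quantity", "Item Unit Price", "Item Cell", "Item Date",
    "Item Number", "Item Total Price", "Item Description",
    "Item Unit of Measure",
}


def split_line_item(columns):
    """Separate recognized line item values from everything else.

    `columns` maps a column label to its text. Values whose label is not
    a recognized field are joined with a space into one "Other" value
    (the separator is an assumption).
    """
    item = {k: v for k, v in columns.items() if k in LINE_ITEM_FIELDS}
    leftovers = [v for k, v in columns.items() if k not in LINE_ITEM_FIELDS]
    if leftovers:
        item["Other"] = " ".join(leftovers)
    return item


# Made-up row: two recognized columns and two unrecognized ones.
row = {
    "Item Description": "Widget",
    "Item Quantity": "2",
    "Warehouse": "B7",
    "Discount Code": "SPRING",
}
result = split_line_item(row)
print(result)
```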
Use an Invoice Extraction in a Flow
After an Invoice Extraction has been saved, it can be used as a function in a Flow.
To use an Invoice Extraction in a Flow:
- Open and start creating a Flow in the Flow Studio.
- View the Document functions in the element browser.
- Click and drag the Invoice Extraction from the element browser onto the Flow Studio canvas.
- Connect the Invoice Extraction using a cable from an element in the Flow.
- Add Input documents.
Select the coordinates type:
Relative coordinates are the recommended option. They are more stable and can adjust for document scaling, while absolute coordinates may require adjustments when document scaling changes. Support for absolute coordinates will be removed in a future release.
- Absolute Coordinates - returns (as an output field) the position of a value box by pixel location on the document.
- Metadata (Tuple) containing x, y, w, h (Long) values.
- Relative Coordinates - returns (as an output field) the position of a value box as a percentage of the space on the document.
- Page_bounding_box (Tuple) containing boundingBox (Tuple) containing left, top, width, height (Double) values.
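To see why relative coordinates survive document scaling, it can help to convert one relative box to pixels for two page sizes. The sketch below is a minimal illustration, not platform code: `relative_to_pixels` is a hypothetical helper, the page dimensions are made up, and the numeric range of the relative values is an assumption. The `scale` parameter covers both readings (fractions from 0.0 to 1.0, or percentages from 0 to 100).

```python
def relative_to_pixels(box, page_width_px, page_height_px, scale=1.0):
    """Convert a relative bounding box to pixel x, y, w, h values.

    `box` uses the keys left, top, width, height named for boundingBox.
    `scale` is the divisor for the relative values: 1.0 if they are
    fractions (0.0-1.0), 100.0 if they are percentages (0-100).
    """
    return {
        "x": round(box["left"] / scale * page_width_px),
        "y": round(box["top"] / scale * page_height_px),
        "w": round(box["width"] / scale * page_width_px),
        "h": round(box["height"] / scale * page_height_px),
    }


# The same relative box applied to a page and to a copy scaled 2x.
box = {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.125}
small = relative_to_pixels(box, 1000, 2000)
large = relative_to_pixels(box, 2000, 4000)
print(small)
print(large)
```

The relative box is unchanged between the two calls, while the pixel values double. That is the adjustment absolute coordinates would need whenever the document scale changes.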
Configure or review the fields for the Invoice Extraction's containerized function deployment.
- Capture logs - Select if the containerized function should capture logs.
- RAM - Adjust the sliding bar for memory (RAM) allocation for the function.
- vCPU - Adjust the sliding bar for CPU consumption (in cores).
- Attempt timeout(s) - Enter the timeout setting (in seconds).
- Initial Delay - Enter the initial delay in seconds. This is the amount of time to wait between when the container starts and when the Flow begins to use it.
- Retry attempts - Enter the max retry attempts before failing.
Automation Hero recommends leaving the containerized function settings at the default levels unless problems arise.
An example of when raising the default settings may be beneficial is when the documents being processed are very large.
- Click OK to finish adding the Invoice Extraction to the Flow.
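The relationship between the Initial Delay, Attempt timeout, and Retry attempts settings described above can be sketched as a simple loop. This is only a mental model under stated assumptions, not how Hero Platform_ implements containerized functions: `run_with_settings` is a hypothetical helper, a timed-out attempt is modelled as a `TimeoutError`, and "retry attempts" is assumed to count retries after the first try.

```python
import time


def run_with_settings(call, retry_attempts, initial_delay_s, sleep=time.sleep):
    """Wait for the initial delay, then call until success or retries run out.

    Assumes `call` raises TimeoutError when one attempt exceeds the
    attempt timeout, and that `retry_attempts` counts retries after the
    first try. Both are assumptions made for this sketch.
    """
    sleep(initial_delay_s)
    last_error = None
    for _ in range(1 + retry_attempts):
        try:
            return call()
        except TimeoutError as exc:
            last_error = exc
    raise RuntimeError("all attempts timed out") from last_error


# Simulated function that times out twice, then succeeds.
attempts = []


def flaky_extraction():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("attempt timed out")
    return "extracted"


# Record the delay instead of actually sleeping.
delays = []
result = run_with_settings(flaky_extraction, retry_attempts=2,
                           initial_delay_s=5, sleep=delays.append)
print(result, len(attempts), delays)
```

In this model, very large documents that exceed the timeout on every attempt would exhaust the retries. That is the kind of case where raising the timeout or resource settings may help.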