Extracts embedded text from a .pdf document. A new column is created for the results.

The PDF text can be parsed into a single string or into a list of strings by page.

The PDF must be supplied as binary data.

  • Select an argument (Binary).
  • Parse by pages?
    • If unchecked, the embedded text of all pages in a .pdf document is parsed into a single record as one string.
    • If checked, the embedded text of all pages in a .pdf document is parsed into a single record as a list. Each element in the list is a string containing the embedded text of one page.
      • The text per page can be separated using a function like Flatten List.
  • Enter an output field name.
  • Click OK.
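
The difference between the two "Parse by pages?" settings can be sketched in Python. This is only an illustration of the output shapes described above, not the product's implementation: the names build_output, flatten_list, and pdf_text are hypothetical, the page texts are assumed to have been extracted already, and joining pages with a newline in the unchecked case is an assumption.

```python
from typing import Dict, List, Union


def build_output(record: Dict, page_texts: List[str], by_pages: bool,
                 field: str = "pdf_text") -> Dict:
    """Return a copy of `record` with a new column holding the extracted text.

    Hypothetical helper: `page_texts` stands in for the text already
    extracted from each page of the PDF.
    """
    value: Union[str, List[str]]
    if by_pages:
        # "Parse by pages?" checked: one list, one string per page.
        value = list(page_texts)
    else:
        # Unchecked: all pages as a single string.
        # Assumption: pages are joined with a newline.
        value = "\n".join(page_texts)
    return {**record, field: value}


def flatten_list(record: Dict, field: str) -> List[Dict]:
    """Hypothetical stand-in for a Flatten List step: one record per list element."""
    return [{**record, field: item} for item in record[field]]


pages = ["Page one text", "Page two text"]
row = {"id": 1}

single = build_output(row, pages, by_pages=False)
per_page = build_output(row, pages, by_pages=True)
flattened = flatten_list(per_page, "pdf_text")
```

Here single holds one string for the whole document, per_page holds a two-element list in the same record, and flattened holds two records, one per page.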