PDF2Text

Description

Extracts embedded text from a .pdf document. A new column is created for the results.

The PDF text can be parsed into a single string or into a list of strings by page.

The PDF must be supplied as binary data.

Only text embedded in the PDF is extracted. The function does not use optical character recognition (OCR), so text that exists only inside scanned images is not returned.
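
As a rough illustration of the two output shapes (not the function's actual implementation), the sketch below starts from per-page text that has already been extracted. The helper name `shape_pdf_text` and the newline used to join pages are assumptions made for this example.

```python
from typing import List, Union


def shape_pdf_text(page_texts: List[str], parse_by_pages: bool) -> Union[str, List[str]]:
    """Hypothetical helper: model the single-string and by-page output shapes."""
    if parse_by_pages:
        # One list element per page, in page order.
        return list(page_texts)
    # One string for the whole document; the newline separator is an assumption.
    return "\n".join(page_texts)


pages = ["Text on page 1", "Text on page 2"]
as_string = shape_pdf_text(pages, parse_by_pages=False)
as_list = shape_pdf_text(pages, parse_by_pages=True)
```

Here `as_string` holds the text of both pages in one value, while `as_list` keeps one string per page.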

Use

  • Select an argument. The argument must be binary data.
  • Parse by pages?
    • If unchecked, the embedded text of all pages in a .pdf document is parsed into a record as a single string.
    • If checked, the embedded text of all pages in a .pdf document is parsed into a single record as a list. Each element in the list is a string containing the embedded text of one page.
      • The text per page can be separated using a function like Flatten List.
  • Enter an output field name.
  • Click OK.
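
The effect of separating a by-page result into one record per page can be sketched as follows. This models the general idea only and is not a description of how Flatten List works internally; the helper `flatten_list_field` and the field names `file` and `pdf_text` are hypothetical.

```python
from typing import Any, Dict, List


def flatten_list_field(record: Dict[str, Any], field: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: emit one record per element of a list field."""
    flattened = []
    for element in record[field]:
        row = dict(record)    # copy the other fields unchanged
        row[field] = element  # replace the list with a single element
        flattened.append(row)
    return flattened


# A record as it might look after parsing by pages (field names are assumed).
record = {"file": "report.pdf", "pdf_text": ["Text on page 1", "Text on page 2"]}
rows = flatten_list_field(record, "pdf_text")
```

Each entry in `rows` keeps the original `file` value and carries the text of exactly one page.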

Type

Formulas