Language Support for Document Extractors


This page lists the languages supported by Hero Platform_ AI features.



Fixed Form

Typed Text

  • Arabic, Bulgarian, Chinese (Simplified), Chinese (Traditional), Czech, Dutch/Flemish, German, Greek, English, French, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Slovak, Slovenian, Spanish/Castilian, Thai, Turkish, Ukrainian

Handwriting

  • English (and English character-set languages)

Invoice Extraction

  • English
  • French, German, Greek, Italian, Portuguese, and Spanish (may take slightly longer to process)

Semi-Structured Extraction

  • Typed text with minimal training:
    • English, German, French, Portuguese, Italian
  • Typed text with additional training: 
    • Arabic, Bulgarian, Chinese (Simplified), Chinese (Traditional), Czech, Dutch/Flemish, Greek, Hindi, Croatian, Hungarian, Indonesian, Japanese, Korean, Norwegian, Polish, Russian, Slovak, Slovenian, Spanish/Castilian, Thai, Turkish, Ukrainian
      • Models can also be trained on documents in other languages. Model accuracy depends on the number of labeled documents.

        • Automation Hero recommends labeling 1,000-2,000 documents. Alternatively, use the Context aware training feature to improve results with a smaller labeled document set (20-50 documents).

  • Handwritten text: English
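
The tiers above can be expressed as a simple lookup, for example to decide up front how much labeling effort a project is likely to need. This is only a minimal sketch: the function name `training_tier` and the returned labels are hypothetical and not part of Hero Platform_; the language sets are copied from the lists above.

```python
# Language tiers for Semi-Structured Extraction, copied from the lists above.
MINIMAL_TRAINING = {"English", "German", "French", "Portuguese", "Italian"}
ADDITIONAL_TRAINING = {
    "Arabic", "Bulgarian", "Chinese (Simplified)", "Chinese (Traditional)",
    "Czech", "Dutch/Flemish", "Greek", "Hindi", "Croatian", "Hungarian",
    "Indonesian", "Japanese", "Korean", "Norwegian", "Polish", "Russian",
    "Slovak", "Slovenian", "Spanish/Castilian", "Thai", "Turkish", "Ukrainian",
}
HANDWRITTEN = {"English"}


def training_tier(language: str, handwritten: bool = False) -> str:
    """Return the documented tier for a language (hypothetical helper)."""
    if handwritten:
        # Only English is listed for handwritten text.
        return "handwritten" if language in HANDWRITTEN else "not listed"
    if language in MINIMAL_TRAINING:
        return "minimal training"
    if language in ADDITIONAL_TRAINING:
        return "additional training"
    # Other languages can be trained; accuracy depends on labeled documents.
    return "other language"


if __name__ == "__main__":
    for lang in ("German", "Thai", "Swahili"):
        print(lang, "->", training_tier(lang))
```

A language that falls into the "other language" tier is the case where the labeling recommendation above applies.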

Text Classification

Small

  • English

Medium

  • Arabic
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Dutch
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Spanish
  • Thai
  • Turkish
  • Russian

Large 

  • Arabic
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Dutch
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Spanish
  • Thai
  • Turkish
  • Russian
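
As an illustration, the Text Classification lists above can be turned into a small lookup that reports which model sizes list a given language. This is a sketch only: `model_sizes` is a hypothetical helper, not a Hero Platform_ function, and the language set is copied from the Medium and Large lists, which are identical.

```python
# Languages listed for both the Medium and Large text classification models.
MEDIUM_AND_LARGE = {
    "Arabic", "Chinese (Simplified)", "Chinese (Traditional)", "Dutch",
    "English", "French", "German", "Italian", "Japanese", "Korean",
    "Polish", "Portuguese", "Spanish", "Thai", "Turkish", "Russian",
}


def model_sizes(language: str) -> list[str]:
    """Return the model sizes that list the language (hypothetical helper)."""
    sizes = []
    if language == "English":
        # The Small model lists English only.
        sizes.append("Small")
    if language in MEDIUM_AND_LARGE:
        sizes.extend(["Medium", "Large"])
    return sizes


if __name__ == "__main__":
    for lang in ("English", "Korean", "Hindi"):
        print(lang, "->", model_sizes(lang))
```

An empty result means the language does not appear in any of the three lists.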

Text Extractor

Small and Large English Engines

  • English

Small and Large German Engines

  • German

Multilingual Engine

  • Afrikaans
  • Albanian
  • Amharic
  • Arabic
  • Armenian
  • Assamese
  • Azerbaijani
  • Basque
  • Belarusian
  • Bengali
  • Bengali Romanized
  • Bosnian
  • Breton
  • Bulgarian
  • Burmese
  • Burmese (Zawgyi)
  • Cantonese (Traditional)
  • Catalan
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Esperanto
  • Estonian
  • Finnish
  • French
  • Frisian
  • Fulah
  • Galician
  • Ganda
  • Georgian
  • German
  • Greek
  • Gujarati
  • Haitian
  • Hausa
  • Hebrew
  • Hindi
  • Hindi Romanized
  • Hungarian
  • Icelandic
  • Igbo
  • Indonesian
  • Irish
  • Italian
  • Japanese
  • Javanese
  • Kannada
  • Kazakh
  • Khmer
  • Korean
  • Kurdish
  • Kyrgyz
  • Lao
  • Latin
  • Latvian
  • Limburgish
  • Lingala
  • Lithuanian
  • Macedonian
  • Malagasy
  • Malay
  • Malayalam
  • Marathi
  • Mongolian
  • Nepali
  • Northern Sotho
  • Norwegian
  • Oromo
  • Oriya
  • Pashto
  • Persian
  • Polish
  • Portuguese
  • Punjabi
  • Quechua
  • Romanian
  • Romansh
  • Russian
  • Sardinian
  • Scottish Gaelic
  • Serbian
  • Sindhi
  • Sinhala
  • Slovak
  • Slovenian
  • Somali
  • Spanish
  • Sundanese
  • Swahili
  • Swati
  • Swedish
  • Tagalog
  • Tamil
  • Tamil Romanized
  • Telugu
  • Telugu Romanized
  • Thai
  • Tswana
  • Turkish
  • Ukrainian
  • Urdu
  • Urdu Romanized
  • Uyghur
  • Uzbek
  • Vietnamese
  • Welsh
  • Wolof
  • Xhosa
  • Yiddish
  • Yoruba