Language Support for Document Extractors


Languages supported by Hero Platform_ AI features.



Fixed Form

Typed Text

  • Arabic, Bulgarian, Chinese (Simplified), Chinese (Traditional), Czech, Dutch/Flemish, German, Greek, English, French, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Slovak, Slovenian, Spanish/Castilian, Thai, Turkish, Ukrainian

Handwriting

  • English (and English character-set languages)

Invoice Extraction

  • English
  • German, French, Spanish, Portuguese, Italian (may take slightly longer to process)

Custom Invoice Extraction

  • Typed text with minimal training:
    • English, German, French, Portuguese, Italian
  • Typed text with additional training: 
    • Arabic, Bulgarian, Chinese (Simplified), Chinese (Traditional), Czech, Dutch/Flemish, Greek, Hindi, Croatian, Hungarian, Indonesian, Japanese, Korean, Norwegian, Polish, Russian, Slovak, Slovenian, Spanish/Castilian, Thai, Turkish, Ukrainian
      • Models can also be trained on documents in other languages. Accuracy depends on the number of labeled documents.

        • Automation Hero recommends labeling 1,000 to 2,000 documents, or using the Context aware training feature to improve results with a smaller labeled set (20 to 50 documents).

  • Handwritten text: English
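
The tiers above can be sketched as a small lookup. This is an illustrative sketch only: `training_tier` and the set names are hypothetical helpers, not part of any Hero Platform_ API, and the language names are copied from the list above.

```python
# Language tiers copied from the Custom Invoice Extraction list.
# All names here are hypothetical and exist only for illustration.
MINIMAL_TRAINING = {"English", "German", "French", "Portuguese", "Italian"}
ADDITIONAL_TRAINING = {
    "Arabic", "Bulgarian", "Chinese (Simplified)", "Chinese (Traditional)",
    "Czech", "Dutch/Flemish", "Greek", "Hindi", "Croatian", "Hungarian",
    "Indonesian", "Japanese", "Korean", "Norwegian", "Polish", "Russian",
    "Slovak", "Slovenian", "Spanish/Castilian", "Thai", "Turkish", "Ukrainian",
}
HANDWRITTEN = {"English"}


def training_tier(language: str, handwritten: bool = False) -> str:
    """Return the documented tier for a language, or an 'unlisted' note."""
    if handwritten:
        return "handwritten" if language in HANDWRITTEN else "unlisted"
    if language in MINIMAL_TRAINING:
        return "minimal training"
    if language in ADDITIONAL_TRAINING:
        return "additional training"
    # Other languages can still be trained; accuracy depends on label count.
    return "unlisted: trainable, accuracy depends on labeled documents"


if __name__ == "__main__":
    for name in ("German", "Thai", "Swedish"):
        print(name, "->", training_tier(name))
```

A lookup like this may help when deciding how many labeled documents to budget for before starting a project in a given language.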

Text Classification

Small

  • English

Medium

  • Arabic
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Dutch
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Spanish
  • Thai
  • Turkish
  • Russian

Large 

  • Arabic
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Dutch
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Spanish
  • Thai
  • Turkish
  • Russian
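
As a rough sketch of how the Text Classification lists above combine, the following returns the model sizes listed for a language. It assumes the Dutch, English, and French entries are three separate languages; `available_model_sizes` is a hypothetical helper and not a product function.

```python
# Languages listed under both Medium and Large (the two lists are identical).
# Hypothetical names, for illustration only.
MEDIUM_AND_LARGE = {
    "Arabic", "Chinese (Simplified)", "Chinese (Traditional)", "Dutch",
    "English", "French", "German", "Italian", "Japanese", "Korean",
    "Polish", "Portuguese", "Spanish", "Thai", "Turkish", "Russian",
}


def available_model_sizes(language: str) -> list[str]:
    """List the Text Classification model sizes documented for a language."""
    sizes = []
    if language == "English":  # Small is listed for English only
        sizes.append("Small")
    if language in MEDIUM_AND_LARGE:
        sizes.extend(["Medium", "Large"])
    return sizes


if __name__ == "__main__":
    for name in ("English", "Dutch", "Swedish"):
        print(name, "->", available_model_sizes(name))
```

An empty result means the language does not appear in any of the three lists.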

Text Extractor

Small and Large English Engines

  • English

Small and Large German Engines

  • German

Multilingual Engine

  • Afrikaans
  • Amharic
  • Arabic
  • Assamese
  • Azerbaijani
  • Belarusian
  • Bulgarian
  • Bengali
  • Bengali Romanized
  • Breton
  • Bosnian
  • Catalan
  • Czech
  • Welsh
  • Danish
  • German
  • Greek
  • English
  • Esperanto
  • Spanish
  • Estonian
  • Basque
  • Persian
  • Fulah
  • Finnish
  • French
  • Frisian
  • Irish
  • Scottish Gaelic
  • Galician
  • Guarani
  • Gujarati
  • Hausa
  • Hebrew
  • Hindi
  • Hindi Romanized
  • Croatian
  • Haitian
  • Hungarian
  • Armenian
  • Indonesian
  • Igbo
  • Icelandic
  • Italian
  • Japanese
  • Javanese
  • Georgian
  • Kazakh
  • Khmer
  • Kannada
  • Korean
  • Kurdish
  • Kyrgyz
  • Latin
  • Ganda
  • Limburgish
  • Lingala
  • Lao
  • Lithuanian
  • Latvian
  • Malagasy
  • Macedonian
  • Malayalam
  • Mongolian
  • Marathi
  • Malay
  • Burmese
  • Burmese (Zawgyi)
  • Nepali
  • Dutch
  • Norwegian
  • Northern Sotho
  • Oromo
  • Oriya
  • Punjabi
  • Polish
  • Pashto
  • Portuguese
  • Quechua
  • Romansh
  • Romanian
  • Russian
  • Sanskrit
  • Sinhala
  • Sardinian
  • Sindhi
  • Slovak
  • Slovenian
  • Somali
  • Albanian
  • Serbian
  • Swati
  • Sundanese
  • Swedish
  • Swahili
  • Tamil
  • Tamil Romanized
  • Telugu
  • Telugu Romanized
  • Thai
  • Tagalog
  • Tswana
  • Turkish
  • Uyghur
  • Ukrainian
  • Urdu
  • Urdu Romanized
  • Uzbek
  • Vietnamese
  • Wolof
  • Xhosa
  • Yiddish
  • Yoruba
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Zulu