Language Support for Document Extractors
Languages supported by Hero Platform_ AI features.
Fixed Form
Typed Text
- Arabic, Bulgarian, Chinese (Simplified), Chinese (Traditional), Czech, Dutch/Flemish, German, Greek, English, French, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Slovak, Slovenian, Spanish/Castilian, Thai, Turkish, Ukrainian
Handwriting
- English (and other languages written in the English character set)
Invoice Extraction
- English
- French, German, Greek, Italian, Portuguese, and Spanish (may take slightly longer to process)
Semi-Structured Extraction
- Typed text with minimal training:
  - English, German, French, Portuguese, Italian
- Typed text with additional training:
  - Arabic, Bulgarian, Chinese (Simplified), Chinese (Traditional), Czech, Dutch/Flemish, Greek, Hindi, Croatian, Hungarian, Indonesian, Japanese, Korean, Norwegian, Polish, Russian, Slovak, Slovenian, Spanish/Castilian, Thai, Turkish, Ukrainian
Models can also be trained on documents in other languages. The model's accuracy depends on the number of labeled documents.
Automation Hero recommends labeling 1,000-2,000 documents, or using the Context-aware training feature to improve results with a smaller labeled set (20-50 documents).
- Handwritten text:
  - English
Text Classification
Small
- English
Medium
- Arabic
- Chinese (Simplified)
- Chinese (Traditional)
- Dutch
- English
- French
- German
- Italian
- Japanese
- Korean
- Polish
- Portuguese
- Spanish
- Thai
- Turkish
- Russian
Large
- Arabic
- Chinese (Simplified)
- Chinese (Traditional)
- Dutch
- English
- French
- German
- Italian
- Japanese
- Korean
- Polish
- Portuguese
- Spanish
- Thai
- Turkish
- Russian
Text Extractor
Small and Large English Engines
- English
Small and Large German Engines
- German