AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification.
Abdelrahman Abdallah, Mahmoud Abdalla, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt
Published in: CoRR (2023)
Keyphrases
- cross lingual
- machine translation
- text classification
- information extraction
- cross lingual information retrieval
- language independent
- cross language
- event extraction
- language modeling
- multi lingual
- language specific
- text mining
- machine learning
- parallel corpus
- web news
- natural language processing
- information retrieval
- decision trees
- data mining
- document clustering
- indian languages
- artificial intelligence
- translation model
- statistical machine translation
- parallel corpora
- machine translation system
- chinese english
- feature selection
- natural language
- n gram
- target language
- training set
- knowledge discovery
- text categorization