Unsupervised Code-Switching for Multilingual Historical Document Transcription.
Dan Garrette, Hannah Alpert-Abrams, Taylor Berg-Kirkpatrick, Dan Klein
Published in: HLT-NAACL (2015)
Keyphrases
- multilingual information retrieval
- historical documents
- document retrieval
- multilingual documents
- unsupervised learning
- historical data
- document images
- handwriting recognition
- language independent
- information retrieval systems
- document collections
- supervised learning
- digital libraries
- database
- retrieval systems
- source code
- short list
- document clustering
- automatic transcription
- vector space model
- document classification
- web documents
- keywords
- multimedia
- information retrieval
- relevant documents
- document analysis
- semantic information
- unsupervised manner
- topic discovery
- metadata