Leveraging Text Data for Word Segmentation for Underresourced Languages.
Thomas Glarner, Benedikt T. Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach
Published in: INTERSPEECH (2017)
Keyphrases
- text data
- word segmentation
- text classification
- language independent
- cross lingual
- n-gram
- text mining
- text documents
- bag of words
- machine learning
- text categorization
- high dimensional
- cross language
- document collections
- POS tagging
- feature selection
- high dimensional data
- structured data
- language modeling
- labeled data
- kNN
- data sets
- machine translation
- test collection
- image processing
- language model
- multimedia
- neural network
- databases