Leveraging Text Data for Word Segmentation for Underresourced Languages.

Thomas Glarner, Benedikt T. Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach

Published in: INTERSPEECH (2017)

Keyphrases

text data
word segmentation
text classification
language independent
cross lingual
n-gram
text mining
text documents
bag of words
machine learning
text categorization
high dimensional
cross language
document collections
POS tagging
feature selection
high dimensional data
structured data
language modeling
labeled data
kNN
data sets
machine translation
test collection
image processing
language model
multimedia
neural network
databases