Evaluating Sentence Segmentation and Word Tokenization Systems on Estonian Web Texts.
Kairit Sirts, Kairit Peekman
Published in: Baltic HLT (2020)
Keyphrases
- website
- web applications
- image segmentation
- web documents
- training corpus
- text segmentation
- n-gram
- word segmentation
- sentence level
- segmentation method
- text corpus
- word level
- medical images
- co-occurrence
- keywords
- information retrieval
- syntactic analysis
- natural language text
- segmentation algorithm
- text classification
- natural language
- search engine