RiverText: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams.
Gabriel Iturra-Bocaz, Felipe Bravo-Marquez
Published in: SIGIR (2023)
Keyphrases
- data streams
- training corpus
- sliding window
- keywords
- sentence level
- string matching
- english text
- open source
- n-gram
- natural language text
- multiword
- text corpus
- related words
- co-occurrence
- english words
- lexical features
- text segments
- text input
- punctuation marks
- information retrieval
- sentence similarity
- sensor networks
- syntactic analysis
- text retrieval
- printed documents
- syntactic information
- data sets
- chinese text
- text mining
- noun phrases
- text classification
- word level
- linguistic information
- text-to-speech
- word pairs
- semi-supervised
- training set
- streaming data
- stream data
- word counts
- word sense
- text documents
- handwritten words
- concept drift
- feature space