Word vectors, reuse, and replicability: Towards a community repository of large-text resources.
Murhaf Fares, Andrey Kutuzov, Stephan Oepen, Erik Velldal
Published in: NODALIDA (2017)
Keyphrases
- learning object repositories
- learning objects
- keywords
- linguistic information
- text segments
- sentence level
- natural language text
- English text
- text corpus
- lexical features
- text input
- word pairs
- syntactic categories
- string matching
- electronic documents
- related words
- syntactic analysis
- text retrieval
- word counts
- learning resources
- sentence similarity
- digital libraries
- word level
- printed text
- English words
- training corpus
- digital collections
- text mining
- multiword
- educational resources
- Chinese text
- data repositories
- metadata
- information retrieval
- printed documents
- noun phrases
- syntactic information
- lexical information
- concept space
- open educational resources
- search engine
- software reuse
- word sense disambiguation
- text documents
- feature vectors
- digital resources
- stop words
- mailing lists
- machine translation system
- text corpora
- document analysis
- compressed text
- vector space
- document images
- co-occurrence
- named entity recognizer
- punctuation marks