SoMaJo: State-of-the-art tokenization for German web and social media texts.
Thomas Proisl, Peter Uhrig
Published in: WAC@ACL (2016)
Keyphrases
- social media
- user generated content
- web applications
- social media data
- website
- reputation management
- web pages
- semantic web
- web documents
- web traffic
- web content
- real world events
- social networking sites
- web technologies
- web data
- linked data
- social media content
- social networks
- textual data
- biomedical text
- social networking
- web mining
- named entities
- search engine