Building a corpus of Italian Web forums: standard encoding issues and linguistic features.
Silvia Petri, Mirko Tavosanis. Published in: J. Lang. Technol. Comput. Linguistics (2009)
Keyphrases
- linguistic features
- web forums
- hand-crafted
- linguistic information
- structural features
- semantic features
- linguistic knowledge
- named entities
- feature set
- text classification
- sentence level
- news stories
- part of speech
- named entity recognition
- web mining
- translation model
- language model
- knowledge base
- databases
- statistical model
- information extraction
- high-level