IRMA: the 335-million-word Italian coRpus for studying MisinformAtion.
Fabio Carrella, Alessandro Miani, Stephan Lewandowsky
Published in: EACL (2023)
Keyphrases
- news corpus
- word frequencies
- text corpus
- english words
- word pairs
- linguistic information
- sentence level
- training corpus
- multiword
- word sense
- unknown words
- statistical machine translation
- legal texts
- lexical features
- natural language text
- co occurrence
- noun phrases
- spontaneous speech
- manually annotated
- text corpora
- automatic summarization
- stop words
- word co occurrence
- news articles
- n gram
- parallel corpus
- sentence pairs
- conversational speech
- word frequency
- machine translation system
- named entities
- writing style
- word sense disambiguation
- newspaper articles
- recognizing textual entailment
- related words
- text mining
- word recognition