Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard.
Delphine Bernhard, Anne-Laure Ligozat, Fanny Martin, Myriam Bras, Pierre Magistry, Marianne Vergez-Couret, Lucie Steiblé, Pascale Erhart, Nabil Hathout, Dominique Huck, Christophe Rey, Philippe Reynes, Sophie Rosset, Jean Sibille, Thomas Lavergne
Published in: LREC (2018)
Keyphrases
- topic models
- part of speech
- POS taggers
- text documents
- POS tagging
- natural language processing
- training corpus
- n-gram
- unsupervised grammar induction
- tf-idf
- noun phrases
- syntactic categories
- word sense disambiguation
- language independent
- lexical information
- multiword
- domain adaptation
- named entity recognition
- metadata
- language model
- statistical machine translation
- bag of words
- expert systems