Restructuration automatique de documents dans les corpus semi-structurés hétérogènes [Automatic restructuring of documents in heterogeneous semi-structured corpora].
Guillaume Wisniewski, Ludovic Denoyer, Patrick Gallinari
Published in: EGC (2005)
Keyphrases
- person names
- named entities
- newspaper articles
- word frequencies
- document collections
- text corpora
- multiword
- information retrieval systems
- text collections
- similar documents
- text documents
- text corpus
- information retrieval
- word frequency
- document clustering
- parallel corpora
- document classification
- text data
- document level
- xml documents
- keywords
- plain text
- topic segmentation
- training corpus
- natural language text
- document retrieval
- relevant documents
- web documents
- sentence level
- parallel corpus
- training documents
- manually annotated
- document analysis
- noun phrases
- co-occurrence
- free text
- text categorization
- text classification
- natural language processing