Automatic identification of document sections for designing a French clinical corpus (Identification automatique de zones dans des documents pour la constitution d'un corpus médical en français) [in French].
Louise Deléger, Aurélie Névéol
Published in: TALN (2) (2014)
Keyphrases
- automatic identification
- document corpus
- similar documents
- document level
- text corpus
- document clustering
- information retrieval
- text collections
- document collections
- text documents
- web documents
- retrieval systems
- document space
- relevant documents
- document retrieval
- training corpus
- document content
- word frequency
- document classification
- information retrieval systems
- keywords
- multiword
- text corpora
- training documents
- document representation
- scientific papers
- language model
- vector space model
- wikipedia articles
- sentence level
- digital documents
- word co-occurrence
- automatic summarization
- electronic documents
- related documents
- document analysis
- multimedia documents
- retrieved documents
- document images
- query expansion
- text classification
- text mining
- natural language processing