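Classification automatique de documents structurés. Application au corpus d'arbres étiquetés de type XML. [Automatic classification of structured documents. Application to a corpus of XML-type labeled trees.]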
Guillaume Wisniewski, Ludovic Denoyer, Patrick Gallinari
Published in: CORIA (2005)
Keyphrases
- XML documents
- document classification
- XML format
- metadata
- information retrieval
- classification accuracy
- pre-classified
- text classification
- decision trees
- machine learning
- support vector machine
- document-centric
- document collections
- text data
- automatic classification
- training documents
- newspaper articles
- feature selection
- data model
- class labels
- image classification
- feature vectors
- text documents
- retrieval systems
- markup language
- text corpora
- word pairs
- supervised machine learning
- feature extraction
- semi-structured documents
- person names
- information extraction