A Bayesian Network Approach to Semantic Labelling of Text Formatting in XML Corpora of Documents.
Florendia Fourli-Kartsouni, Konstantinos Slavakis, Georgios Kouroupetroglou, Sergios Theodoridis
Published in: HCI (7) (2007)
Keyphrases
- xml documents
- linguistic analysis
- semantic information
- bayesian networks
- document content
- document structure
- topic segmentation
- word frequency
- document centric
- text data
- page layout
- arabic text
- text corpora
- semantic content
- markup language
- text documents
- natural language processing
- text collections
- text corpus
- content and structure
- xml format
- metadata
- text representation
- document type
- semantically related
- sentence similarity
- natural language text
- free text
- document corpus
- information retrieval
- text generation
- electronic documents
- web documents
- word pairs
- digital documents
- structured documents
- document representation
- document collections
- keywords
- text mining
- document analysis
- textual data
- text retrieval
- structured data
- linguistic patterns
- natural language
- xml data
- natural language generation
- xml retrieval
- relational databases
- concept space
- semantic similarity
- automatic summarization
- information extraction
- semantic features
- text categorization
- linguistic information
- semi structured
- extensible markup language
- information retrieval systems
- latent semantic analysis
- wordnet
- controlled vocabulary
- text classifiers
- semantic relationships
- document images
- xml schema
- document retrieval
- retrieval systems
- multimedia documents
- text classification
- document clustering