Identifying Creative Content at the Page Level in the HathiTrust Digital Library Using Machine Learning Methods on Text and Image Features.
Nikolaus Nova Parulian, Glen Worthey
Published in: iConference (1) (2021)
Keyphrases
- digital libraries
- image features
- content features
- text content
- website
- keywords
- page layout
- textual content
- digital documents
- html pages
- web pages
- semantic information
- metadata
- digital content
- cross media
- content and structure
- text information
- semantic content
- information retrieval
- web documents
- object recognition
- digital collections
- image representation
- web images
- multimedia documents
- pdf files
- computer vision
- electronic documents
- page content
- document type
- wikipedia pages
- user generated content
- information access
- cultural heritage
- text documents
- visual features
- information retrieval systems
- text mining
- xml documents
- search engine