Towards generating web-accessible STEM documents from PDF.
Volker Sorge, Akashdeep Bansal, Neha M. Jadhav, Himanshu Garg, Ayushi Verma, Meenakshi Balakrishnan
Published in: W4A (2020)
Keyphrases
- web documents
- web data
- website
- web information
- multilingual documents
- information retrieval systems
- textual data
- open directory project
- pdf files
- xml documents
- pdf documents
- document repositories
- document retrieval
- document classification
- probability density function
- information sources
- digital documents
- electronic documents
- web mining
- information retrieval
- web pages
- relevant documents
- web applications
- web users
- user interests
- web content
- multimedia documents
- content similarity
- textual features
- newspaper articles
- document collections
- web environment
- database
- google scholar
- digital libraries
- probabilistic model
- web crawler
- web search engines
- linked data
- topic specific
- structured information
- query terms
- document representation
- link analysis
- ranked list