Corpus Assembly as Text Data Integration from Digital Libraries and the Web.
Udo Hahn, Tinghui Duan
Published in: JCDL (2019)
Keyphrases
- data integration
- digital libraries
- digital documents
- data extraction
- linked data
- data sources
- web documents
- data exchange
- data model
- data management
- databases
- query answering
- data integration systems
- multiple data sources
- data transformation
- information retrieval
- data warehouse
- website
- web pages
- information resources
- biological databases
- web mining
- web sources
- heterogeneous data sources
- business intelligence
- data cleaning
- heterogeneous data
- web data
- metadata
- semantic web
- biological data
- schema mappings
- database
- end users
- query decomposition
- keywords
- machine learning
- anchor text
- link analysis
- global schema
- schema matching
- data mining
- molecular biology
- web databases
- information integration
- topic maps
- knowledge discovery