Duplicate Detection with Efficient Language Models for Automatic Bibliographic Heterogeneous Data Integration.
Nicolas Turenne
Published in: CoRR (2015)
Keyphrases
- data integration
- duplicate detection
- language model
- data cleaning
- heterogeneous data
- data model
- data management
- heterogeneous data sources
- language modeling
- data sources
- data warehouse
- databases
- probabilistic model
- text classification
- record linkage
- linked data
- retrieval model
- decision support system
- information retrieval
- database systems
- data warehousing
- business intelligence
- database management systems
- query expansion
- metadata
- missing values
- relational databases
- database
- website
- data extraction
- relevance model
- natural language processing
- data sets