A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing.
Pranav Shetty, Arunkumar Chitteth Rajan, Christopher Künneth, Sonkakshi Gupta, Lakshmi Prerana Panchumarti, Lauren Holm, Chao Zhang, Rampi Ramprasad
Published in: CoRR (2022)
Keyphrases
- data extraction
- natural language processing
- general purpose
- information extraction
- semi structured
- web data extraction
- computational linguistics
- data integration
- wordnet
- text mining
- natural language
- web pages
- machine learning
- named entity recognition
- knowledge representation
- information retrieval
- named entities
- word sense disambiguation
- machine translation
- database
- web mining
- text summarization
- html pages
- web documents
- structured data
- similarity measure
- real world
- databases