High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates.
Janick WeberpalsPamela A. ShawKueiyu Joshua LinRichard WyssJoseph M. PlasekLi ZhouKerry NganThomas DeRamusSudha R. RamanBradley G. HammillHana LeeSengwee TohJohn G. ConnollyKimberly J. DandreoFang TianWei LiuJie LiJosé J. Hernández-MuñozSebastian SchneeweissRishi J. DesaiPublished in: CoRR (2024)
Keyphrases
- partially observed
- natural language processing
- high dimensional
- multiple imputation
- missing data
- information extraction
- machine learning
- low dimensional
- text mining
- dimensionality reduction
- artificial intelligence
- logistic regression
- statistical databases
- natural language
- data points
- computational linguistics
- named entity recognition
- missing values
- knowledge representation
- neural network
- regression model
- similarity search
- bayesian networks
- variable selection
- feature space
- information retrieval