Defining a Preprocessing Pipeline for the MULTI-SITA Project and General Medical Italian Natural Language Data.
Sara Mora, Daniele Roberto Giacobbe, Matteo Bassetti, Mauro Giacomini
Published in: EFMI-STC (2023)
Keyphrases
- preprocessing
- data collection
- data structure
- data sets
- data processing
- natural language
- data analysis
- special case
- raw data
- input data
- data sources
- database
- computer science
- information extraction
- high quality
- training data
- learning algorithm
- synthetic data
- question answering
- data distribution
- statistical methods
- databases
- data quality