Big Scholarly Data in CiteSeerX: Information Extraction from the Web.
Alexander G. Ororbia II, Jian Wu, Madian Khabsa, Kyle Williams, Clyde Lee Giles
Published in: WWW (Companion Volume) (2015)
Keyphrases
- information extraction
- database
- web data
- data sets
- textual data
- data analysis
- big data
- high quality
- training data
- data sources
- input data
- data structure
- data processing
- data collection
- data mining techniques
- raw data
- semi-structured
- web mining
- web documents
- data extraction
- image data
- knowledge discovery
- high dimensional
- information retrieval
- semantic web
- natural language processing
- data points
- end users
- xml documents
- web logs
- essential information
- constantly growing