Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data.
Dawid Adam Plaskowski, Szymon Skwarek, Dominika Grajewska, Maciej Niemir, Agnieszka Lawrynowicz
Published in: ICAART (3) (2024)
Keyphrases
- synthetic data
- semi-structured
- language model
- information extraction
- data extraction
- web documents
- language modeling
- web data extraction
- web information extraction
- information retrieval
- structured data
- n-gram
- web pages
- document retrieval
- probabilistic model
- language modelling
- retrieval model
- text mining
- statistical language models
- semi-structured data
- search engine
- web search engines
- real world
- real image data
- data model
- free text
- wrapper induction
- web data
- data sets
- website
- query expansion
- test collection
- query terms
- sentiment analysis
- databases
- pseudo relevance feedback
- keywords
- machine learning
- document level
- relevance model
- natural language processing
- query processing
- statistical language modeling
- language models for information retrieval