American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers.
Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D'Amico-Wong, Quan Le, Pablo Querubin, Leander Heldring
Published in: CoRR (2023)
Keyphrases
- news articles
- real world
- news stories
- text documents
- united states
- narrative structure
- unstructured text
- database
- text retrieval
- text mining
- keywords
- small scale
- free text
- real life
- structured data
- web pages
- information retrieval
- benchmark datasets
- web documents
- text data
- historical manuscripts
- semantic information
- natural language
- image classification
- historical data
- information extraction
- synthetic datasets
- document analysis
- semantic markup