The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications.
Mirac Suzgun, Luke Melas-Kyriazi, Suproteem K. Sarkar, Scott Duke Kominers, Stuart M. Shieber
Published in: CoRR (2022)
Keyphrases
- intellectual property
- information retrieval
- patent information
- patent documents
- million images
- patent retrieval
- small scale
- real world
- patent search
- prior art
- benchmark datasets
- CLEF-IP
- database
- structured data
- text data
- real life
- feature space
- training data
- key phrase extraction
- test set
- text classification
- information retrieval systems
- citation networks