Détection d'erreurs dans des transcriptions OCR de documents historiques par réseaux de neurones récurrents multi-niveau (Combining character level and word level RNNs for post-OCR error detection).
Thibault Magallon, Frédéric Béchet, Benoît Favre
Published in: CORIA-TALN-RJC (TALN 2) (2018)
Keyphrases
- word level
- printed documents
- document images
- document analysis
- optical character recognition
- error detection
- character recognition
- word spotting
- error correction
- language independent
- text lines
- recurrent neural networks
- machine vision
- information retrieval
- fault tolerance
- handwritten documents
- image analysis
- document level
- natural language processing
- relevant documents
- n-gram
- document collections
- sentence level
- query expansion
- handwriting recognition
- Viterbi algorithm
- hidden Markov models