Speech-to-text technology to transcribe and disclose 100,000+ hours of bilingual documents from historical Czech and Czechoslovak radio archive.
Jan Nouza, Petr Cerva, Jindrich Zdánský, Karel Blavka, Marek Bohac, Jan Silovský, Josef Chaloupka, Michaela Kucharová, Ladislav Seps, Jirí Málek, Michal Rott
Published in: INTERSPEECH (2014)
Keyphrases
- cross language
- document collections
- metadata
- information retrieval
- historical documents
- parallel corpora
- multiword
- information retrieval systems
- document retrieval
- relevant documents
- wireless communication
- machine translation
- language independent
- xml documents
- keywords
- cross lingual
- retrieval systems
- document classification
- text categorization
- text classification
- word spotting
- historical manuscripts