This tool demonstrates the state of Arabic OCR technology developed at QCRI. Our system is based on the speech recognition toolkit Kaldi, the OCRopus project for page layout analysis, and our preprocessing and feature extraction tool PrepOCRessor. See the PrepOCRessor home page for related publications.

This system is designed for continuous text recognition and works best on entire pages of historical documents with challenging script.

If you are interested in a collaboration involving large collections of Arabic documents (for example, Arabic heritage collections in libraries), please contact us to learn more about our large-scale system.

Document image in Arabic (single page; maximum file size: 10 MB; formats: .jpg, .jpeg, .png)
Layout
Domain
Output formats
Arabic
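
The upload limits stated above (a single page image, at most 10 MB, in .jpg, .jpeg, or .png format) can be checked locally before submitting a file. The following is a minimal sketch, not part of the QCRI tool: the function name `check_upload` is hypothetical, and it assumes that "10M" means 10 MiB (10 * 1024 * 1024 bytes), which the page does not specify.

```python
from pathlib import Path
from typing import List

# Assumption: the page's "10M" limit is read as 10 MiB.
MAX_BYTES = 10 * 1024 * 1024
# Formats listed on the upload form.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_upload(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    p = Path(path)
    problems = []
    # Compare the extension case-insensitively, so "PAGE.PNG" is accepted.
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_BYTES:
        problems.append(f"file too large: {p.stat().st_size} bytes")
    return problems
```

Calling `check_upload("scan.png")` returns an empty list when the file exists, has an allowed extension, and is within the size limit. The check looks only at the file name and size; it does not verify that the image actually contains a single page of Arabic text.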