Integrated High-Performance and Web-Oriented System of The Kazakh Language Text Recognition

Proceeding: The Second International Conference on Informatics Engineering & Information Science (ICIEIS)

Publication Date:

Authors : ; ; ; ; ; ;

Page : 25-36

Keywords : Service-oriented Architecture (SOA); Web service; High-performance Computing; Kazakh Language; Natural Language Processing (NLP); Optical Character Recognition (OCR); Tesseract; Xerox Finite State Tool (XFST); Finite State Transducer

Abstract

This paper presents an integrated, high-performance, Web-oriented recognition system for Kazakh-language text. The design and integration methodology of the system is based on a service-oriented architecture (SOA), which allows easy, flexible, and extensible integration of any language service into any desktop or mobile client. We have designed and built a 4-tier SOA on the basis of the W3C Web service standard. Using a high-performance cluster shows significant advantages in Kazakh text processing, especially for large arrays of texts. The main objective of the system under development is to give anyone easy access to texts and documentation in the Kazakh language, with the subsequent possibility of editing and manipulating documents through the respective Web services for OCR and morphological analysis/correction. The developed Web services cover OCR of Kazakh-language text and morphological analysis for the subsequent correction of OCR errors, and they show acceptable recognition quality for Kazakh texts, better than that of existing text recognition tools.
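
The two-stage flow described in the abstract (OCR first, then morphological correction of the OCR output) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `fake_ocr` stands in for the OCR Web service, `ToyMorphologyService` stands in for the morphological analysis/correction service, and a tiny word list with fuzzy string matching from Python's `difflib` replaces the XFST finite-state analysis that the keywords mention. The real system exposes these stages as Web services; the sketch only composes them as local calls.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Callable, List


@dataclass
class ToyMorphologyService:
    """Hypothetical stand-in for the morphological correction service.

    Assumption: a flat word list plus fuzzy matching is used here only for
    illustration; it is not the finite-state approach named in the keywords.
    """
    lexicon: List[str]

    def correct(self, token: str) -> str:
        # Known word forms pass through unchanged.
        if token in self.lexicon:
            return token
        # Otherwise take the closest lexicon entry, if any is similar enough.
        matches = get_close_matches(token, self.lexicon, n=1, cutoff=0.6)
        return matches[0] if matches else token


def recognize_and_correct(
    image: bytes,
    ocr: Callable[[bytes], str],
    morph: ToyMorphologyService,
) -> str:
    """Run OCR, then pass every recognized token through correction."""
    raw_text = ocr(image)
    return " ".join(morph.correct(token) for token in raw_text.split())


def fake_ocr(image: bytes) -> str:
    # Hypothetical OCR stage that misreads one letter in the second word.
    return "қазақ тіпі"


morph = ToyMorphologyService(lexicon=["қазақ", "тілі", "мәтін"])
result = recognize_and_correct(b"", fake_ocr, morph)
print(result)
```

Passing the OCR stage as a plain callable mirrors the loose coupling that a service-oriented design aims for: either stage could be replaced by a remote call without changing `recognize_and_correct`.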

Last modified: 2013-11-14 22:52:17