Abstract:
The segmentation and recognition of Arabic handwritten text has been an area of
great interest in the past few years. However, only a small number of research papers and
reports have been published in this area. There are several major problems with Arabic
handwritten text processing: Arabic is written cursively and many external objects are
used such as dots, 'Hamza', 'Madda', and diacritic objects. In addition, Arabic characters
have more than one shape according to their position inside a word. More than one
character can also share the same horizontal space, creating vertically overlapping
connected or disconnected blocks of characters. This makes the segmentation
of Arabic text into characters, and their classification, even more difficult.

In this work a technique is presented that segments difficult handwritten Arabic
text. A conventional algorithm is used for the initial segmentation of the text into
connected blocks of characters. The algorithm then generates pre-segmentation points for
these blocks. A neural network is subsequently used to verify the accuracy of these
segmentation points. Another conventional algorithm uses the verified segmentation
points and segments the connected blocks of characters. These characters can then be used
as input to another neural network for classification.

Two major problems were encountered in the above scenario. First, the
segmentation phase proved to be successful in vertical segmentation of connected blocks
of characters. However, it could not segment characters that were overlapping horizontally,
and this adversely affects any subsequent neural network classifier.
Second, many handwritten characters can be segmented and
classified into two or more different classes depending on whether they are viewed
separately, in a word, or even in a sentence. In other words, character segmentation and
classification, especially for handwritten Arabic characters, depend largely on contextual
information, and not only on topographic features extracted from these characters.
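
To make the pre-segmentation step described above more concrete, the following is a minimal sketch of one way candidate segmentation points could be generated for a connected block of characters. The abstract does not specify the heuristic actually used, so everything here is an assumption for illustration: the block is taken to be a binary image (1 = ink), candidates are placed at the centre of column runs containing very little ink (such as a thin connecting stroke), and the names `column_profile`, `candidate_points`, and `max_ink` are hypothetical. In the approach described, such candidates would then be passed to the neural network for verification.

```python
from typing import List

Image = List[List[int]]  # binary image, 1 = ink, 0 = background (assumed format)


def column_profile(block: Image) -> List[int]:
    """Count ink pixels in each column of a binary image block."""
    if not block:
        return []
    width = len(block[0])
    return [sum(row[x] for row in block) for x in range(width)]


def candidate_points(block: Image, max_ink: int = 1) -> List[int]:
    """Return the centre column of every run of 'thin' columns.

    A column is treated as thin when it holds between 1 and max_ink ink
    pixels. This is an illustrative heuristic, not the paper's algorithm.
    """
    profile = column_profile(block)
    points: List[int] = []
    run_start = None
    # A sentinel column that is never thin closes a run ending at the edge.
    for x, count in enumerate(profile + [max_ink + 1]):
        thin = 0 < count <= max_ink
        if thin and run_start is None:
            run_start = x
        elif not thin and run_start is not None:
            points.append((run_start + x - 1) // 2)
            run_start = None
    return points


# Two thick strokes joined by a one-pixel-high connecting stroke.
BLOCK: Image = [
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
]

if __name__ == "__main__":
    print(candidate_points(BLOCK))
```

A purely column-based heuristic like this one only proposes vertical cuts, so it shares the limitation noted above for characters that overlap within the same horizontal space.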