Detection and segmentation: Processes of manuscript and multimedia archives for indexing and recognition

PhD Thesis

Year: 2010
Author: Vassilis Papavassiliou
University: National Technical University of Athens (Εθνικό Μετσόβιο Πολυτεχνείο)
"The thesis focuses on handwritten document image analysis, studying and proposing methods for two critical preprocessing stages in the workflow of an optical character recognition (OCR) application: text-line segmentation and word segmentation. The shortcomings of existing methods are discussed, and two novel techniques for text-line segmentation and one for locating words are introduced. The first text-line segmentation algorithm locates the optimal succession of text and gap areas within vertical zones by applying the Viterbi algorithm to a hidden Markov model (HMM) whose parameters are drawn from statistics of each area type over the whole document image. A text-line separator drawing technique is then applied, and finally connected components are assigned to text lines according to simple geometric constraints that determine whether a component can be assigned directly or must be split because it lies across successive text lines. The algorithm participated in the ICDAR07 and ICDAR09 handwriting segmentation contests, taking first and second place, respectively.

The second method is based on binary morphology. Its basic steps are: a) apply dilation and subsampling to produce a low-resolution image in which the underlying texture of the text lines is apparent, while preventing aliasing; and b) successively apply dilations and (p,q)-th generalized foreground rank openings to join close and horizontally overlapping regions while preventing merges in the vertical direction. These operations evolve the candidate text lines and reveal special patterns indicating that text lines have come very close or have merged. Finally, each connected component of the original document image is assigned to the text line it intersects; if it intersects more than one text line, it is cut along the local ridges produced by applying the watershed algorithm.
Word segmentation can be viewed as a problem that requires formulating a metric for the gap between successive components and clustering the gaps into “inter”-word and “intra”-word classes. To compute the gap metric, we use the negative logarithm of the objective function of a soft-margin linear SVM. We estimate the probability density function (pdf) of the gap metrics with a nonparametric approach and have observed that “inter”-word gaps accumulate in the rightmost lobe of the pdf, while “intra”-word gaps gather in the left lobe. The classification threshold is set to the minimum between the two main lobes. The algorithm was tested on the benchmark datasets of the ICDAR07 and ICDAR09 handwriting segmentation contests and outperformed the participating algorithms.

Furthermore, the thesis studies the problem of locating artificial text in video frames. A new method is proposed for verifying text areas detected in video streams. The algorithm exploits the spectral properties of the horizontal projection of candidate text regions to reduce the large number of false alarms from which most text detection algorithms suffer. The algorithm was tested on newscast video sequences, and we conclude that adding the verification module significantly increased the precision rate while leaving the recall rate almost unaffected."
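The first text-line method in the abstract labels areas of each vertical zone as text or gap with the Viterbi algorithm. The sketch below is a simplified illustration, not the thesis's model: it assumes a two-state, per-row HMM whose observation is the ink density of each row, Gaussian emissions, and a hypothetical self-transition probability `p_stay`. The names `viterbi_text_gap` and `gaussian_logpdf` are illustrative only.

```python
import math
from typing import List, Sequence

STATES = ("gap", "text")


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log-density of a normal distribution (assumed emission model)."""
    var = std * std
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def viterbi_text_gap(profile: Sequence[float], means: Sequence[float],
                     stds: Sequence[float], p_stay: float = 0.9) -> List[str]:
    """Most likely gap/text label for each row of one vertical zone.

    profile: ink density of each row in the zone (fraction of foreground pixels)
    means, stds: per-state emission parameters, ordered like STATES; in practice
                 these would be estimated from statistics of the whole page
    p_stay: hypothetical probability of remaining in the same state
    """
    n = len(STATES)
    log_trans = [[math.log(p_stay) if i == j else math.log((1 - p_stay) / (n - 1))
                  for j in range(n)] for i in range(n)]
    # uniform initial distribution over states
    delta = [math.log(1 / n) + gaussian_logpdf(profile[0], means[s], stds[s])
             for s in range(n)]
    backptr: List[List[int]] = []
    for x in profile[1:]:
        # best predecessor of each state, scored with the previous row's delta
        prev_best = [max(range(n), key=lambda p: delta[p] + log_trans[p][s])
                     for s in range(n)]
        delta = [delta[prev_best[s]] + log_trans[prev_best[s]][s]
                 + gaussian_logpdf(x, means[s], stds[s]) for s in range(n)]
        backptr.append(prev_best)
    # backtrack from the best final state
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]
```

For example, `viterbi_text_gap([0.0, 0.02, 0.4, 0.5, 0.45, 0.03, 0.0, 0.38, 0.42, 0.01], means=(0.02, 0.4), stds=(0.03, 0.1))` yields two text runs separated by gap runs. Working in log space avoids underflow on tall zones; a duration-aware HMM would be a natural refinement.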
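The abstract says connected components are either assigned directly or split when they lie across successive text lines, but does not spell out the geometric constraints. The following sketch assumes straight horizontal separators (given as row indices) and a hypothetical rule: a line counts as significantly overlapped when it holds at least `min_fraction` of the component's height, and a component overlapping two or more lines significantly is flagged for splitting. The function names and the threshold are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def line_overlaps(top: int, bottom: int, separators: Sequence[int]) -> List[int]:
    """Number of rows of [top, bottom] (inclusive) lying in each text line.

    Text line i covers rows [bounds[i], bounds[i + 1]), where bounds are the
    sorted separator rows padded with -inf and +inf.
    """
    bounds = [float("-inf")] + sorted(separators) + [float("inf")]
    return [max(0, min(bottom + 1, hi) - max(top, lo))
            for lo, hi in zip(bounds, bounds[1:])]


def assign_component(top: int, bottom: int, separators: Sequence[int],
                     min_fraction: float = 0.25) -> Tuple[str, List[int]]:
    """Decide whether a component goes to one line or must be split.

    min_fraction is a hypothetical threshold on the share of the
    component's height that must fall inside a line to count.
    """
    overlaps = line_overlaps(top, bottom, separators)
    height = bottom - top + 1
    significant = [i for i, o in enumerate(overlaps) if o / height >= min_fraction]
    if len(significant) <= 1:
        # small ascender/descender overlaps do not trigger a split
        best = max(range(len(overlaps)), key=lambda i: overlaps[i])
        return ("assign", [best])
    return ("split", significant)
```

With separators at rows 10 and 20, a component spanning rows 8 to 18 is assigned to line 1 despite a small overlap with line 0, whereas one spanning rows 15 to 25 is flagged for splitting between lines 1 and 2.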
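Step a) of the morphological method dilates before subsampling so that thin strokes are not lost (aliased away) at low resolution. The minimal sketch below shows only that idea, plus a horizontal-only dilation that joins neighbouring ink without growing vertically; it does not implement the (p,q)-th generalized foreground rank openings of step b). The names `dilate` and `subsample` and the rectangular structuring element are assumptions for illustration.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def dilate(img: Image, half_w: int, half_h: int) -> Image:
    """Binary dilation with a (2*half_h + 1) x (2*half_w + 1) rectangle."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # paint the whole structuring element around each ink pixel
                for yy in range(max(0, y - half_h), min(h, y + half_h + 1)):
                    for xx in range(max(0, x - half_w), min(w, x + half_w + 1)):
                        out[yy][xx] = 1
    return out


def subsample(img: Image, factor: int) -> Image:
    """Keep every factor-th pixel along both axes."""
    return [row[::factor] for row in img[::factor]]
```

A one-pixel-wide vertical stroke in column 1 vanishes entirely under `subsample(img, 2)`, which keeps only columns 0 and 2, but survives when `dilate(img, 1, 0)` is applied first. Setting `half_h=0` keeps the dilation purely horizontal, so separate rows of ink stay separate.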
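The word segmentation threshold is described as the minimum between the two main lobes of a nonparametrically estimated pdf of gap metrics. The sketch below uses a Gaussian kernel density estimate as the nonparametric method (an assumption; the abstract does not name the estimator) with a hypothetical fixed `bandwidth`, and takes already computed gap metrics as input rather than deriving them from the SVM objective.

```python
import math
from typing import List, Sequence


def gaussian_kde(values: Sequence[float], grid: Sequence[float],
                 bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
            for g in grid]


def valley_threshold(values: Sequence[float], bandwidth: float,
                     n_grid: int = 200) -> float:
    """Threshold at the density minimum between the two highest lobes."""
    # pad the grid so that lobes near the extreme values are interior peaks
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    pdf = gaussian_kde(values, grid, bandwidth)
    peaks = [i for i in range(1, n_grid - 1) if pdf[i - 1] < pdf[i] >= pdf[i + 1]]
    if len(peaks) < 2:
        raise ValueError("estimated density has fewer than two lobes")
    # indices of the two highest peaks, in left-to-right order
    left, right = sorted(sorted(peaks, key=lambda i: pdf[i])[-2:])
    valley = min(range(left, right + 1), key=lambda i: pdf[i])
    return grid[valley]
```

Gaps whose metric exceeds the returned threshold fall in the right lobe and would be labeled inter-word; the rest are intra-word. The bandwidth strongly affects how many lobes appear, so in practice it would be chosen by a data-driven rule rather than fixed.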
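The video text verification step relies on spectral properties of the horizontal projection of candidate regions, but the abstract does not state which properties. As a hedged illustration only, the sketch below assumes the projection is the per-column ink (or edge) count of a candidate region and uses a hypothetical feature: the share of non-DC spectral energy at periods typical of character strokes. The period band, the `min_fraction` threshold, and all function names are assumptions, not the thesis's method.

```python
import cmath
import math
from typing import List, Sequence


def dft_power(signal: Sequence[float]) -> List[float]:
    """Squared DFT magnitudes via a naive O(n^2) transform."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]


def band_energy_fraction(projection: Sequence[float],
                         min_period: float, max_period: float) -> float:
    """Share of non-DC spectral energy whose period lies in [min_period, max_period]."""
    n = len(projection)
    mean = sum(projection) / n
    power = dft_power([p - mean for p in projection])  # mean removed: no DC term
    half = range(1, n // 2 + 1)  # non-redundant frequencies of a real signal
    total = sum(power[k] for k in half)
    if total == 0:
        return 0.0
    return sum(power[k] for k in half if min_period <= n / k <= max_period) / total


def looks_like_text(projection: Sequence[float], min_period: float = 3,
                    max_period: float = 8, min_fraction: float = 0.3) -> bool:
    """Hypothetical verification rule: keep regions with periodic stroke energy."""
    return band_energy_fraction(projection, min_period, max_period) >= min_fraction
```

A projection with regularly spaced strokes such as `[5, 0, 0, 0] * 8` concentrates half of its energy at period 4 and passes, while a single wide blob (`[0] * 12 + [5] * 8 + [0] * 12`) puts most of its energy at low frequencies and is rejected as a likely false alarm.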