N. Rajeswari, S. Rathnapriya, S. Nijandan
The mixed raster content (MRC) standard specifies a framework for document compression that can dramatically improve the compression/quality tradeoff compared with traditional lossy image compression algorithms. The key to MRC compression is the separation of the document into foreground and background layers, with the separation represented by a binary mask. Consequently, the quality and compression ratio achieved by an MRC document encoder depend heavily on the segmentation algorithm used to compute the binary mask. In this paper, we propose a novel multiscale segmentation scheme for MRC document encoding based on the sequential application of two algorithms. The first algorithm, cost optimized segmentation (COS), is a blockwise segmentation algorithm formulated in a global cost optimization framework. The second algorithm, connected component classification (CCC), refines the initial segmentation by classifying feature vectors of connected components using a Markov random field (MRF) model. The combined COS/CCC segmentation algorithm is then incorporated into a multiscale framework to improve the segmentation accuracy for text of varying size.

Index Terms—MRC, COS, CCC, MRF.
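The role of the binary mask in the MRC layer model can be illustrated with a minimal sketch. It assumes grayscale images stored as nested Python lists, and it uses a plain global threshold (the hypothetical helper `threshold_mask`) as a stand-in for segmentation. That threshold is NOT the COS/CCC method described here; it only produces a mask so that the split and the decoder-side recomposition can be shown. The names `split_layers`, `compose`, and the `fill` value are likewise illustrative assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixel values, row-major
Mask = List[List[int]]   # 1 = foreground (e.g. text), 0 = background


def threshold_mask(image: Image, t: int) -> Mask:
    """Hypothetical stand-in for segmentation: mark dark pixels as foreground.
    This is a simple global threshold, not the COS/CCC algorithm."""
    return [[1 if p < t else 0 for p in row] for row in image]


def split_layers(image: Image, mask: Mask, fill: int = 255) -> Tuple[Image, Image]:
    """Split an image into foreground and background layers using the mask.
    Pixels a layer does not own are set to an assumed placeholder `fill`;
    a real encoder would pick values that compress well."""
    fg = [[p if m else fill for p, m in zip(prow, mrow)]
          for prow, mrow in zip(image, mask)]
    bg = [[fill if m else p for p, m in zip(prow, mrow)]
          for prow, mrow in zip(image, mask)]
    return fg, bg


def compose(fg: Image, bg: Image, mask: Mask) -> Image:
    """Decoder side: take each pixel from fg where mask == 1, else from bg."""
    return [[f if m else b for f, b, m in zip(frow, brow, mrow)]
            for frow, brow, mrow in zip(fg, bg, mask)]


if __name__ == "__main__":
    image = [[250, 30, 245], [240, 20, 25]]
    mask = threshold_mask(image, 128)
    fg, bg = split_layers(image, mask)
    print(mask)                            # [[0, 1, 0], [0, 1, 1]]
    print(compose(fg, bg, mask) == image)  # True
```

Because the mask alone decides which layer each pixel comes from, any segmentation error places text pixels in the wrong layer. This is why the mask quality governs both the reconstructed quality and the compression ratio: the smooth layers can then be coded aggressively while the mask carries the sharp text edges.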