TopOCR's Binarization for Low Contrast Images
Text images can have low contrast because of uneven or insufficient lighting.
Glossy pages tend to reflect a lot of light, which makes the image too bright in some regions and can produce thin, "washed-out" text.
Images that are too dark, or that have strong shadows, don't have enough contrast between the text and the background and will produce "fuzzy" distorted characters.
These low contrast images, whether they are too bright or too dark, make it more difficult to effectively separate the text from the page background and can reduce OCR accuracy.
TopOCR addresses this problem with the settings in the Image->Binarization... dialog. If the image suffers from uneven illumination, select the Equalize Contrast button. The result of binarizing with Equalize Contrast is shown in the image on the right.
|Low Contrast Default Binarization||Equalize Contrast Binarization|
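To illustrate why uneven illumination defeats a single global threshold, here is a minimal sketch of local-mean (adaptive) thresholding in plain Python. It is not TopOCR's Equalize Contrast algorithm, which this page does not describe. The function name `local_mean_binarize` and the parameters `radius` and `offset` are assumptions made for the example.

```python
def local_mean_binarize(gray, radius=1, offset=10):
    """Binarize a grayscale image (list of rows, values 0-255) against its local mean.

    A pixel becomes text (0) when it is darker than the mean of its
    neighborhood by more than `offset`; otherwise it becomes background (255).
    Hypothetical helper for illustration only, not a TopOCR function.
    """
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood, clipped at the image borders.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            row.append(0 if gray[y][x] < local_mean - offset else 255)
        out.append(row)
    return out


# A page whose lighting fades from left (220) to right (100), with one
# "ink" pixel in the bright part (110) and one in the dim part (30).
# No single global cutoff works here: it would have to be above 110 to
# catch the left ink, yet at most 100 to leave the right background white.
page = [
    [220, 200, 180, 160, 140, 120, 100],
    [220, 110, 180, 160, 140,  30, 100],
    [220, 200, 180, 160, 140, 120, 100],
]
binary = local_mean_binarize(page)
```

Because each pixel is compared with its own surroundings rather than with the whole page, both ink pixels are marked as text while the fading background stays white.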
If the image is too dark or has strong shadows, you can select the Maximize Contrast button.
The result of binarizing with Maximize Contrast is shown in the image on the right.
|Low Contrast Default Binarization||Maximize Contrast Binarization|
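The effect of boosting contrast before binarizing a dark image can be sketched with a simple linear contrast stretch followed by a fixed threshold. This is a generic technique chosen for illustration, not necessarily what the Maximize Contrast button does. The names `stretch_contrast` and `threshold` and the cutoff of 128 are assumptions.

```python
def stretch_contrast(gray):
    """Linearly rescale pixel values so the darkest becomes 0 and the brightest 255."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in gray]  # flat image: nothing to stretch
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray]


def threshold(gray, cutoff=128):
    """Pixels darker than `cutoff` become text (0); the rest become background (255)."""
    return [[0 if p < cutoff else 255 for p in row] for row in gray]


# An underexposed page: the "paper" is around 60 and the ink pixel is 30.
dark_page = [
    [60, 62, 58],
    [61, 30, 60],
    [59, 61, 62],
]

without_stretch = threshold(dark_page)                  # every pixel falls below 128
with_stretch = threshold(stretch_contrast(dark_page))   # only the ink pixel stays dark
```

Without the stretch, the whole page falls below the cutoff and turns black. After the stretch, the paper values spread toward 255 and only the ink pixel remains as text.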
Low Quality Images
With a little user experimentation and adjustment, TopOCR's Image Processing system can produce useful text images from poor quality images.
Here's an example of a low-resolution, low-contrast, out-of-focus, and skewed image. Can your existing OCR system process this image?
|Raw Image||Processed by TopOCR|
To optimize binarization for low quality images, simply set the Image->Binarize->Background Slider to a value corresponding to the amount of noise or extraneous objects in the image.
The Image->Binarize->Background Slider is like an intensity dial for "Background Removal". Low contrast images need a lower setting so that characters are not clipped. Higher contrast images, on the other hand, can use a higher value and remove essentially all of the background, including fingers, paper clips, etc.
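The trade-off described above can be pictured with a toy model in which a single strength value sets how dark a pixel must be to survive as text. This is only a sketch of the general idea. The function `remove_background`, its 0 to 100 `strength` scale, and the mapping to a cutoff are assumptions and do not reflect how TopOCR's Background Slider is implemented.

```python
def remove_background(gray, strength):
    """Toy background removal (hypothetical, not TopOCR's method).

    `strength` runs from 0 (keep almost everything as text) to 100 (remove
    everything). Pixels at or above the derived cutoff are turned white
    (background); darker pixels are kept as black text.
    """
    if not 0 <= strength <= 100:
        raise ValueError("strength must be between 0 and 100")
    cutoff = 255 * (100 - strength) / 100
    return [[0 if p < cutoff else 255 for p in row] for row in gray]


# High-contrast scan: white paper (240), black ink (10), gray paper clip (120).
high_contrast = [[240, 10, 120, 240]]
# Low-contrast scan: faint ink (150) on grayish paper (200).
low_contrast = [[200, 150, 200]]

strong_on_high = remove_background(high_contrast, 80)  # ink kept, paper clip removed
strong_on_low = remove_background(low_contrast, 80)    # faint ink is clipped away
gentle_on_low = remove_background(low_contrast, 30)    # faint ink survives
```

In this model a high setting cleans up the high-contrast scan but erases the faint characters in the low-contrast one, which is why the lower setting is the safer choice there.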
Note: After you determine the correct background slider value, you may have the OCR engine automatically perform this operation by setting the background slider in the Settings->OCR... dialog.