Document Image Dewarping with TopOCR

For the ultimate portability in acquiring digital images of documents, cameras are far superior to conventional flatbed scanners. Unfortunately, the main drawback to using cameras has been the software. Most OCR packages were meant to be used solely with scanners, and therefore they assumed that images were perfectly straight, completely flat, and uniformly illuminated.

TopOCR initially addressed the problems of non-uniform lighting and limited document skew, but did not address any of the more significant distortions that arise when capturing images from books. Document Image Dewarping is the next evolutionary step for processing document images with a camera.

Capturing an image from a book can cause three kinds of distortion: binding curvature along the spine of the book, cylindrical curvature in the page area, and additional distortion caused by not holding the camera parallel to the surface of the page. TopOCR's advanced "Document Image Dewarping" can usually remove these distortions with a single mouse click.

TopOCR Dewarping Limitations

Document Dewarping works by using a line tracking function that fits splines to each curved text line and then creates a dewarped image by straightening each spline. However, the current version of the line tracker can become confused if the image contains inconsistent data, for example, images or large captions. For this reason, images that use the Dewarp filter must be text only, and must not contain graphics or captions that have a different font size than the text. An example of acceptable and unacceptable images is presented below:

Good Image - All Text
Bad Image - Graphics/Captions
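
The line-straightening idea described in this section can be illustrated with a small sketch. This is not TopOCR's implementation. It assumes the line tracker has already produced a list of (x, y) points along one curved text line, and it fits a simple quadratic curve in place of a spline to keep the example short. The names fit_quadratic, curve_y, and straighten are hypothetical, and the image is a plain list of pixel rows.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a + b*x + c*x**2 to tracked (x, y) points.

    A quadratic stands in for the spline here (an assumption for brevity).
    """
    # Normal equations for the basis [1, x, x^2], as a 3x4 augmented matrix.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def curve_y(coeffs, x):
    """Evaluate the fitted curve at column x."""
    a, b, c = coeffs
    return a + b * x + c * x * x


def straighten(image, coeffs, background=0):
    """Shift each pixel column vertically so the fitted curve becomes flat."""
    height, width = len(image), len(image[0])
    # Flatten onto the row closest to the curve's mean height.
    target = round(sum(curve_y(coeffs, x) for x in range(width)) / width)
    out = [[background] * width for _ in range(height)]
    for x in range(width):
        shift = round(curve_y(coeffs, x)) - target
        for y in range(height):
            src = y + shift
            if 0 <= src < height:
                out[y][x] = image[src][x]
    return out


# Toy 7x5 image: one "text line" (value 1) that sags toward the middle.
tracked = [(0, 5), (1, 2), (2, 1), (3, 2), (4, 5)]
image = [[0] * 5 for _ in range(7)]
for x, y in tracked:
    image[y][x] = 1

flat = straighten(image, fit_quadratic(tracked))
```

In this toy case the fitted curve passes through every ink pixel, so after the per-column shift all of them land on row 3 of flat. A real page has many text lines, each with its own curve, which is one way to see why regions with inconsistent content such as graphics or large captions can mislead a tracker of this kind.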

How to Use the Dewarp Function

If your document image is subject to any of the limitations mentioned above, use the left mouse button to select a "sub-region" that contains only acceptable data. Then select "copy" followed by "paste" under the Image Window's Edit Menu. If your image is not subject to any of the limitations, proceed to the next step.

Raw Image
Selected Image

Select "Dewarp" under the Image Menu, and wait for the Progress Bar to indicate completion.

Dewarped Output Image

From there, you can apply further image processing or run OCR on the image.