How To Get The Best Results With TopOCR

To get the best results with TopOCR, you'll need to learn how to take the most effective photographs. Using a camera for OCR is in principle not much different from other kinds of photography; just remember that you have a camera, not a scanner! With at least a 3-megapixel camera and a good-quality lens you should be able to capture a full page of text, and higher resolutions will of course give better results.

If you follow the advice below, you will be well on your way to achieving the best possible results with your camera. To help further, we provide a calibration page in PDF format. Simply download and print this PDF file, then spend a few minutes experimenting with your camera settings and TopOCR's Camera Filter settings to find what works best. This is why we offer a "Demo" version of our program: it lets you experiment with your own camera and develop your skill without risking anything but your time (and that time will be well spent!).

Getting Started

The first step is to modify the default settings of your camera.

  • If you're in an environment with good lighting, set the ISO to its
    lowest value; this reduces "noise" in the image. In low light,
    however, you may need to raise the ISO setting.
  • Set the image file compression to the lowest possible compression
    to improve quality.
  • Generally, the AutoFocus will be sufficient.
  • Zoom: experiment with the zoom and the camera's distance from the
    page; you'll quickly find what works best. The aim is to "frame"
    the text portion of the page as closely as possible.
  • Disable the Flash.

Angle

When you take the picture, the camera lens must be EXACTLY parallel to the page being photographed; otherwise edge distortions will be introduced, which will reduce recognition accuracy.

Stability

Try not to shake the camera when acquiring the image. If you're having a problem with blur, try a faster shutter speed, but remember that this requires more light. Some cameras have an "anti-shake" feature; try enabling it if your images look blurry.

Lighting

Lighting is the most important factor. The light source should be at an angle to the page, NOT directly overhead. If the light is directly overhead, "specular reflection" (glare) from the page will seriously degrade the image. Make sure no shadows from nearby objects fall on the page, including shadows from the camera or your own body! For example, in a library it's best not to sit at a table directly under a light, but at one slightly farther away.

Using Lamps

Another approach is to use one or more lamps to improve the lighting. This is not always possible when you're traveling, but it is simple to do at home. A single lamp works, but two work better because they reduce shadows. Place the lamp slightly to the left or right of the document, and position it so that the light spreads out rather than being focused down onto the table, while keeping cast shadows to a minimum.

If you're REALLY ambitious, you can buy a tripod and build a copy stand with its own lighting. Such a setup is not expensive, but we recommend capturing manually first, so you'll be familiar with the height from which you'll be shooting.

Aspect Ratio

It's important to consider the camera's "aspect ratio", which refers to the shape of the image the camera captures. Most digital cameras capture in a 4:3 aspect ratio. Most documents, however, don't share this aspect ratio; a typical page is taller than it is wide, roughly the opposite of the camera's shape. If you shoot your document in "portrait" mode, you will waste space in the frame, and with it part of the camera's resolution, as the two captures below show:

(Images: Portrait Mode Image; Landscape Mode Image)

Here is the actual, unedited PDF output for each image. As you can see, capturing in landscape mode gives far better results than portrait mode.

(PDF output files: Portrait Mode PDF; Landscape Mode PDF)
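To put rough numbers on this comparison, the sketch below works out how many pixels land on the page in each orientation. It assumes a 2048 x 1536 (about 3.1 megapixel) 4:3 image, a US Letter page (8.5 x 11 inches), and that the page exactly fills the frame along its limiting side with no margin; these are illustrative assumptions, not measurements taken from the sample images above.

```python
# Sketch: how much of a 4:3 frame a page occupies when shot upright
# ("portrait mode") versus turned 90 degrees ("landscape mode").
# Assumptions (illustrative only): the page fills the frame along its
# limiting side with no margin; the image is 2048 x 1536 pixels.

def page_pixels(frame_w, frame_h, page_w, page_h):
    """Return (pixels on the page, pixels per inch) for a page of
    page_w x page_h inches fitted inside a frame_w x frame_h image."""
    scale = min(frame_w / page_w, frame_h / page_h)  # pixels per inch
    return (page_w * scale) * (page_h * scale), scale


FRAME_W, FRAME_H = 2048, 1536   # about 3.1 megapixels, 4:3, camera held normally
PAGE_W, PAGE_H = 8.5, 11.0      # US Letter page, upright

# Portrait mode: page upright, so its long side runs along the short side of the frame.
portrait_px, portrait_ppi = page_pixels(FRAME_W, FRAME_H, PAGE_W, PAGE_H)
# Landscape mode: page turned 90 degrees, long side along the long side of the frame.
landscape_px, landscape_ppi = page_pixels(FRAME_W, FRAME_H, PAGE_H, PAGE_W)

print(f"portrait:  {portrait_px / 1e6:.2f} MP on page, {portrait_ppi:.0f} ppi")
print(f"landscape: {landscape_px / 1e6:.2f} MP on page, {landscape_ppi:.0f} ppi")
```

Under these assumptions it prints about 1.82 MP at 140 pixels per inch for portrait mode versus 3.05 MP at 181 pixels per inch for landscape mode, so roughly two-thirds more pixels end up on the text.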

Setting Camera Filter Parameters

To get the best possible recognition results for your particular camera, you will need to adjust the Configure Camera Filter parameters. This is a straightforward process of trial and error: try new settings, click the OCR button, and observe the results. We suggest you perform these steps in the following order:

  • Turn off resolution enhancement.
  • Increase or decrease the Adaptive Thresholding parameters.
  • If there is a small amount of blur in the image, turn on the
    sharpness filter, starting with its lowest settings. If you decide
    to sharpen the image, you should also turn on the DeSpeckle filter.
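For readers curious about what such filters do, here is a minimal sketch of two textbook versions: mean-based adaptive thresholding, which compares each pixel with its local neighborhood so that an unevenly lit page can still be separated into ink and paper, and a 3x3 median filter used as a despeckle step. TopOCR's internal algorithms are not described here, so these functions and their `block_size` and `offset` parameters are hypothetical stand-ins for the Camera Filter settings, not its actual implementation.

```python
# Illustrative sketch only; TopOCR's own filters may work differently.
# block_size and offset are hypothetical stand-ins for the
# "Adaptive Thresholding parameters". Images are lists of rows,
# with 0 = black and 255 = white.

def adaptive_threshold(gray, block_size=3, offset=10):
    """Binarize a grayscale image using a local-mean threshold.

    Each pixel is compared with the mean of its block_size x block_size
    neighborhood (clipped at the borders). Pixels darker than
    (local mean - offset) become ink (0); everything else becomes paper (255).
    """
    h, w = len(gray), len(gray[0])
    r = block_size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(window) / len(window)
            row.append(0 if gray[y][x] < mean - offset else 255)
        out.append(row)
    return out


def despeckle(binary):
    """Remove isolated specks with a 3x3 median filter (borders clipped)."""
    h, w = len(binary), len(binary[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(binary[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2)))
            row.append(window[len(window) // 2])
        out.append(row)
    return out


# Left half brightly lit, right half dim, with one dark "ink" pixel in each.
page = [[200, 200, 200, 120, 120, 120],
        [200,  90, 200, 120,  40, 120],
        [200, 200, 200, 120, 120, 120]]
for row in adaptive_threshold(page, block_size=3, offset=30):
    print(row)
```

In this example, a single fixed threshold of 128 would turn the whole dim right half black, while the adaptive version marks only the two ink pixels. Raising `offset` makes fewer faint pixels count as ink; a larger `block_size` averages over a wider area. Note that a 3x3 median also erases strokes only one pixel wide, a trade-off to watch with small text.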

Important Note:

TopOCR has a built-in feature that rotates landscape mode images in the orientation shown above to the correct orientation for performing OCR. This feature is necessary because all images processed by the OCR engine must be in portrait mode. However, this feature is turned off by default. It is up to you to enable it by selecting the Landscape->Portrait item from the Settings menu if the image was captured in landscape mode. Note that the left side of the page should be at the bottom; see the images above.
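To illustrate what the Landscape->Portrait option has to do, the sketch below turns an image 90 degrees clockwise. It assumes the capture orientation described in the note (left side of the page at the bottom), for which a clockwise quarter turn puts the page upright. It is a generic rotation of a list of rows, not TopOCR's implementation.

```python
# Sketch: undoing a landscape-mode capture in which the left side of the
# page is at the bottom of the frame. Assumption: a 90-degree clockwise
# turn restores the upright page. Not TopOCR's code.

def rotate_clockwise(image):
    """Rotate an image (list of rows) 90 degrees clockwise."""
    # Reversing the rows and then reading off the columns yields the
    # clockwise rotation: new[i][j] == old[height - 1 - j][i].
    return [list(row) for row in zip(*image[::-1])]


# A tiny "page" whose rows read AB / CD / EF when upright.
captured = [["B", "D", "F"],   # right side of the page is at the top
            ["A", "C", "E"]]   # left side of the page is at the bottom
for row in rotate_clockwise(captured):
    print("".join(row))
```

Running it prints the rows AB, CD and EF, i.e. the page is upright again.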