Image Window - Settings

Auto-OCR On/Off


Enabling this menu item makes TopOCR run OCR automatically whenever an image file is opened or an image is pasted into TopOCR.

Image Source


TopOCR uses different image processing algorithms depending on whether the image originated from a scanner or a camera. This setting allows you to specify either camera or scanner as the image source.

Note: For best OCR results with PDF files, we recommend operating TopOCR in "Scanner Mode" by setting Settings->Image Source to From Scanner.

Language


Allows you to select the language used for OCR. The default is English.

Configure OCR Options... Dialog Settings





TopOCR OCR Document Type


Allows you to select between multi-column books and single-column receipts. Receipt mode is also useful for retaining the original layout. TopOCR is only utilized for TWAIN scanning and for reading multi-page PDF files.

Binarization


  1. Background Removal Slider - removes noise and objects like fingers from the document image background

  2. Contrast Equalize - equalizes background on more reflective images, such as a glossy magazine page

  3. Contrast Maximize - maximizes background contrast to produce better text binarization from darker backgrounds

  4. Small Print - helps to enhance small text

  5. Straighten Columns - removes document skew and page curl from books
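
To give a feel for why maximizing contrast helps binarization on dark backgrounds, here is a minimal Python sketch of the general technique. It is not TopOCR's actual algorithm, which this page does not describe; the function names, the fixed threshold of 128, and the sample pixel values are all assumptions made for illustration.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so they span the full 0-255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0 for _ in pixels]  # flat image: there is nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def binarize(pixels, threshold=128):
    """Map each grayscale value to 0 (ink) or 255 (background)."""
    return [255 if p >= threshold else 0 for p in pixels]


# Hypothetical row from a dark page: text near 40, background near 100.
row = [100, 98, 42, 40, 100, 99]

# Thresholding directly loses the text, because every pixel is below 128.
print(binarize(row))

# Stretching the contrast first separates text from background.
print(binarize(stretch_contrast(row)))
```

On the sample row, the direct threshold turns every pixel black, while stretching first keeps the two text pixels black and the background white.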

Tesseract OCR Engine Selection


Allows you to select which OCR classifier Tesseract will use for character recognition. Either the LSTM OCR engine or the TAO OCR engine can be selected.
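
For readers who also use Tesseract from the command line, the sketch below shows how a comparable engine choice is expressed there through the `--oem` option (0 for the legacy classifier, 1 for LSTM) and how a language is chosen with `-l`. How TopOCR passes this setting to Tesseract internally is not documented on this page, and the mapping of the non-LSTM choice to Tesseract's legacy engine is an assumption. The helper name `tesseract_command` and the file names are hypothetical.

```python
import shlex

# Tesseract's --oem values. Mapping TopOCR's menu choices to them is an assumption.
OEM_LEGACY = 0  # original (pre-LSTM) character classifier
OEM_LSTM = 1    # neural-network (LSTM) recognizer


def tesseract_command(image_path, output_base, oem=OEM_LSTM, lang="eng"):
    """Build the argument list for a Tesseract CLI run with an explicit engine."""
    if oem not in (OEM_LEGACY, OEM_LSTM):
        raise ValueError("oem must be OEM_LEGACY (0) or OEM_LSTM (1)")
    return ["tesseract", image_path, output_base, "--oem", str(oem), "-l", lang]


# Hypothetical file names; this only prints the command and does not run Tesseract.
cmd = tesseract_command("page.png", "page")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run` if Tesseract is installed. Note that the legacy engine only works with language data files that include the legacy model.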