A Guide for Using the TopOCR Command Line Interface
For developers building software that uses OCR, TopOCR's new command line interface can save thousands of dollars in development costs by removing the need to purchase an expensive, difficult-to-learn OCR SDK.
In addition to being a low-cost OCR development platform, the command line interface lets blind users plug a digital camera into their PC's USB port and have TopOCR process the images and generate speech output,
all through a simple text interface.
General Command Line Usage:
From the command prompt, change to the TopOCR default installation directory (usually "C:\program files\topocr"), then type:
"TopOCR inputimage.ext [options\parameters] textoutput.ext"
Image Input
TopOCR supports the .gif, .jpeg, .tiff, and .bmp input formats in 24-bit, 8-bit, and 1-bit pixel depths. If you don't specify a file extension, TopOCR tries to determine the format automatically.
Options and Parameters
To gain a better understanding of how the command line options and parameters function, examine the "Configure Camera Filter" menu item
under the "Settings" menu in the TopOCR Image Window. Command line options can be in any order.
-THRESH Param1 Param2 Param3
-RESENHANCE
-SHARPEN Param1 Param2
-DESPECKLE Param1
-LANGUAGE Param1
The default LANGUAGE value is English; for any other language, set Param1 to one of the following codes:
ENG
FRN
GRM
ITL
SPN
POR
SWD
DAN
NOR
DCH
FIN
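Because -LANGUAGE accepts only the codes listed above, a calling program may want to check a code before building the command line. The following is a minimal sketch in C; the function name topocr_is_language_code is hypothetical and not part of TopOCR, and the case-sensitive comparison is an assumption, since this guide does not say whether lowercase codes are accepted.

```c
#include <stddef.h>
#include <string.h>

/* The eleven codes listed in this guide, in the same order. */
static const char *const TOPOCR_LANGUAGES[] = {
    "ENG", "FRN", "GRM", "ITL", "SPN", "POR",
    "SWD", "DAN", "NOR", "DCH", "FIN"
};

/* Hypothetical helper: returns 1 if code is one of the listed language
   codes, 0 otherwise. The comparison is case-sensitive (an assumption). */
int topocr_is_language_code(const char *code)
{
    size_t i;
    size_t count = sizeof TOPOCR_LANGUAGES / sizeof TOPOCR_LANGUAGES[0];

    if (code == NULL)
        return 0;
    for (i = 0; i < count; i++) {
        if (strcmp(code, TOPOCR_LANGUAGES[i]) == 0)
            return 1;
    }
    return 0;
}
```

For example, topocr_is_language_code("ITL") returns 1, while an unlisted code such as "XYZ" returns 0.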
Text Output
The output file is plain ("raw") text in ISO-8859 format and supports 11 languages. Output is appended rather than overwritten: if the named file already exists,
the new text is added to the end of it.
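Since output is appended, a program that reuses one output file will see text from earlier runs as well. One possible way to isolate the newest result, sketched below in C, is to record the file size before calling TopOCR and read only the bytes added afterwards. Both function names are hypothetical helpers, not part of TopOCR.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns the current size of the file in bytes, or 0 if it cannot be opened. */
long file_size_or_zero(const char *path)
{
    long size = 0;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);
    fclose(fp);
    return size < 0 ? 0 : size;
}

/* Reads everything from byte offset `start` to the end of the file into a
   newly allocated, NUL-terminated buffer. The caller frees the result.
   Returns NULL if the file cannot be read or is shorter than `start`. */
char *read_text_from_offset(const char *path, long start)
{
    long end;
    size_t wanted, got;
    char *buf;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0L, SEEK_END) != 0 || (end = ftell(fp)) < start ||
        fseek(fp, start, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }
    wanted = (size_t)(end - start);
    buf = malloc(wanted + 1);
    if (buf != NULL) {
        got = fread(buf, 1, wanted, fp);
        buf[got] = '\0';   /* terminate after the bytes actually read */
    }
    fclose(fp);
    return buf;
}
```

A caller would store file_size_or_zero("output.txt") before running TopOCR, then pass that value as `start` once the run has finished. Deleting the output file before each run is a simpler alternative when the earlier text is not needed.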
Examples
topocr image1.tif -RESENHANCE -THRESH 6 40 60 test1.txt
topocr italian.jpg -LANGUAGE ITL test3.txt
topocr doc144c.tif test8.txt
Errors
In the event of an error, TopOCR will exit and leave an error message in a file named "error.log".
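A calling program can surface that message to the user. The sketch below, in C, reads the first line of the log file. It rests on assumptions this guide does not confirm: that the message is plain text and that the caller knows where "error.log" was written, so the path is passed in as a parameter. The function name read_error_message is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Copies the first line of the log file at `log_path` into `msg`
   (at most msg_size - 1 characters, trailing newline removed).
   Returns 1 if a non-empty line was read, 0 if the file is missing or empty.
   Assumption: the log holds a plain-text message; its exact format is
   not documented in this guide. */
int read_error_message(const char *log_path, char *msg, size_t msg_size)
{
    FILE *fp;
    if (msg == NULL || msg_size == 0)
        return 0;
    msg[0] = '\0';
    fp = fopen(log_path, "r");
    if (fp == NULL)
        return 0;
    if (fgets(msg, (int)msg_size, fp) == NULL)
        msg[0] = '\0';
    fclose(fp);
    msg[strcspn(msg, "\r\n")] = '\0';   /* strip the line ending, if any */
    return msg[0] != '\0';
}
```

Because a log left over from an earlier failure could be mistaken for a new one, it may be worth removing any existing "error.log" before each run.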
Calling TopOCR from a Batch File
for %%x in ("%~1\*.jpg") do topocr.exe "%%x" output.txt
Put the line above into a batch file and call it with the name of a directory as its first argument. It runs TopOCR on each .jpg file in the named directory
and appends the text output of every image to a single text file called output.txt. The quotes keep paths that contain spaces intact.
Calling TopOCR from a C/C++ Program
To run TopOCR in a new process:
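/* Writable command line buffer; the program path is quoted because it contains a space. */
char cmd[] = "\"c:\\program files\\topocr\\topocr.exe\" input.jpg output.txt";
STARTUPINFOA startInfo = { sizeof(startInfo) };
PROCESS_INFORMATION processInfo;
CreateProcessA(NULL, cmd, NULL, NULL, FALSE, HIGH_PRIORITY_CLASS | CREATE_NO_WINDOW,
NULL, NULL, &startInfo, &processInfo);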
If TopOCR completes successfully, it returns an exit code of 0; if there was an error, the exit code is -1.
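Or, to run TopOCR with a single blocking call (system() still starts a separate process through the command interpreter and waits for it to finish):
system("\"c:\\program files\\topocr\\topocr.exe\" input.jpg output.txt");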
In general, you can construct an interface that is almost as fast as having the TopSoft DLL in your application. The only overhead is loading
TopOCR for each image, as opposed to having a memory-resident DLL already loaded, and this overhead is rather small. The method also has benefits: all the image
handling (file open, read, decode, etc.) is done for you, which greatly simplifies your code. You only need to select an image file to process and a
text file to receive the OCR output. In probably fewer than 20 lines of code, you can take the image file string and the output file string, call wsprintf() to
combine them with the other command line information, and call TopOCR. When the call has completed, open
the text file that TopOCR created and load it into a buffer in your program. When adding OCR to your software is this easy and inexpensive, why would you
spend thousands of dollars on a difficult-to-use OCR SDK and pay per-copy royalties that are much higher than TopOCR's?
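The command-building step described above can be sketched as follows. This version uses the standard C snprintf() in place of wsprintf() so that it also compiles outside Windows. The function name build_topocr_command is hypothetical, and wrapping each path in double quotes assumes TopOCR receives its arguments through the usual Windows command line parsing, which this guide does not state.

```c
#include <stdio.h>

/* Hypothetical helper. Builds: "<exe>" "<input>" [options] "<output>"
   `options` may be NULL or empty; it is inserted verbatim, e.g. "-LANGUAGE ITL".
   Returns 0 on success, -1 if the buffer is too small or formatting fails. */
int build_topocr_command(char *cmd, size_t cmd_size,
                         const char *exe, const char *input,
                         const char *options, const char *output)
{
    int n;
    if (options != NULL && options[0] != '\0')
        n = snprintf(cmd, cmd_size, "\"%s\" \"%s\" %s \"%s\"",
                     exe, input, options, output);
    else
        n = snprintf(cmd, cmd_size, "\"%s\" \"%s\" \"%s\"",
                     exe, input, output);
    /* snprintf reports the length it needed; treat truncation as an error. */
    return (n < 0 || (size_t)n >= cmd_size) ? -1 : 0;
}
```

The resulting string can be passed as the command line argument of CreateProcess(). If it is passed to system() instead, the command interpreter's handling of a line containing more than two quote characters may require enclosing the whole command in one additional pair of quotes.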