- GENIUS GS2000 - GS4000
- PRODIGY OCR V1.69
- (c) 1988, 1989 Synergy (UK) Ltd. All rights reserved. 25th May 1989
-
- NEW FEATURES --- MAKE SURE YOU READ THIS SECTION CAREFULLY ---
- ============
-
- V1.67+ :-
- 1) Context checking to reduce the chances of 0Oo and 1lI characters
- occurring out of context.
-
- 2) Context checking is not enabled in learn mode, so upper/lower case
- characters can occur unexpectedly in learn mode. THESE NEED TO BE
- TAUGHT ONLY IF THE REAL CHARACTER IS NOT SHOWN IN THE POSSIBLE
- CHARACTER STRING.
-
- 3) The font dictionary is saved to disk when its overflow space has been
- used up in learn mode, and is then rebuilt in memory. This shows up
- as a pause while you are teaching.
-
- V1.69+ :-
- 4) Pixel display in learn mode.
-
- 5) Font status on display in learn mode.
-
- 6) "WORKING" message flashes at bottom right of screen during the
- recognition process.
-
- 7) While "WORKING" is displayed, hitting the space bar will stop
- recognition of the current line; the OCR will display what it has
- recognised so far and then move on to the next line of text.
-
- 8) Only the Escape key will now abandon a run completely.
-
- 9) A list of all existing .TIF files will be shown before you enter
- your TIF file name.
-
- 10) The text file name will default to the same name as your TIF file
- and you are also shown all your .TXT files before you enter your
- TXT file name.
-
- 11) You can choose file names either by using the cursor keys as
- described below, OR by typing the number shown in front of the
- filename and then pressing the <Enter> key.
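-
- The manual does not say how the context check in item 1 works. As a rough
- illustration only, the Python sketch below assumes a simple majority rule:
- in a mostly numeric word, O/o/I/l are read as digits, and in a mostly
- alphabetic word, 0 and 1 are read as letters of the prevailing case. The
- function name context_check and the rule itself are hypothetical, not
- Synergy's algorithm.

```python
# Hypothetical context check for the confusable glyphs 0/O/o and 1/l/I.
DIGIT_FOR = {"O": "0", "o": "0", "I": "1", "l": "1"}  # letter -> digit reading
LETTER_FOR = {"0": ("O", "o"), "1": ("I", "l")}        # digit -> (upper, lower) reading
CONFUSABLE = set(DIGIT_FOR) | set(LETTER_FOR)


def context_check(word):
    """Resolve confusable glyphs in one word from its unambiguous characters."""
    others = [c for c in word if c not in CONFUSABLE]
    if not others:
        return word  # no context to judge by: leave the word as recognised
    digits = sum(c.isdigit() for c in others)
    if digits * 2 > len(others):
        # Mostly digits: read O, o, I and l as digits.
        return "".join(DIGIT_FOR.get(c, c) for c in word)
    # Otherwise read 0 and 1 as letters in the majority case of the word.
    upper = sum(c.isupper() for c in others)
    lower = sum(c.islower() for c in others)
    case = 0 if upper > lower else 1
    return "".join(LETTER_FOR[c][case] if c in LETTER_FOR else c for c in word)
```

- For example, context_check("HELL0") returns "HELLO", while "2O1O" becomes
- "2010".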
-
-
- List of files
- =============
-
- OCR.EXE the OCR program for GS4000/GS2000
- REMOVE.EXE a utility to remove bad characters from a font-file
- README.DOC this file
-
- COURIER.OCR a fontfile for Courier fonts ( all fontfiles must have the extension .OCR )
- HELVETIC.OCR a fontfile for Helvetica fonts.
- TIMROMAN.OCR a fontfile for Times Roman fonts.
-
- Installing the software
- =======================
-
- The software can be run from floppy, hard or RAM discs. To install it on
- hard disc C:, type:
- A:>C:
- C:>MD \OCR
- C:>CD \OCR
- C:>COPY A:\*.* /V
- ( You can install OCR in a different directory, or on a floppy disc, as
- necessary. The configuration file OCR.CFG should be in the current working
- directory when you run the software. )
-
-
-
- Running the software
- ====================
-
- The OCR program can be run in two ways: via a menu screen, or by supplying
- all options on the command line (however, you cannot use the Merge facility
- from the command line).
-
- (a) Menu screen
- ---------------
- At the DOS prompt, type 'OCR' to use a GS4000 or 'OCR -2' to use a GS2000
- and press enter, e.g.
-
- C:>OCR -2
-
- The program will then display a menu screen, thus:
- -----------------------------------------------------------------------------
- Resolution (DPI) GENIUS GS2000
- 200 PRODIGY OCR V1.69
-
- F1: Merge Scan (ON / OFF) __OFF F6: Learn mode ( ON/OFF) OFF
-
- F2: Character spacing F7: Scanner /TIF file SCANNER
-
- F3: Automatic detection F8: Select font (.OCR) COURIER
-
- F4: Sensitivity (-9 to +9) 0 F9: Change Font file path
-
- F5: Touching Characters ___ON F10: Change image file path
-
- <Enter> Start OCR <esc> Exit to DOS
- -----------------------------------------------------------------------------
-
- ( File selection area )
-
-
-
- -------------(c) 1988,1989 Synergy (UK) Ltd. All rights reserved. -----------
-
- The main box shows which keys to use to control the software, and the lower
- box is used to display lists of filenames for selection.
-
- Pressing F1 (Merge Scan) allows you to scan text up to 210mm (about 8 in) wide.
- For text wider than 105mm (4 in) you can do two scans, a left and a right
- scan, and then merge the text together. This merge facility is enabled by
- pressing F1 to select the ON option. To disable it, press F1 again to get
- the OFF option.
- NOTE: If the merge option is ON, the next two scans will ALWAYS be merged.
- SEE BELOW FOR DETAILED DISCUSSION OF THE MERGE FACILITY.
-
- Pressing F2 lets you force the OCR software to treat the text to be scanned
- as Mono Spaced (Fixed pitch) or Proportionally spaced, or to let the software
- work it out for itself (Automatic detection).
- If you select fixed pitch then you must enter the pitch size in characters
- per inch using F3.
- If you select proportional then you must use F3 to enter a word gap in pixels
- (typical values are 4-5 at 200dpi, 6-7 at 300dpi, and 8-10 at 400dpi).
- Setting the word gap helps eliminate the extra spaces between the letters of
- a single word that can appear when the system uses its automatic mode. F3
- has no effect if you select Automatic detection.
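-
- The two F3 settings can be illustrated with a short Python sketch. For fixed
- pitch, one character cell is simply dpi / cpi pixels wide (10 cpi at 200dpi
- gives 20 pixels). For proportional text, the sketch assumes each recognised
- character carries its first and last pixel columns, and emits a space
- wherever the blank run between two characters is at least the word gap.
- The function names and the 'at least' comparison are assumptions, not the
- program's documented behaviour.

```python
def cell_width(cpi, dpi):
    """Fixed pitch: width in pixels of one character cell."""
    return dpi / cpi


def insert_spaces(glyphs, word_gap):
    """Proportional spacing: join glyphs, adding a space at each word gap.

    glyphs   -- list of (char, left_col, right_col), left to right, where the
                columns are the first and last pixel columns the glyph occupies
    word_gap -- minimum number of blank pixel columns that counts as a space
    """
    out = []
    prev_right = None
    for char, left, right in glyphs:
        # Blank columns between the previous glyph and this one.
        if prev_right is not None and left - prev_right - 1 >= word_gap:
            out.append(" ")
        out.append(char)
        prev_right = right
    return "".join(out)
```

- With a word gap of 5, glyphs separated by 1 blank column stay in one word,
- while a 5-column gap starts a new word.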
-
- F4 allows you to adjust the sensitivity of the recognition algorithm.
- Normally the software uses a sensitivity value of 0 in normal mode and +6 in
- learn mode. If you wish to make the software more critical, increase these
- values; if you want to make it less critical, decrease them.
-
- F5 allows you to manually control the touching character algorithm. This
- will help recognise characters that are touching. However, this algorithm
- works best with a GOOD dictionary. If you are in learn mode you will usually
- find it better to turn this OFF, since your font will not yet be good enough
- for it to work well. The default is ON in free run and OFF in learn mode,
- however you can override these defaults by using F5 AFTER you have used F6
- to select your desired mode.
-
- Pressing F6 toggles Learn-Mode on/off. When it is off ( the normal state ),
- characters that are not recognised are replaced by another character ( by
- default '@' ) and the process proceeds automatically. If learn-mode is on,
- characters which are unrecognised, or recognisable but poorly formed, are
- displayed on the screen enlarged. The user may enter what the character is,
- or skip over it.
-
- Pressing F7 toggles between the scanner and image files. You can either
- perform recognition as you scan the page in, or select TIF and recognise
- text from a previously scanned .TIF file.
-
- The F8 key allows you to select a different font from the one displayed.
- Press F8 and a list of font files is displayed. Choose one with the cursor
- keys and press Enter to select it, or Esc. to go back to the original menu.
- If the font you want is not in the window, press PgUp/PgDn until it appears.
- You can also select the file by typing the number that precedes the filename.
-
- Key F9 is used to change the default path for the .OCR files. Press F9 and
- the program shows the current path, and prompts you to enter a new path.
- To keep the old path just press <Enter> ( or <Esc> ), otherwise type in a
- new path.
-
- Key F10 is used to change the default path for the .TIF files, and for saving
- TIF files produced when scanning with the scanner. Press F10 and
- the program shows the current path, and prompts you to enter a new path.
- To keep the old path just press <Enter> ( or <Esc> ), otherwise type in a
- new path.
-
-
- Once the parameters are selected, pressing the Enter key will start the
- recognition process. Alternatively, pressing Esc. will write your
- configuration to OCR.CFG and exit to DOS.
-
-
- Once Enter is pressed the program will either start up the scanner ( if
- SCANNER selected ), or ask you to choose an image file ( if TIF selected ).
- If the SCANNER is selected then you will be prompted for the name of a file
- in which to save the image that the scanner produces. Note that this image
- file will be put in the directory that you specified with F10 (above).
-
- The image filename is selected in the same way as a font file, with the
- cursor and PgUp/PgDn keys. Press enter to select the file, or Esc. to go
- back to the menu.
-
- Whether SCANNER or TIF is selected, you must now type in a name for the
- output text file, e.g.
- textfile.txt
- or
- b:\test.doc
- Pressing Esc instead will return you to the menu.
-
- NOTE -- If you want your output to go into the same file as the previous
- ==== scan, then just press return to use the filename as shown on the
- line below the prompt. If you want the output to go to a new
- file then you MUST change the filename.
-
- Once the recognition process has finished, the program returns you to the
- menu screen (UNLESS Merge Scan is ON; see below).
-
-
-
- (b) DOS command line
- --------------------
- You can bypass the menu screen by supplying all the parameters on the DOS
- command line. The parameters are listed below, and are in the same format
- as the OCR.CFG file. Parameters may be preceded by '-' or '/'. The only
- mandatory parameter is a filename for the output.
- e.g.
- OCR test.txt -s -s selects the scanner.
- OCR abc.doc /Imanual.img /I,/i,-I,-i all select imagefile input
- OCR /h /h or -h lists the options
-
-
- (c) Learn mode
- --------------
- When you run the program in learn mode, the screen is split into two parts.
- The top part shows the scanned text input as pixels, and the bottom part is
- used for teaching or examining characters. Each character is shown with a
- cursor around the pixels that the software thinks make up that character.
-
- In the lower part of the screen, the current text line will be shown
- next to the '->' symbol, and one of the characters is
- being offered for verification as shown by the ^. To teach the character just
- type it and press Enter. You will then see the ^ advance to the next character
- and the cursor will move to the next character's pixels in the top part of the
- screen.
- Just pressing Enter will skip over the character to the next one on the
- line (if any), <esc> will quit, and
- F10 will save the current state of the font data file if you press Enter
- at the SAVE DATA prompt.
-
- TEACHING A NEW FONT
- ===================
-
- NOTE -- If you press F1 or the Del key you will delete the character currently
- ==== above the ^ from the currently loaded font file. A message will appear
- to indicate that it has been deleted.
- Note that the deletion will not affect your file on disk unless you
- finish the entire page in learn mode, OR you use F10 to dump the font
- file.
-
- You may move along the line of characters shown, by using the cursor
- keys on the numeric keypad. "Left arrow" & "Right arrow" move you
- one character to left or right respectively. If you press "Down arrow"
- the cursor will move right to the next character that has a
- probability of less than 88%. If there is no such character, the cursor
- will not move. Pressing "Up arrow" is the same except that the search
- moves left instead.
- The Home key will move you to the first character on the line, and
- the End key will move to the last character on the line.
-
- When you enter a new character and press "return", the cursor will
- move to the next character below 88% if there is one, otherwise it
- will just move to the next character.
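-
- The cursor rules above can be summarised in a Python sketch. It assumes
- each character on the line has a probability in per cent and that the
- search after Enter looks only to the right of the cursor; the function
- names are hypothetical.

```python
THRESHOLD = 88.0  # per cent, as quoted in the manual


def next_low_right(probs, pos):
    """Down arrow: nearest character to the right below the threshold."""
    for i in range(pos + 1, len(probs)):
        if probs[i] < THRESHOLD:
            return i
    return pos  # no such character: the cursor does not move


def next_low_left(probs, pos):
    """Up arrow: nearest character to the left below the threshold."""
    for i in range(pos - 1, -1, -1):
        if probs[i] < THRESHOLD:
            return i
    return pos


def after_enter(probs, pos):
    """After teaching: next low-probability character, else simply the next one."""
    i = next_low_right(probs, pos)
    if i != pos:
        return i
    return min(pos + 1, len(probs) - 1)  # stay on the last character at line end
```
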
-
- If the ^ is under the first character on the line and it has a
- probability > 88%, then you do not need to edit this line unless you
- can see an obvious mistake.
-
- When you are satisfied with the current line, press F5, and the OCR
- software will proceed to the next line.
-
- NOTE You will also see a display in the lower part of the screen which
- ==== indicates which characters are in the dictionary and which are not.
- The fact that a character appears in the list of "Taught Characters"
- does not mean that you don't NEED to teach it. Your only real guide
- to this is the probability figure as discussed above.
-
- -----------------------------------------------------------------------------
-
- Configuration /Command line options
- ===================================
-
- These are the options stored in the configuration file OCR.CFG.
- They may also be entered on the command line if you do not want to use the
- menu screen, e.g.
- OCR textfile.txt -ipage1.img -l100
- will run the software, produce an output text file 'textfile.txt' from the
- image file 'page1.img', with a scan width of 100 mm.
-
- OCR -h or OCR /h
- will produce a list of the options.
-
- -ma mode -ma learn OFF (auto), -ml learn ON
-
- -s scanner is source for image
-
- -i<imagefile> .TIF file is source
- -p. default path for .TIF files ( . = current directory )
-
- -fCOURIER font-file (.OCR)
- -o. default path for .OCR files ( . = current directory )
-
- -d. default path for text output files ( . = current directory )
-
- -h help - list options
-
- -$ units displayed in inches ( default is mm. ).
-
- Values shown above are system defaults.
- The option '-k' will write the command-line options to OCR.CFG if entered on
- the command-line, e.g.
- ocr -$ -k
- will set units to inches, write the configuration file, and exit. This method
- can be used to change any of the options not available on the main menu.
- Alternatively, OCR.CFG may be edited with a simple ASCII editor, or a word-
- processor in 'text' or 'ascii' mode.
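-
- Because OCR.CFG uses the same format as the command line, one parser can
- serve both. The Python sketch below is a hypothetical illustration: it
- accepts '-' or '/' prefixes, treats the option letter case-insensitively
- (as with /I and -i above), and takes the only bare argument to be the
- output file. Options not in the list above, such as -l100 from the earlier
- example, are kept aside in 'extra' rather than interpreted.

```python
# Defaults as listed above; the key names are hypothetical.
DEFAULTS = {
    "learn": False,     # -ma (auto) / -ml (learn ON)
    "source": None,     # "scanner" for -s, "tif" for -i<imagefile>
    "image": None,
    "tif_path": ".",    # -p
    "font": "COURIER",  # -f
    "font_path": ".",   # -o
    "text_path": ".",   # -d
    "inches": False,    # -$
    "help": False,      # -h
    "save_cfg": False,  # -k
}


def parse_options(tokens):
    """Return (output_file, options) from command-line or OCR.CFG tokens."""
    opts = dict(DEFAULTS)
    opts["extra"] = {}
    output = None
    for tok in tokens:
        if not tok:
            continue
        if tok[0] not in "-/":
            output = tok  # the output text file is the only bare argument
            continue
        key, value = tok[1:2].lower(), tok[2:]  # value is glued on, e.g. -iTEST.TIF
        if key == "m":
            opts["learn"] = value.lower() == "l"
        elif key == "s":
            opts["source"] = "scanner"
        elif key == "i":
            opts["source"], opts["image"] = "tif", value
        elif key == "p":
            opts["tif_path"] = value
        elif key == "f":
            opts["font"] = value
        elif key == "o":
            opts["font_path"] = value
        elif key == "d":
            opts["text_path"] = value
        elif key == "$":
            opts["inches"] = True
        elif key == "h":
            opts["help"] = True
        elif key == "k":
            opts["save_cfg"] = True
        else:
            opts["extra"][key] = value  # undocumented option, left uninterpreted
    return output, opts
```

- Splitting the text of OCR.CFG on whitespace and passing the pieces to
- parse_options would read a configuration file in the same way.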
-
-
- The Configuration File
- ======================
- The configuration file is a plain ASCII text file, which can be edited with
- EDLIN or most word-processors. A typical file might look like this:
-
- OCR.CFG
- -------
- -pC:\IMAGES
- -d.
- -o.
- -fCOURIER
- -ml
- -iTEST.TIF
- -$
-
-
- *** NB ***
-
- The only way to reset to millimetres from inches is to delete the
- line "-$" from the .cfg file using an editor.
-
-
- Creating a new font file
- ========================
-
- A new font data file can be created either by modifying an existing file, or
- by making one from scratch. The new file must have the extension .OCR.
-
- To add new characters to an existing font, copy its .OCR file to a new file
- with the same extension, select this as your font file when you run OCR, and
- use learn mode to teach it the new characters, e.g.
- C:>cd \ocr
- C:>copy courier.ocr courier2.ocr /v
- C:>ocr
- and select font courier2.ocr and learn-mode ON.
-
- To create a completely new font, you must start with a 'blank' fontfile, and
- teach it characters with learn mode. Run OCR, and press F8 to select a font.
- Choose the 'Create font' option. The program now asks you for a filename,
- makes an empty fontfile for you, and selects it as the current font.
-
- When teaching a new font, it is a good idea to scan in a sheet of all the
- characters, something like this:
-
- ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 -=# [] {} -=#,.;:
- abcdefghijklmnopqrstuvwxyz !"£$%^&*() \| /? '@ <>
- ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 -=# [] {} -=#,.;:
- abcdefghijklmnopqrstuvwxyz !"£$%^&*() \| /? '@ <>
- ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 -=# [] {} -=#,.;:
- abcdefghijklmnopqrstuvwxyz !"£$%^&*() \| /? '@ <>
-
- ... and so on.
-
- When starting with a brand new dictionary, you may find that a number of
- the lines at the top of the page being scanned are ignored or produce
- "noise" characters. Just persevere; the system will soon work out all it
- needs to scan properly.
-
- Scanning text wider than 105mm (4 in).
- ======================================
-
- If you select the Merge Scan option using the F1 key, the OCR will work as
- described above, except that:-
-
- 1) The text file name that you use will be the name of the MERGED text.
-
- 2) You must first scan the left hand section of the text. This will then
- be OCR'd as usual and when you accept it, it gets saved in a file
- LEFT.TXT. The scanner light is then enabled and you can scan the right
- hand section of the text. YOU MUST ENSURE THAT THERE IS AT LEAST 12mm
- (0.5in) OVERLAP BETWEEN THE TWO SCANS.
-
- Don't worry if the left-hand text lines end with funny characters and
- the right-hand text lines start with funny characters. This is
- because you cannot always get a whole number of characters in the
- scan window.
-
- 3) After you accept the second scan, the Merge facility will run with
- the following screen display.
-
- -----------left.txt-----------------|-----------------Right.txt---------------
- |
- |
- |
- |
- |
- |
- left | right
- |
- hand | hand
- |
- text | text
- |
- |
- |
- |
- |
- |
- |
- ------------------------------------|-----------------------------------------
- <F10> Save, <Esc> Exit, <F5> Merge, <F6> Toggle display, <cursor keys> Align
-
- To use the merge facility you must identify the two halves of a single
- full line, one in each window, that can be merged.
- To do this you can use the left & right arrow keys to swap between the
- two windows above, and the up & down keys and the PgUp & PgDn keys to
- scroll the windows. You will notice that the line in the centre of
- each window is highlighted. This is the line that must be
- matched for the merge. When you have two suitable halves highlighted
- simply press the <F5> key to do the merge.
- The screen will then swap to display the merged text. If the merge is
- not good enough, you can press <F6> and try two different lines for
- the merge.
- To exit from merge use <Esc>. You can save the merged text at any time
- by pressing <F10>.
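-
- The manual does not describe how the two halves are joined. One plausible
- approach, shown here as a Python sketch, takes the longest run of
- characters common to both halves as the overlap, which also discards the
- funny characters at each cut edge, and then applies the same line offset
- to every other pair of lines once one pair has been matched (the F5 step).
- merge_line, merge_pages and the 3-character minimum are assumptions.

```python
from difflib import SequenceMatcher


def merge_line(left, right, min_match=3):
    """Join two overlapping halves of one text line."""
    m = SequenceMatcher(None, left, right, autojunk=False).find_longest_match(
        0, len(left), 0, len(right))
    if m.size < min_match:
        return left + " | " + right  # hypothetical marker: no reliable overlap
    # Keep the left half up to the end of the overlap, then the rest of the right.
    return left[: m.a + m.size] + right[m.b + m.size :]


def merge_pages(left_lines, right_lines, left_index, right_index):
    """Merge all lines once left_lines[left_index] is matched to right_lines[right_index]."""
    offset = right_index - left_index
    merged = []
    for i, left in enumerate(left_lines):
        j = i + offset
        right = right_lines[j] if 0 <= j < len(right_lines) else ""
        merged.append(merge_line(left, right) if right else left)
    return merged
```
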
-
-
- The Remove Utility
- ==================
-
- The OCR package comes with a utility, REMOVE.EXE, which allows you to remove
- bad or wrongly entered characters from a font-file. For example, you may have
- mistakenly selected 'C' in learn mode instead of 'V'. The only way to correct
- this is to remove all 'C's in the font, and re-teach them. REMOVE cannot
- selectively delete only certain instances of a character; all instances will
- be removed.
- The simplest way to use the utility is to go to your OCR directory and type
- REMOVE
- The program will then prompt you for the name of the font-file ( minus the
- '.OCR' extension ), and the character(s) you wish to delete. A backup copy of
- your font-file will be made, with the same name and the extension '.$CR', e.g.
-
- C:\OCR>REMOVE
-
- REMOVE - Utility to remove characters from a font file.
- (c) 1988 Synergy (UK) Ltd. All rights reserved.
-
- font-file : COURIER
- characters : RP
- Reading file :COURIER.OCR
- Backup file :COURIER.$CR
- Writing file :COURIER.OCR
- Deleting characters :RP
-
- No. of deletions : 214
-
- In the above example, all occurrences of the characters 'R' and 'P' are removed
- from COURIER.OCR, and the old version of COURIER.OCR is copied to a file
- called COURIER.$CR.
-
- The full format of the command line is
- REMOVE [ < font-file> [ < character-list > ] ]
- so you could also type
- REMOVE COURIER
- and only be prompted for the character(s) to be removed, or
- REMOVE COURIER a?"W
- which will remove the four characters 'a', '?', '"', and 'W' from COURIER.OCR
- without further prompting.
-
- If the program reports the number of deletions to be zero, then no occurrences
- of any of the characters you listed were found.
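-
- The all-or-nothing behaviour of REMOVE can be sketched in Python. The
- layout of a .OCR file is not documented here, so the sketch assumes a
- hypothetical in-memory list of (character, bitmap) samples; it drops every
- sample whose character is listed, counts the deletions, and derives the
- backup name by replacing the extension with .$CR.

```python
from pathlib import PureWindowsPath


def remove_chars(samples, chars):
    """Drop every sample whose character is in `chars`; return (kept, deletions).

    samples -- hypothetical list of (character, bitmap) pairs from a font file
    """
    unwanted = set(chars)
    kept = [s for s in samples if s[0] not in unwanted]
    return kept, len(samples) - len(kept)


def backup_name(font_file):
    """COURIER.OCR -> COURIER.$CR (DOS-style path handling)."""
    return str(PureWindowsPath(font_file).with_suffix(".$CR"))
```

- A deletion count of zero from remove_chars corresponds to the case above
- where none of the listed characters were found.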
-
-
- Notes
- =====
-
- 1.) If a particular page gives poor results, try adjusting the contrast: if
- the print is light, increase the contrast; if it is dark, decrease it.
- Obviously this only applies when working directly from the scanner.
-
- 2.) When teaching a new font the FULL HEIGHT indicator should be ignored on
- the first few lines of text. It will eventually correct itself and you do
- not lose the information you appear to put in incorrectly.
-
- 3.) DOS 2.XX users
- --------------
- The default path for both .OCR files and text output files is '.', which
- is the DOS abbreviation for the current working directory. If your
- version of DOS does not recognise this abbreviation, you must specify an
- explicit path either by editing the OCR.CFG file or by setting these
- options from the command line, e.g.
-
- Floppy-disc example: Hard-disc example:
-
- edit OCR.CFG to read
- -oA:\ -oC:\OCR ( to change the .OCR path )
- -dA:\ -dC:\OCR ( to change the text file path )
-
- or go to the directory containing OCR.CFG, and type
-
- OCR -oA:\ -dA:\ -k OCR -oC:\OCR -dC:\OCR -k
-
- ( The '-k ' option causes the program to save the preceding options in
- OCR.CFG ).
-
- 4.) REMOVE.EXE - When entering the double-quote character on the command
- line, you must precede it with '\', e.g.
- REMOVE COURIER s\"q
- will remove the three characters 's', '"', and 'q'. This is necessary
- because the parameters for a DOS command line may optionally be enclosed
- in quotes, thus
- REMOVE COURIER "sq"
- will remove only 's' and 'q'.
- This does NOT apply when entering characters in response to the program's
- own prompt, e.g.
- C:>REMOVE
-
- REMOVE - Utility to remove characters from a font file.
- (c) 1988 Synergy (UK) Ltd. All rights reserved.
-
- font-file : courier
- characters : s\"q
- Reading file :courier.OCR
- Backup file :courier.$CR
- Writing file :courier.OCR
- Deleting characters :s\"q
-
- No. of deletions : 81
-
- will remove all occurrences of 's', '\', '"', and 'q'.
-
- 5.) Colour Graphics Adapter ( CGA )
- -------------------------------
- If you are using the IBM CGA or a compatible graphics card, you will
- need to load the Graphic Character Set. ( This is not necessary when
- using EGA or VGA ). If the graphic characters are not loaded some of
- the line drawing used on the screen will not be displayed correctly.
- This affects only the appearance of the screen, the OCR software will
- perform normally in all other ways.
- To load the graphic characters, go to the directory containing your
- DOS files and type
- GRAFTABL
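-
- The quoting rule in note 4 can be illustrated with a simplified Python
- tokenizer. It assumes whitespace separates arguments, a double quote
- starts or ends a quoted group, and \" stands for a literal quote; real DOS
- C run-time libraries have further rules for runs of backslashes that are
- omitted here. split_dos_args is a hypothetical name.

```python
def split_dos_args(cmdline):
    """Split a command line using the simplified rules described above."""
    args = []
    current = []
    in_quotes = False
    have_arg = False  # True once the current argument has started (even if empty "")
    i = 0
    while i < len(cmdline):
        c = cmdline[i]
        if c == "\\" and cmdline[i + 1 : i + 2] == '"':
            current.append('"')  # \" is a literal double quote
            have_arg = True
            i += 2
            continue
        if c == '"':
            in_quotes = not in_quotes  # quotes group text but are not kept
            have_arg = True
        elif c.isspace() and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current = []
                have_arg = False
        else:
            current.append(c)
            have_arg = True
        i += 1
    if have_arg:
        args.append("".join(current))
    return args
```

- Under these rules, REMOVE COURIER s\"q yields the argument s"q, while
- REMOVE COURIER "sq" yields just sq. Text typed at REMOVE's own prompt is
- not split this way, which is why s\"q there also removes the backslash.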
-
-