Recognition with Training
As previously stated, FineReader can read texts set in practically any font
regardless of print quality. Consequently, no prior training is normally
required before recognition can take place. FineReader, nevertheless, features a
number of user pattern training tools.
Train User Pattern mode may come in useful when:
- recognizing texts set in decorative fonts;
- recognizing texts containing unusual characters (e.g. mathematical
symbols);
- recognizing large volumes (more than a hundred pages) of texts of low
print quality.
Tip: Use Train User Pattern mode only if one of the above
applies. In other cases you may obtain a slight increase in recognition
quality, but the time and effort involved will probably outweigh the benefit
received.
Pattern training works as follows. One or two pages are recognized in
training mode, and a pattern is created from them. FineReader then uses this
pattern to aid recognition of the remaining text.
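The idea of training on a few pages and then reusing the result can be
illustrated with a toy sketch. This is not FineReader's algorithm or API: the
bitmap representation, the function names train_pattern and recognize, and the
Hamming-distance matching are all assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a character image flattened to a tuple of 0/1 pixels.
# All glyphs are assumed to have the same length.
Glyph = Tuple[int, ...]


def train_pattern(samples: List[Tuple[Glyph, str]]) -> Dict[Glyph, str]:
    """Build a pattern from (glyph, character) pairs labelled during training."""
    pattern: Dict[Glyph, str] = {}
    for glyph, char in samples:
        # Keep the first label given to a glyph; later duplicates are ignored.
        pattern.setdefault(glyph, char)
    return pattern


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels in which two equally sized glyphs differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(glyphs: List[Glyph], pattern: Dict[Glyph, str], max_distance: int = 1) -> str:
    """Read the remaining text using the trained pattern.

    A glyph is accepted if its closest trained glyph differs in at most
    max_distance pixels; otherwise it is reported as '?'.
    """
    result = []
    for glyph in glyphs:
        best = min(pattern, key=lambda known: hamming(known, glyph), default=None)
        if best is not None and hamming(best, glyph) <= max_distance:
            result.append(pattern[best])
        else:
            result.append("?")
    return "".join(result)


# Training pages: two labelled glyphs.
pattern = train_pattern([((1, 0, 1, 0), "a"), ((0, 1, 0, 1), "b")])

# Remaining pages: an exact match, a slightly noisy "b", and an unknown shape.
text = recognize([(1, 0, 1, 0), (0, 1, 0, 0), (1, 1, 1, 1)], pattern)
```

In this sketch the noisy glyph is still read as "b" because it differs from the
trained sample by a single pixel, while the unknown shape is left as "?".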
Sometimes two or even three characters may get "glued" together,
and FineReader may be unable to enclose each character in an individual frame to
separate them. If this proves to be the case (i.e. you cannot move the frame so
that it contains only one whole character and no other character parts), you can
train FineReader to recognize the whole inseparable character combination.
Examples of character combinations frequently found glued together include ff,
fi, and fl. Such combinations are referred to as ligatures.
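Conceptually, training a ligature means that a single frame is allowed to
stand for more than one character. The following minimal sketch shows this
with a plain dictionary; the frame names and the read_frames function are
hypothetical and do not reflect how FineReader stores patterns.

```python
from typing import Dict, List


def read_frames(frames: List[str], pattern: Dict[str, str]) -> str:
    """Translate each frame into the text it was trained to represent."""
    return "".join(pattern.get(frame, "?") for frame in frames)


# Hypothetical pattern: the glued "fi" frame maps to two characters at once.
pattern = {
    "frame_fi_glued": "fi",
    "frame_n": "n",
    "frame_e": "e",
}

frames = ["frame_fi_glued", "frame_n", "frame_e"]
text = read_frames(frames, pattern)
```

Here three frames produce the four-character word "fine", because the glued
frame contributes both "f" and "i".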
Notes:
- A pattern is only useful in the case of documents that have the same font,
font size, and resolution as the document used to create the user pattern.
- Each pattern is created for a particular batch. Consequently, if a batch
is deleted, its user pattern is also deleted. Patterns can, however, be
copied into other batches. To transfer a user pattern to another batch,
save the batch options to a batch template file.
- If you switch to recognizing texts set in a different font, always disable
any user patterns: select the Do not use user pattern item on the Recognition
tab (Tools>Options menu).
To train a user pattern:
- Start Train user pattern mode: select the Train user pattern radio button
in the Training group on the Recognition tab (Tools>Options menu). The
default pattern name ("Default") will be displayed in the status bar.
- Click the 2-Read button.
- Train your pattern: recognize one or more pages in Train user pattern mode.
Trained characters are saved in the default pattern. Once you have completed
training the pattern, FineReader will save the pattern (Default.ptn)
in the current batch folder.
- Edit your pattern.
- Deactivate training mode (click the Use user pattern button on the Recognition tab).
- Recognize the rest of the text: click the 2-Read button.
Notes:
- To create several patterns for the same batch, use the Pattern
Editor dialog (click the Pattern Editor button on the Recognition
tab or select the Tools>Pattern Editor menu item). Create a
new pattern (click the New button in the dialog) and select it (click
the Set Active button). Working with a pattern you have created is no
different from working with the default pattern (see the steps above).
Keep in mind, however, that only one pattern may be active at any one time.
- If you've created several patterns for the same batch, the active one will
be the pattern that was last created. The active pattern name is displayed
in the status bar. To activate another pattern, select the pattern of your
choice in the pattern list in the Pattern Editor dialog (Tools>Pattern
Editor menu) and click the Set Active button. Then click the Use user
pattern button in the Training group on the Recognition tab (Tools>Options
menu).
- If the Use built-in patterns option is set, FineReader will read
all texts using its built-in patterns and stop only at uncertain characters.
If you are training the system to read decorative and/or non-standard fonts
(for example, Tibetan), the use of built-in patterns may result in characters
being read incorrectly. If this occurs, disable the use of built-in patterns
(clear the Use built-in patterns checkbox on the Recognition tab) and train
the system to recognize each unknown character it is likely to encounter.
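The effect of the Use built-in patterns option on training can be pictured
with a small sketch. It is only a model of the behaviour described above, not
FineReader code: the glyph identifiers, the confidence scores, the 0.8
threshold, and the glyphs_to_train function are all assumptions.

```python
from typing import Dict, List, Tuple


def glyphs_to_train(
    glyphs: List[str],
    user_pattern: Dict[str, str],
    builtin: Dict[str, Tuple[str, float]],
    use_builtin: bool,
    threshold: float = 0.8,
) -> List[str]:
    """Return the glyphs at which training would stop and ask the user."""
    pending = []
    for glyph in glyphs:
        if glyph in user_pattern:
            continue  # already trained by the user
        if use_builtin:
            _char, confidence = builtin.get(glyph, ("", 0.0))
            if confidence >= threshold:
                continue  # the built-in reading is trusted, right or wrong
        pending.append(glyph)
    return pending


# Hypothetical built-in readings as (character, confidence).
# "g2" is a decorative glyph that the built-in patterns read confidently
# but, in this scenario, incorrectly.
builtin = {"g1": ("A", 0.95), "g2": ("m", 0.90), "g3": ("?", 0.30)}
page = ["g1", "g2", "g3"]

with_builtin = glyphs_to_train(page, {}, builtin, use_builtin=True)
without_builtin = glyphs_to_train(page, {}, builtin, use_builtin=False)
```

With built-in patterns enabled, only the uncertain glyph "g3" is offered for
training, so a confident misreading of "g2" would go unnoticed. With the
option cleared, every untrained glyph is offered.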