Text Format

You can export the recognized documents in various text formats. Just change the formatting options and click the Recognize button again to reformat your OCR results!

Output modes

Tip: the option Open after Saving opens the recognized document once it's saved. (The Windows file type associations determine which application is started.)

Tip: the option Send by E-mail creates a new mail message and inserts the recognized document as an attachment!

Tip: the option Create One Document per Page saves each page of a multipage document in a separate file. If the user gives the file name text.doc, the files will be called text-1.doc, text-2.doc etc.
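
The naming scheme for per-page files can be illustrated with a short Python sketch. It is not Readiris code: the function name per_page_names and its parameters are hypothetical, and it only assumes the pattern given in the example (base name, hyphen, page number, original extension).

```python
from pathlib import Path


def per_page_names(file_name: str, page_count: int) -> list[str]:
    """Return one file name per page, e.g. text.doc -> text-1.doc, text-2.doc.

    Hypothetical helper: it mirrors the naming pattern from the tip above,
    not the actual Readiris implementation.
    """
    path = Path(file_name)
    # Insert "-<page number>" between the base name and the extension.
    return [f"{path.stem}-{page}{path.suffix}" for page in range(1, page_count + 1)]


if __name__ == "__main__":
    for name in per_page_names("text.doc", 3):
        print(name)
```

For a three-page document saved as text.doc, the sketch lists text-1.doc, text-2.doc and text-3.doc.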

Text formatting

You get a continuous, running text. All formatting, if any, is done afterwards by the user.

(Body text is what you get when you right-click to open the context menu and select the command Copy as Text to recognize the window under the mouse cursor.)

The font type (serif - sans serif, proportional - fixed, normal - condensed), size and typestyle (bold - italic - underlined - superscript - subscript) are maintained across the recognition.

The tabs and the alignment (left - centered - right - justified) of each block are recreated.

The text blocks and columns aren't recreated - the paragraphs just follow each other.

The tables are recaptured correctly.

The text blocks, tables and graphics are recreated in the same place and the word and paragraph formatting are maintained across the recognition.

The bulleted and numbered lists are recreated.

You get a true copy of your source document, albeit as a compact and editable text file rather than a scanned image of your document.

Columnized texts are easier to edit than documents containing several frames: the text flows naturally from one column to the next!

Tip: when the system is unable to detect columns in the source document, this formatting mode uses frames anyway as a “fallback” position!

Any text you edit, add or remove remains inside its column; no text ever flows automatically across a column break. All text that follows a column break is moved to the top of the next column!

Enable this option when you want to maintain column breaks where these were detected in the recognized document - whatever text editing gets done after the OCR.

Tip: in newspapers and magazines, the various columns on a page often correspond to different article “threads”. Having text flow covertly from one column to the next may not be a good idea!

Tip: disable this option when you have columnized body text: you'll ensure the natural flow of the text from one column to the next.

Poor text formats that generate plain text (such as Text (ANSI)) do not support advanced formatting codes.

PDF documents by nature imply “autoformatting” etc.

Warning: WordPad is a “reduced” text editor, not a fully featured word processor; WordPad can open Word (DOC) and RTF files but ignores most formatting codes such as columns, text frames, alignment etc.

Tip: Readiris detects any web page URLs and e-mail addresses in the scanned documents and recreates them as hyperlinks in the output!
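
How such detection might work can be sketched in Python. This is an illustration only: how Readiris actually finds links is not described here, and the patterns URL_RE and EMAIL_RE and the function find_links are assumptions made for this example. The sketch scans recognized text for web addresses and e-mail addresses and pairs each one with a hyperlink target.

```python
import re

# Assumed, simplified patterns; real-world detection is usually more elaborate.
URL_RE = re.compile(r"\b(?:https?://|www\.)[^\s<>\"]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_links(text: str) -> list[tuple[str, str]]:
    """Return (visible text, hyperlink target) pairs found in the text."""
    links = []
    for match in URL_RE.finditer(text):
        # Drop punctuation that ends the sentence rather than the address.
        url = match.group().rstrip(".,;:!?)")
        target = url if url.lower().startswith("http") else "http://" + url
        links.append((url, target))
    for match in EMAIL_RE.finditer(text):
        links.append((match.group(), "mailto:" + match.group()))
    return links


if __name__ == "__main__":
    sample = "Visit www.example.com or mail info@example.com."
    for visible, target in find_links(sample):
        print(visible, "->", target)
```

Web addresses are listed first, then e-mail addresses; an output filter could then write each pair as a hyperlink in the chosen format.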

Options

Embedding fonts prevents font substitution when readers view and print the recognized document. It ensures that readers - whatever their computer configuration may be - see the text in its original fonts.

Embedding the fonts somewhat increases the file size of the recognized documents!

Table recognition - spreadsheets

Readiris outputs tabular data to spreadsheets, word processors and web browsers: tables get reconstructed cell by cell in worksheets and are inserted as table objects in word processor files.