You can export the recognized documents in various text formats. To reformat your OCR results, just change the formatting options and click the Recognize button again!
Select an output mode: save the results to an external file (with a document format supported by your text application), place the results in the Windows clipboard or send the recognized document to a target application.
Readiris outputs to all major office suites, word processors and spreadsheets, and to “poor” text formats that generate “plain text”. Readiris exports data directly to applications such as Microsoft Word and Excel, Adobe Reader, the major web browsers etc.
Tip: the option Open after Saving opens the recognized document once it’s saved. (The Windows file types determine which application will be started up.)
Tip: the option Send by E-mail creates a new mail message and inserts the recognized document as mail attachment!
Tip: the option Create One Document per Page saves each page of a multipage document in a separate file. If you enter the file name text.doc, the files will be called text-1.doc, text-2.doc etc.
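The naming scheme described for Create One Document per Page can be illustrated with a short Python sketch. This is not Readiris code: the function name per_page_names and its parameters are hypothetical, and the only thing taken from the text above is the pattern text.doc, text-1.doc, text-2.doc.

```python
import os.path


def per_page_names(file_name: str, page_count: int) -> list[str]:
    """Return one file name per page, numbered from 1.

    Hypothetical helper: it only mimics the naming pattern described
    in the manual and does not call Readiris in any way.
    """
    # Split "text.doc" into the base name "text" and the extension ".doc".
    base, extension = os.path.splitext(file_name)
    # Insert "-<page number>" between the base name and the extension.
    return [f"{base}-{page}{extension}" for page in range(1, page_count + 1)]


if __name__ == "__main__":
    for name in per_page_names("text.doc", 3):
        print(name)
```

Keeping the extension unchanged means that the Windows file type, and therefore the application that opens each file, stays the same for every page.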
The option Create Body Text avoids text formatting by Readiris.
You get a continuous, running text. All formatting, if any, is done afterwards by the user.
(Body text is what you get when you right-click to open the context menu and select the command Copy as Text to recognize the window under the mouse cursor.)
The option Retain Word and Paragraph Formatting takes an intermediate position between body text and “autoformatting”.
The font type (serif - sans serif, proportional - fixed, normal - condensed), size and typestyle (bold - italic - underlined - superscript - subscript) are maintained across the recognition.
The tabs and the alignment (left - centered - right - justified) of each block are recreated.
The text blocks and columns aren’t recreated - the paragraphs just follow each other.
The tables are recaptured correctly.
The option Recreate Source Document recreates a facsimile copy of the original document.
The text blocks, tables and graphics are recreated in the same place and the word and paragraph formatting are maintained across the recognition.
The bulleted and numbered lists are recreated.
You get a true copy of your source document, albeit as a compact and editable text file, no longer a scanned image of your document.
The option Use Columns instead of Frames determines how the “autoformatting” gets done: the text blocks, tables and graphics can be stored in frames or “flowing” columns (if any).
Columnized texts are easier to edit than documents containing several frames: the text flows naturally from one column to the next!
Tip: when the system is unable to detect columns in the source document, this formatting mode uses frames anyway as a “fallback” position!
The option Insert Column Breaks determines whether Readiris inserts “hard” column breaks at the end of each column.
Any text you edit, add or remove remains inside its column; no text ever flows automatically across a column break. All text that follows a column break is moved to the top of the next column!
Enable this option when you want to maintain column breaks where these were detected in the recognized document - whatever text editing gets done after the OCR.
Tip: in newspapers and magazines, the various columns on a page often correspond to different article “threads”. Having text flow covertly from one column to the next may not be a good idea!
Tip: disable this option when you have columnized body text: you’ll ensure the natural flow of the text from one column to the next.
Which formatting options are available depends on the selected output mode.
“Poor” text formats generating “plain” text (such as Text (ANSI)) do not support advanced formatting codes.
PDF documents, by their nature, always imply “autoformatting”.
Warning: WordPad is a “reduced” text editor, not a fully featured word processor; WordPad can open Word (DOC) and RTF files but ignores most formatting codes such as columns, text frames, alignment etc.
Tip: Readiris detects any web page “URLs” and e-mail addresses in the scanned documents and recreates them as hyperlinks in the output!
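To give an idea of what recreating URLs and e-mail addresses as hyperlinks involves, here is a minimal Python sketch that wraps them in HTML anchors. It is an illustration only: the regular expression, the function name linkify and the choice of HTML output are assumptions, not the detection method Readiris actually uses, and the input is assumed to be plain text without markup characters.

```python
import re

# Assumed, simplified patterns: a URL starts with http://, https:// or www.
# and must not end in punctuation; an e-mail address is name@domain.tld.
LINK_RE = re.compile(
    r"(?P<url>(?:https?://|www\.)[^\s<>\"]+[^\s<>\".,;:!?)])"
    r"|(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
)


def linkify(text: str) -> str:
    """Wrap every detected URL or e-mail address in an HTML anchor."""

    def to_anchor(match: re.Match) -> str:
        url = match.group("url")
        if url:
            # Bare "www." addresses need a scheme to work as a link target.
            href = url if url.startswith("http") else "http://" + url
            return f'<a href="{href}">{url}</a>'
        email = match.group("email")
        return f'<a href="mailto:{email}">{email}</a>'

    return LINK_RE.sub(to_anchor, text)


if __name__ == "__main__":
    print(linkify("Visit www.example.com or mail info@example.com."))
```

A single combined pattern is used so that each piece of text is matched at most once; the trailing full stop of a sentence is deliberately left outside the link.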
Merge Lines into Paragraphs enables automatic paragraph detection. This prevents the insertion of carriage returns (CR or EOL codes) at the end of each line: Readiris word-wraps the recognized text until a new paragraph starts and “reglues” words that were hyphenated at the end of a line. (This option is not available for Adobe Acrobat files: PDF files always store text line by line.)
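The effect of merging lines into paragraphs can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Readiris algorithm: it assumes that a blank line marks the start of a new paragraph and that every hyphen at the end of a line is a line-break hyphen. The function name merge_lines is hypothetical.

```python
def merge_lines(lines: list[str]) -> list[str]:
    """Join recognized lines into paragraphs.

    Assumptions (not taken from Readiris): a blank line ends a paragraph,
    and a trailing hyphen always marks a word split across two lines.
    """
    paragraphs: list[str] = []
    current = ""
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            # Blank line: close the paragraph collected so far, if any.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # "Reglue" the hyphenated word: drop the hyphen, add no space.
            current = current[:-1] + line
        elif current:
            # Ordinary line break inside a paragraph becomes a space.
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    sample = ["Readiris recog-", "nizes text and", "wraps it.", "", "Next para-", "graph."]
    for paragraph in merge_lines(sample):
        print(paragraph)
```

Note that this naive rule would also remove the hyphen from a genuinely hyphenated compound that happens to break at the end of a line; a real implementation would need a dictionary or similar check.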
Include Graphics includes the graphics in “autoformatted” text files.
Create Bookmarks creates bookmarks for the text windows, graphics and tables in Adobe Acrobat PDF files. (Tip: the Adobe Reader and Adobe Acrobat software can create thumbnails for your PDF documents dynamically!)
Embed Fonts embeds the fonts in the Adobe Acrobat PDF files.
Embedding fonts prevents font substitution when readers view and print the recognized document. It ensures that readers - whatever their computer configuration may be - see the text in its original fonts.
Embedding the fonts increases the file size of the recognized documents (somewhat)!
Readiris outputs tabular data to spreadsheets, word processors and web browsers: tables are reconstructed cell by cell in worksheets and inserted as table objects in word processor files.
Tables can be sent to the spreadsheet Microsoft Excel, the word processor Microsoft Word or the clipboard, and can be saved in a Word (DOC), RTF (“Rich Text Format”) or HTML file, or in a table format.
To recreate tables in the output, select the option Retain Word and Paragraph Formatting or Recreate Source Document.
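As an illustration of what rebuilding a table cell by cell means for HTML output, the following Python sketch turns rows of recognized cell text into an HTML table. It is only a sketch under assumptions: the function name table_to_html and the list-of-rows input are hypothetical, and nothing here reflects how Readiris builds its output internally.

```python
from html import escape


def table_to_html(rows: list[list[str]]) -> str:
    """Build a plain HTML table from rows of cell text.

    Hypothetical helper: each inner list is one table row and each
    string is the recognized text of one cell.
    """
    lines = ["<table>"]
    for row in rows:
        # Escape the cell text so characters such as & and < stay valid HTML.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_to_html([["Item", "Price"], ["Pen", "1 & 2"]]))
```

Because every cell stays a separate unit, the same row-and-cell data could just as well be written to a worksheet or pasted as a table object in a word processor.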