This topic gives detailed information about the PDF formats supported by Readiris and the ways in which you can make good use of the PDF files.
The PDF Text format creates a searchable PDF file that contains the recognized text (and possibly graphic zones for photographs, artwork, etc.). The page image is not included in this single-layer PDF file.
The PDF Image-Text format creates a searchable PDF file that contains both the page image and the recognized text. In this two-layer PDF file, the page image lies above the text.
Note: compression is used for all elements. Black-and-white images are Group 4 compressed TIFF files; greyscale and color images are JPEG files saved at high quality (0.8). The text is compressed using the Gzip mode.
Tip: Readiris allows you to create bookmarks automatically.
Tip: Readiris allows you to embed the fonts in PDF documents!
“Text only” PDF files are much more compact than image files!
Text-based PDF files are searchable. (Bitmap images - “image only” PDF files - can be viewed but not searched.)
Text-based PDF files are editable: the recognized text can be edited and re-used. (Bitmap images can be viewed but not edited.)
Use the TouchUp Text tool of the Acrobat software to correct small recognition errors in the PDF file.
Tip: it takes the appropriate version of Adobe Reader to correctly display the resulting PDF files! To view and print Central-European texts (such as Czech and Polish), Baltic texts, Turkish and Cyrillic (“Russian”) texts in the PDF format, you must have the special “CE” version (Central-European) of the Adobe Reader software. (You can find this software on the Readiris CD-ROM.)
You can isolate the text from an “image-text” PDF file. You can also convert text-only PDF files into text files. Open the file in Adobe Acrobat and use the Save As command to save it as a text file (in the Word, RTF, HTML or Text format).
To re-use small portions of text from a PDF file in other applications, choose the Select Text tool of the Adobe Acrobat software, select the required text and copy-paste it into the other application. (The Select All command selects all the text of the current page or of the entire document, depending on your view mode.)
Use the Search command of your Adobe Acrobat or Adobe Reader software for simple searches within a document and for advanced searching across several PDF documents.
The Search button of the Adobe Acrobat or Adobe Reader software finds complete words or parts of words in the current PDF document. Acrobat looks for the word by sequentially reading every word on every page of the file.
The Search button of the Adobe Acrobat or Adobe Reader software also allows you to perform fast, advanced searches on a collection of indexed PDF documents.
You can search for a simple word or phrase.
You can expand your search query by using wildcard characters and Boolean operators.
You can use the search options to refine your search further.
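To illustrate what wildcard characters and Boolean operators add to a simple word search, here is a minimal Python sketch. It is not the query syntax of Adobe Acrobat or Adobe Reader: the file names, the word lists and the functions matches and search are hypothetical, and the wildcards follow the conventions of Python's fnmatch module (* for any run of characters, ? for a single character).

```python
from fnmatch import fnmatchcase

# Hypothetical words found on the pages of two documents.
PAGES = {
    "report.pdf": {"annual", "report", "scanning", "results"},
    "memo.pdf": {"scanner", "maintenance", "memo"},
}

def matches(words, pattern):
    """True if any word matches the wildcard pattern."""
    return any(fnmatchcase(word, pattern) for word in words)

def search(pages, all_of=(), any_of=(), none_of=()):
    """Combine patterns with AND (all_of), OR (any_of) and NOT (none_of)."""
    hits = []
    for name, words in pages.items():
        if not all(matches(words, p) for p in all_of):
            continue  # an AND pattern is missing
        if any_of and not any(matches(words, p) for p in any_of):
            continue  # none of the OR patterns is present
        if any(matches(words, p) for p in none_of):
            continue  # a NOT pattern is present
        hits.append(name)
    return sorted(hits)

print(search(PAGES, all_of=["scan*"]))                    # ['memo.pdf', 'report.pdf']
print(search(PAGES, all_of=["scan*"], none_of=["memo"]))  # ['report.pdf']
print(search(PAGES, any_of=["annual", "maintenance"]))    # ['memo.pdf', 'report.pdf']
```

The wildcard widens the query (scan* finds both "scanning" and "scanner"), while the Boolean conditions narrow or combine the results.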
Index-based searching requires that a “full-text” index has been created for the collection of PDF files with the Catalog command. (A “full-text” index is an alphabetized list of every word used in a document or a series of documents. Index-based searching is much faster than sequential reading: Adobe Acrobat and Adobe Reader go right to the word in the list rather than progressively reading through the documents.)
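The difference between sequential reading and index-based searching can be sketched in a few lines of Python. This is only an illustration of the principle, not the way the Catalog command or the Adobe index format actually works: the file names, the sample texts and all the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the text extracted from a collection of PDF files.
DOCUMENTS = {
    "invoice.pdf": "Payment is due within thirty days",
    "manual.pdf": "Scan the page and export a searchable PDF",
    "letter.pdf": "The scan of the signed page is attached",
}

def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def sequential_search(documents, term):
    """Read every word of every document, as a non-indexed search does."""
    term = term.lower()
    return sorted(name for name, text in documents.items() if term in words(text))

def build_index(documents):
    """Build the index once: each word maps to the documents that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in words(text):
            index[word].add(name)
    return index

def indexed_search(index, term):
    """Look the word up directly instead of rereading the documents."""
    return sorted(index.get(term.lower(), set()))

index = build_index(DOCUMENTS)
print(sequential_search(DOCUMENTS, "scan"))  # ['letter.pdf', 'manual.pdf']
print(indexed_search(index, "scan"))         # ['letter.pdf', 'manual.pdf']
```

Both searches return the same documents. The indexed search pays the cost of reading the documents once, when the index is built, and every later query is a single lookup.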