- <head>
- <title="...forever...">
- <font=monaco10.fnt>
- <font=newy36.fnt>
- <font=time24.fnt>
- <image=back.raw w=256 h=256 t=-1>
- <buf=2621>
- <bgcolor=-1>
- <background=0>
- <link_color=253>
- <module=console.mod>
- <pal=back.pal>
- colors:
- 251 - black
- </head>
- <body>
- <frame x=0 y=0 w=640 h=2621 b=-1 c=-1>
-
-
- - -- - --------------------------------------
- <f1><c000> <link=pic42_2.scr>PDF - what's this?</l> <f0>
- <f1><c000> part #2 <f0>
- ----------------------------- -- - --- --- --
-
- In the first part I wrote a few words about PDF documents: what you can and
- what you can't do with them. That was about 2 weeks ago and I still can't find
- anything you can't do in a PDF :) That's really good for the user. But it's
- also a huge source of possible problems.
-
- Let's start with some fundamental information about the PDF structure.
-
- A PDF document consists of many, many, many dictionaries :) So, there is a
- dictionary for pages, contents, fonts, pictures, thumbnails, forms,
- interactive forms etc. Each dictionary can be stored inline or by reference.
- Each dictionary can contain many other dictionaries (the best example is a
- font dictionary), giving a nested structure. Referenced objects are located
- through a list of object numbers with offsets to the place where each
- object is stored (for example, object number 100 is stored 1300 bytes from
- the beginning of the file). This is the so-called cross-reference table. So
- reading a PDF document is effectively like jumping from one dictionary to
- another.
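-
- To make the idea of the table concrete, here is a minimal Python sketch. It
- assumes the classic text form of the table (the keyword "xref", then
- subsections of "offset generation n/f" entries). The function name
- parse_xref_section and the sample bytes are made up for illustration; this is
- not how any particular reader does it.
-

```python
def parse_xref_section(data: bytes) -> dict:
    """Map object number -> byte offset for the in-use entries (sketch only)."""
    tokens = data.split()
    if not tokens or tokens[0] != b"xref":
        raise ValueError("not a cross-reference section")
    offsets = {}
    i = 1
    # Each subsection starts with "<first object number> <entry count>".
    while i + 1 < len(tokens) and tokens[i].isdigit() and tokens[i + 1].isdigit():
        first, count = int(tokens[i]), int(tokens[i + 1])
        i += 2
        for n in range(count):
            offset, _generation, kind = tokens[i], tokens[i + 1], tokens[i + 2]
            i += 3
            if kind == b"n":  # "n" = in use, "f" = free
                offsets[first + n] = int(offset)
    return offsets


# Hypothetical sample: object 100 lives 1300 bytes from the start of the file.
sample = (
    b"xref\n"
    b"0 1\n"
    b"0000000000 65535 f \n"
    b"100 2\n"
    b"0000001300 00000 n \n"
    b"0000001420 00000 n \n"
    b"trailer\n"
)
table = parse_xref_section(sample)
```

- With such a table, "jumping to object 100" is just a seek to table[100].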
-
- In practice it looks like this:
-
- 1. some general operations (document information, object list etc.)
- 2. reading the page tree
- 3. each page has a resource dictionary with fonts, pictures, annotations etc. -
- it is necessary to read it
- 4. reading the resources, for example a font (some steps are optional, but if
- they are present you have to process them)
- - main dictionary
- - font file dictionary
- - encoding dictionary etc. etc.
- 5. unpacking and rendering the page contents
- - each command is stored _after_ its variables. It means that we first read
- the variables and then the command. It makes the whole thing much harder
- than you can imagine.
- 6. displaying
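-
- The "variables first, command last" order from the rendering step can be
- handled with a simple operand stack. The sketch below is a rough Python
- illustration that only understands numbers and /Names (no strings, arrays or
- inline images); read_commands is a hypothetical name, not part of any
- reader.
-

```python
def read_commands(stream: bytes):
    """Yield (command, operands) pairs from a simplified content stream."""
    operands = []
    for raw in stream.split():
        token = raw.decode("latin-1")
        if token.startswith("/"):
            operands.append(token)         # a name such as /F1
            continue
        try:
            operands.append(float(token))  # a number: keep it for later
        except ValueError:
            yield token, operands          # anything else is treated as a command
            operands = []


for command, args in read_commands(b"/F1 12 Tf 0 0 100 100 re f"):
    print(command, args)
```

- The reader cannot know what a number means until the command arrives, so
- everything has to be buffered first.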
-
- Generally the structure of each PDF looks like this:
-
- +-----------------+
- | File header |
- +-----------------+
- | |
- | Body |
- | |
- +-----------------+
- | Cross-reference |
- | table |
- +-----------------+
- | Trailer |
- +-----------------+
-
- The trailer gives the location of the cross-reference table and of some
- special objects within the body of the file.
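-
- As a small illustration of how a reader can start from the end of the file:
- in the classic layout the byte offset of the cross-reference table follows
- the "startxref" keyword just before "%%EOF". The Python sketch below assumes
- that layout; find_xref_offset and the tiny fake file are invented for the
- example.
-

```python
def find_xref_offset(pdf: bytes) -> int:
    """Return the byte offset written after the last 'startxref' keyword."""
    tail = pdf[-1024:]  # the keyword sits near the end of the file
    pos = tail.rfind(b"startxref")
    if pos < 0:
        raise ValueError("startxref keyword not found")
    return int(tail[pos + len(b"startxref"):].split()[0])


# A tiny fake file: header + body, cross-reference table, trailer.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref = b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
trailer = (
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n" + str(len(body)).encode("ascii") + b"\n%%EOF\n"
)
sample_pdf = body + xref + trailer

offset = find_xref_offset(sample_pdf)
print(sample_pdf[offset:offset + 4])
```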
-
- And the document structure is organized in this way:
- +-----------+
- +----| Page tree |
- | +-----------+
- |
- | +----------+
- |----| Outlines |
- | +----------+
- |
- +----------+ | +-----------------+
- | Document | |----| Article threads |
- | catalogue|----| +-----------------+
- +----------+ |
- | +--------------------+
- |----| Named destinations |
- | +--------------------+
- |
- | +------------------+
- |----| Interactive form |
- +------------------+
-
- Outlines and article threads are also much more complicated - they consist of
- many other subdictionaries. The page tree block consists of many blocks
- representing each page in the document. So, the page tree is organized just
- like pages in a normal book. For example:
-
- page1
- |
- +--->Chapter 1
- | |
- | |->page2
- | |->page3
- | |->page4
- | |->page5
- |
- |--->Chapter 2
- | |
- | |->page6
- | |->page7
- | |->page8
- | |->page9
-
- and so on. This means that each page can also have sub-pages.
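-
- Walking such a tree is a short recursion. In the sketch below plain Python
- dicts stand in for PDF dictionaries. It assumes that intermediate nodes (the
- "chapters" above) have the type Pages and list their children under Kids,
- while the leaves have the type Page. The "Label" key and the function name
- iter_pages are made up for the example.
-

```python
def iter_pages(node: dict):
    """Yield the leaf pages of a page tree in document order."""
    if node.get("Type") == "Pages":
        for kid in node.get("Kids", []):
            yield from iter_pages(kid)
    else:
        yield node


# Hypothetical tree, loosely following the diagram above.
tree = {"Type": "Pages", "Kids": [
    {"Type": "Page", "Label": "page1"},
    {"Type": "Pages", "Kids": [            # "Chapter 1"
        {"Type": "Page", "Label": "page2"},
        {"Type": "Page", "Label": "page3"},
    ]},
    {"Type": "Pages", "Kids": [            # "Chapter 2"
        {"Type": "Page", "Label": "page6"},
    ]},
]}

labels = [page["Label"] for page in iter_pages(tree)]
print(labels)
```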
-
- Each page consists of: resources (fonts, pictures), content stream, thumbnail
- images (if present) and annotations (optional).
-
- I don't want to go into details, we don't have that much time and space - the
- reference book has 1172 pages!!
-
- Does it look so hard? No, of course not ;)
-
- So, why is it not as perfect as it appears? There are just three reasons.
-
-
- 1. It is too complicated - it means that the reading time could be much
- shorter. Especially the page contents, where the variables are stored before
- the command (like "0 0 100 100 re" - for a rectangle). A better form would be:
- "re 0 0 100 100".
- The other problem is that each command "line" can have at the end either "\r"
- or "\r\n" or "\n" or nothing. So you never know where the next "line" starts.
- There are, however, a few tricks which can speed it up a little :)
-
- 2. Fonts. This is the hardest part of each PDF document. Just imagine that even
- Acrobat Reader has problems with them!! (example in pictures) The first
- picture shows a screenshot from Adobe Acrobat Reader and the second one
- a screenshot from GhostScript 8.x. When you look at the details of some
- characters you'll see the difference. One can ask which one is the correct
- one? Try to guess ;) A small tip: the correct one comes from the application
- whose version is 8.x ;)
-
- Another example. This time 4 pictures (Adobe Acrobat, GhostScript, Porthos and
- MyPDF). So, you see how many problems Acrobat has - it can't display even one
- single character!!! GhostScript is better, but it's still not perfect.
- Porthos is quite good :) MyPDF - hmmm, some problems with text positioning (I'm
- currently working on it). But at least all characters are displayed. The last
- example especially shows how many problems different readers have with
- different documents.
-
- 3. Each "distiller" (this name is reserved for Acrobat Distiller) produces PDFs
- in a different way. For example the line endings or the way each font is
- stored etc. Then look at 1. and 2. ;) For example there is a distiller that
- converts text to vector graphics! Both 1. and 2. can really slow down the
- reading process on any platform. Particularly on our beloved Atari. Also some
- new or rare commands can cause many problems.
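-
- Coming back to the line endings from point 1: one possible way to cope with
- all three variants is a single regular expression that tries the two-byte
- ending first. This is only a Python sketch with a made-up function name
- (split_lines). It does not cover the "nothing" case, for which the reader
- has to work token by token instead of line by line.
-

```python
import re

# Try "\r\n" first so that it is not split into two separate line endings.
EOL = re.compile(rb"\r\n|\r|\n")


def split_lines(stream: bytes) -> list:
    """Split a content stream on any of the three line endings, dropping empties."""
    return [line for line in EOL.split(stream) if line]


lines = split_lines(b"0 0 100 100 re\rf\r\nq\nQ")
print(lines)
```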
-
- So, now you know the structure of a PDF document and the problems we can
- have while reading it. Even if you think that these problems are easy to solve
- - you just think so ;)
-
- In the next part I'll try to tell you how MyPDF works and how I solved some
- speed problems.
-
- cu next time
- Rafael Kawecki
-
-
- -- - --- -- -------------------------------------------------------------------
- CHOSNECK 4th appearance contact us:
- done by the dream survivors greymsb@poczta.fm
- ----------------------------------------------------------------- -- - --- ----
- </frame>
- </body>
-
-