1997-12-19
WWWOFFLE - World Wide Web Offline Explorer - Version 2.0
========================================================
WHAT?
-----
The format of the cache that wwwoffle uses to store the web pages has changed in
version 2.x compared to the previous versions. If you have used wwwoffle
version 1.x then you *MUST* upgrade the existing cache before you can use the
new version of the program.
HOW?
----
*** READ ALL THIS SECTION BEFORE DOING ANYTHING ELSE ***
When you compile wwwoffle, another program called 'upgrade-cache' is also
compiled. You need to run this program to convert the cache from the old
format to the new one.

There are a number of options that you can take for this upgrade; the
following applies to all of them.

In each of the options the basics are the same: you run upgrade-cache with a
single argument, the name of the cache directory that is used (usually
/var/spool/wwwoffle). While the program runs it prints out informational and
warning messages; these may be useful.
Option 1 - Be reckless
Run 'upgrade-cache /var/spool/wwwoffle', watch the messages go flashing by and
hope that it works.
Option 2 - Be brave
With sh/bash run 'upgrade-cache /var/spool/wwwoffle > upgrade.log 2>&1'
or with csh/tcsh run 'upgrade-cache /var/spool/wwwoffle >& upgrade.log'
read the messages and check the warnings.
Option 3 - Be safe
Backup the cache first then follow option 2.
With GNU tar I suggest that you use the --atime-preserve option so that the
access times of the files in the cache are not modified by performing the
backup. The index and purge functions in wwwoffle use these times, so
preserving them is important.
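The backup-then-upgrade steps above can be sketched as follows (a sketch only,
assuming GNU tar and the usual /var/spool/wwwoffle cache path; the backup
filename is just an example):

```shell
# Sketch: adjust the paths if your cache lives elsewhere.

# 1. Back up the cache, preserving access times (index and purge need them):
tar --atime-preserve -cf /var/tmp/wwwoffle-cache-backup.tar /var/spool/wwwoffle

# 2. Run the upgrade, logging all messages (sh/bash redirection syntax):
upgrade-cache /var/spool/wwwoffle > upgrade.log 2>&1

# 3. Look through the log for warnings before deleting the backup:
grep -i warning upgrade.log
```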
When it finishes, the individual host-named directories in /var/spool/wwwoffle
are gone, moved into a new sub-directory called http. The outgoing directory
and this http directory are the only directories that should be left.
If there is a warning message then you should decide what needs doing. It
could be due to any of the following reasons:
That upgrade-cache was run by a user without write permissions.
That one or more files were changed while the program was running.
That there is a spare file in one of the host directories that needs deleting.
That there is a symbolic link that does not point anywhere.
If the upgrade-cache program crashes then that is a bug - tell me.
If you are left with many files or directories and the warnings are unclear then
this may be a bug - tell me.
If there are only a small number of spare files or directories, then just delete
them, you probably won't notice that they are missing.
WHY?
----
The existing scheme for naming the files in the cache had some problems; the
new one is better.
0) It was designed for my personal use, which did not involve storing many
web-pages or visiting any pages with unusual names.
You could say that the hacks that I implemented to get it working as I wrote
it were not well enough thought out. But at the time I wrote it I wanted to
get it working as soon as possible and did not write it with the future
growth in mind. The scheme as implemented has not caused any problems for me
personally.
1) It was possible for a web-page that has several possible arguments to be
stored incorrectly.
This is because for each page that has arguments a hash value is computed
from the arguments to provide a unique filename. The reason for this failing
is that I used a hash function that I made up on the spot, giving a 32-bit
hash value. This seemed to be sufficient for 4 billion sub-pages for each
host and path combination. As it turned out the hash function was not strong
enough and the number of distinct values it produced was much smaller.
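As an illustration of the problem (the toy function below is not wwwoffle's
actual hash, just an example of a weak "made up on the spot" 32-bit hash):

```python
def weak_hash(s: str) -> int:
    """Toy 32-bit hash: sum of character codes, truncated to 32 bits."""
    h = 0
    for c in s:
        h = (h + ord(c)) & 0xFFFFFFFF  # order-insensitive: collides easily
    return h

# Two different argument strings that are permutations of the same
# characters get the same hash, so they would share a cache filename:
a = "?user=foo&page=bar"
b = "?user=bar&page=foo"
assert a != b and weak_hash(a) == weak_hash(b)
```

A hash like this nominally has 4 billion possible values, but in practice
produces far fewer distinct ones, which is exactly the failure described above.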
2) There was no provision for any protocol other than http.
Very quickly the idea of doing ftp as well came to mind, but it could not be
implemented easily or cleanly with the existing system.
3) The outgoing directory was inefficient for large numbers of files.
An increasing sequence of numbers was used, resulting in slow access; this
was fixed in version 1.2x, but there could still be many requests for the
same URL in the directory. Now a unique name based on a hash is used so that
only one request for each page is stored.
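A minimal sketch of that idea (the hash function and filename prefix here are
hypothetical, not wwwoffle's actual choices): deriving the outgoing filename
from the URL means repeating a request reuses the same file instead of
queueing a duplicate.

```python
import hashlib

def request_filename(url: str) -> str:
    """Name an outgoing request file after a hash of its URL (sketch only)."""
    return "O" + hashlib.md5(url.encode()).hexdigest()

# The same URL always maps to the same filename, so a second request
# for it simply overwrites the first rather than adding another file:
assert request_filename("http://www.foo.com/") == request_filename("http://www.foo.com/")
```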
4) Bad characters and url-encoded URLs caused problems.
Some URLs that contained unusual characters, including URL-encoded sequences,
caused problems. The URLs http://www.foo.com/~bar and http://www.foo.com/%7Ebar
are the same URL but could be stored in different files.
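The fix can be sketched as follows (a sketch only, not wwwoffle's code):
decode %-escapes before computing the cache filename, so both spellings of a
URL map to the same file.

```python
from urllib.parse import unquote

def cache_key(url: str) -> str:
    """Normalise %-escapes so equivalent URL spellings share one key (sketch)."""
    return unquote(url)  # decodes %7E -> ~, etc.

assert cache_key("http://www.foo.com/~bar") == cache_key("http://www.foo.com/%7Ebar")
```

A real implementation has to be more careful (for example %2F must not be
decoded into a path-separating "/"), but the principle is the same.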
5) It is now a neater design with no special cases.
Previously only files with arguments needed hashing; now all of them use a
hash, which simplifies the logic. The format of the outgoing directory is the
same as that of the other directories.
6) There are more possibilities for future expansion.
It is now possible to consider adding more files to the cache to store extra
information about a URL, for example a password. It is obvious now that this
would be another file with the same hash value but a different prefix.