It was written for use by archive maintainers but can be used by anyone wanting to transfer a lot of files via ftp.
Regardless of how it is called, mirror always performs the same basic steps. It connects to the remote site, internally builds a directory listing of the local target directory, builds one for the remote directory, compares them, creates any subdirectories required, transfers the appropriate files (setting their time-stamps to match those on the remote site), creates any symbolic links, removes any unnecessary objects and finally drops the connection.
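The compare step can be pictured with a small sketch. This is not mirror's own code (mirror is a Perl script); the function name plan_actions and the listing shape, a dict mapping each path to a (size, mtime) pair, are assumptions made purely for illustration. Deletions would only be acted on when deleting is enabled.

```python
def plan_actions(local, remote):
    """Compare two listings mapping path -> (size, mtime).

    Returns (to_fetch, to_delete) as sorted lists of paths.
    """
    to_fetch = sorted(
        path for path, (size, mtime) in remote.items()
        if path not in local
        or local[path][1] < mtime      # remote copy is newer
        or local[path][0] != size      # size differs
    )
    # Local objects with no remote counterpart are candidates for removal.
    to_delete = sorted(path for path in local if path not in remote)
    return to_fetch, to_delete


local = {"README": (2245, 100), "old.tar": (10, 50)}
remote = {"README": (2245, 100), "mirror-2.1.tar.gz": (61949, 200)}
print(plan_actions(local, remote))
```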
Mirror can handle symbolic links but not hard links. It does not duplicate owner or group information. If you require any of these features, use rdist(1) instead.
Mirror is called in one of two ways shown in the synopsis above.
The first method is used to retrieve a remote directory into the current directory. If you are mirroring a directory, it is best to end the pathname in a slash ('/') so that the remote recursive listing is smaller, or to use the -r flag to suppress recursion (see -g below). The mirror.defaults file is not used.
In the second method given in the synopsis above, a minimal number of arguments is required and mirror is controlled by the settings read from the configuration files (or standard input). If a file named mirror.defaults can be found either in the directory containing the mirror executable or in the PERLLIB path, it is loaded first. This is used to provide common defaults for all config-files.
Mirror was written to mirror remote Un*x archives, but has grown (like Topsy).
You can add whitespace before the keyword and the equals/plus. Everything immediately following the equals/plus is the value, including any leading or trailing whitespace. The equals version sets the keyword to this value, while the plus version concatenates the value onto the end of the default.
A statement can be continued over multiple lines by ending every line except the last with an ampersand ('&'). The line following the ampersand is appended to the current line with all leading whitespace removed.
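The statement syntax described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not mirror's actual parser; the names join_continuations and apply_statements, and the choice to treat lines starting with '#' as comments, are assumptions.

```python
import re

# Optional whitespace, a keyword, optional whitespace, then '=' or '+'.
# Everything after the operator is the value, whitespace included.
STATEMENT = re.compile(r"^\s*(\w+)\s*([=+])(.*)$")


def join_continuations(lines):
    """Merge each line ending in '&' with the next line, minus its leading whitespace."""
    joined, pending = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is not None:
            line = pending + line.lstrip()
            pending = None
        if line.endswith("&"):
            pending = line[:-1]
        else:
            joined.append(line)
    if pending is not None:
        joined.append(pending)
    return joined


def apply_statements(lines, defaults):
    """Return a copy of defaults updated by keyword=value / keyword+value lines."""
    values = dict(defaults)
    for line in join_continuations(lines):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment
        match = STATEMENT.match(line)
        if not match:
            raise ValueError(f"cannot parse: {line!r}")
        keyword, op, value = match.groups()
        if op == "=":
            values[keyword] = value
        else:
            # The plus form appends to the default, not to the current value.
            values[keyword] = defaults.get(keyword, "") + value
    return values


defaults = {"local_dir": "/public/", "exclude_patt": ""}
config = [
    "package=gnu",
    "  local_dir+gnu",
    "exclude_patt+|^ListArchives/&",
    "      |^scheme-7.0/",
]
result = apply_statements(config, defaults)
print(result["local_dir"])     # /public/gnu
print(result["exclude_patt"])  # |^ListArchives/|^scheme-7.0/
```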
Here is a list of the keywords and their values with defaults given inside square brackets. Those options flagged with a star are not yet implemented.
Although there are a lot of keywords that can be set, the built-in defaults will handle most cases. Normally only package, site, remote_dir and local_dir need to be set.
Each group of keywords defines how to mirror a particular package and should begin with a unique package line. The package name is used in report generation and by the -p argument, so pick something mnemonic. The minimum needed for each package is the package, site, remote_dir and local_dir. On finding a package line, all the default values are reset.
If the package name is defaults, then no site is contacted, but the default values given for any keywords are changed. Normally all the defaults are in the file mirror.defaults which will be automatically loaded before any package details.
    # Sample mirror.defaults
    package=defaults
    # The LOCAL hostname - if not the same as `hostname` returns
    # (I advertise the name src.doc.ic.ac.uk but the machine is
    # really puffin.doc.ic.ac.uk.)
    hostname=src.doc.ic.ac.uk
    # Keep all local_dirs relative to here
    local_dir=/public/
    remote_password=ukuug-soft@doc.ic.ac.uk
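How package lines interact with the defaults can be sketched as follows. The sketch assumes statements have already been split into (keyword, operator, value) triples; the function name group_packages and that triple format are hypothetical, and the code only illustrates the behaviour described above rather than reproducing mirror's implementation.

```python
def group_packages(statements, builtin):
    """Turn (keyword, op, value) triples into a list of per-package settings."""
    defaults = dict(builtin)   # changed only by "package=defaults" groups
    packages = []
    current = None             # the settings dict currently being filled in
    for keyword, op, value in statements:
        if keyword == "package":
            if value == "defaults":
                # No site is contacted; later keywords change the defaults.
                current = defaults
            else:
                # Every real package starts again from the defaults.
                current = dict(defaults, package=value)
                packages.append(current)
            continue
        if current is None:
            raise ValueError("keyword found before any package line")
        base = defaults.get(keyword, "") if op == "+" else ""
        current[keyword] = base + value
    return packages


statements = [
    ("package", "=", "defaults"),
    ("local_dir", "=", "/public/"),
    ("package", "=", "gnu"),
    ("site", "=", "prep.ai.mit.edu"),
    ("local_dir", "+", "gnu"),
    ("package", "=", "cnews"),
    ("local_dir", "+", "computing/usenet/software/transport/c"),
]
for pkg in group_packages(statements, {"do_deletes": "false"}):
    print(pkg["package"], pkg["local_dir"])
```

Note that the second package does not inherit the site set by the first, because each package line starts again from the defaults.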
If the package is not defaults, then mirror will perform the following steps. Unless an internal failure is detected, any error will cause the current package to be skipped and the next one tried.
If mirror is not already connected to the site, it will disconnect from any site it is currently connected to and attempt to connect to the remote site's ftp daemon. It will then log in using the given remote username and password. Once connected, mirror turns on binary mode transfers. Next it changes to the given local directory and scans it to get the details of the local files that already exist. If necessary, the local directory will be created.
Once this is completed, the remote directory is scanned in a similar fashion. Mirror does this by changing to the remote directory and running the ftp LIST command, passing either the flags_recursive or the flags_nonrecursive options, depending on the value of recursive. Alternatively, a file containing the directory listing may be retrieved.
Each remote pathname will have any specified mappings performed on it to create a local pathname. Then any checks specified by the exclude_patt, max_days, get_newer and get_size_change keywords are applied to the names of files or symlinks. Only exclude_patt checking is applied to directories.
The above creates a list of all required remote files and the local pathnames to store them in.
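The per-file checks can be illustrated with a rough sketch. The meanings assumed here are: exclude_patt is a regular expression matched against the pathname, max_days skips files older than that many days (0 meaning no limit), get_newer fetches when the remote copy is newer, and get_size_change fetches when the sizes differ. The function name wanted, the settings dict and the (size, mtime) tuples are hypothetical, and Python's re module stands in for Perl regular expressions.

```python
import re
import time


def wanted(path, remote, local, settings, now=None):
    """Decide whether a remote file should be fetched.

    remote and local are (size, mtime) tuples; local is None when the
    file does not yet exist locally.
    """
    now = time.time() if now is None else now
    patt = settings.get("exclude_patt", "")
    if patt and re.search(patt, path):
        return False                      # explicitly excluded
    max_days = settings.get("max_days", 0)
    if max_days and now - remote[1] > max_days * 86400:
        return False                      # too old to consider
    if local is None:
        return True                       # nothing local yet
    if settings.get("get_newer", True) and remote[1] > local[1]:
        return True
    if settings.get("get_size_change", True) and remote[0] != local[0]:
        return True
    return False


now = 1_000_000
settings = {"exclude_patt": r"(^|/)lost\+found/", "max_days": 7}
print(wanted("README", (10, now - 86400), None, settings, now))    # True
print(wanted("lost+found/x", (10, now), None, settings, now))      # False
print(wanted("old", (10, now - 8 * 86400), None, settings, now))   # False
```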
Once the directory listing is completed, all required files are fetched from the remote site into their local path names. This is done by retrieving the file into a temporary file in the target directory. If required, the temporary file is compressed, gzip'ed or split (or compressed or gzip'ed and then split). The temporary file is renamed when the transfer is successful.
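The fetch-to-temporary-then-rename pattern can be sketched as below. The helper name store_atomically, the fetch callable and the temporary file prefix are all hypothetical; the sketch leaves out compression and splitting and only shows why an interrupted transfer never leaves a partial file under the final name.

```python
import os
import tempfile


def store_atomically(target_path, fetch):
    """Write via a temporary file in the target directory, then rename.

    fetch is a hypothetical callable that writes the remote data to the
    open binary file object it is given.
    """
    directory = os.path.dirname(target_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            fetch(tmp)
        os.replace(tmp_path, target_path)   # only reached on success
    except BaseException:
        # Transfer failed: remove the partial temporary file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as demo_dir:
    path = os.path.join(demo_dir, "README")
    store_atomically(path, lambda f: f.write(b"hello"))
    with open(path, "rb") as handle:
        print(handle.read())
```

Creating the temporary file in the target directory keeps the final rename on the same filesystem.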
Mirror uses the remote directory listing to work out what files are available. Mirror was originally targeted at unix ftp daemons using a standard ls command. To use a unix host with a non-standard ls, or a non-unix host, it is necessary to set the remote_fs variable to match the kind of directory listing that will be returned. There is some interaction between remote_fs and other variables, in particular flags_recursive, recurse_hard and get_size_change. The following subsections show examples of the results of running the ftp dir command on the various kinds of archive, and recommendations for related variables. With some unusual archive setups you may have to vary from the recommended variable settings.
    total 65
    -rw-r--r--  1 ukuug    ukuug        2245 Jun 28 20:06 README
    -rw-r--r--  1 ukuug    ukuug       61949 Jun 29 19:13 mirror-2.1.tar.gz
This is the default and you should not normally have to reset any other variables.
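To give a feel for what handling a unix-style listing involves, here is a minimal sketch that picks the name, size and date out of lines like the ones above. The regular expression and the function name parse_unix_listing are illustrative assumptions, not mirror's own parsing code, and real listings contain variations this does not handle.

```python
import re

# type, permissions, link count, owner, group, size, date, name
LS_LINE = re.compile(
    r"^(?P<type>[-dl])(?P<perms>\S{9})\s+\d+\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<size>\d+)\s+(?P<date>\w{3}\s+\d+\s+[\d:]+)\s+(?P<name>.+)$"
)


def parse_unix_listing(text):
    """Return one dict per recognised entry in an ls -l style listing."""
    entries = []
    for line in text.splitlines():
        match = LS_LINE.match(line.strip())
        if not match:
            continue   # skips "total 65" and anything unrecognised
        name, target = match["name"], None
        if match["type"] == "l" and " -> " in name:
            name, target = name.split(" -> ", 1)   # symlink and its target
        entries.append({
            "name": name,
            "type": match["type"],
            "size": int(match["size"]),
            "date": match["date"],
            "link_to": target,
        })
    return entries


listing = """total 65
-rw-r--r--  1 ukuug    ukuug        2245 Jun 28 20:06 README
-rw-r--r--  1 ukuug    ukuug       61949 Jun 29 19:13 mirror-2.1.tar.gz
"""
for entry in parse_unix_listing(listing):
    print(entry["name"], entry["size"])
```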
    00index.txt         189916
    0readme               5793
    1_x/                     =  OS/2 1.x-specific files
This is an ls variant used on some unix archives. It provides descriptions of known items in the listing. Set flags_recursive to -dtR.
    - [R----F--] jrd        1646    May 07 21:43    index
    d [R----F--] jrd         512    Sep 09 10:52    netwire
    d [R----F--] jrd         512    Sep 02 01:31    pktdrvr
    d [RWCE-F--] jrd         512    Sep 04 10:55    incoming
This is used by Novell archives. Set recurse_hard to true and set flags_recursive to be nothing. See also remote_dir.
    00-index.txt        6,471  13:54  7/20/93
    alabama.txt         1,246  23:29  5/08/92
    alaska.txt            873  23:29  5/08/92
    alberta.txt         2,162  23:29  5/08/92
dosftp is for an ftp daemon on dos boxes. Set recurse_hard to true and set flags_recursive to nothing. See also remote_dir.
    -------r--        0      127      127 Aug 27 13:53 !Gopher Links
    drwxrwxr-x           folder        32 Sep  9 16:30 FAQ
    drwxrwx-wx           folder         0 Sep  9 09:59 incoming
macos is for one of the Macintosh ftp daemon variants. Although the output is similar to unix, the unix remote_fs type cannot cope with it because there are three file sizes for each file. Set recurse_hard to true, flags_recursive to nothing, get_size_change to false and compress_patt to nothing (this last setting is due to the unusual file names upsetting the shell used to run compress). See also remote_dir.
    USERS:[ANONYMOUS.PUBLIC]
    1-README.FIRST;13        9  14-JUN-1993 13:09 [ANONYMOUS] (RWE,RWE,RE,RE)
    PALTER.DIR;1             1  18-JAN-1993 11:56 [ANONYMOUS] (RWE,RWE,RE,RE)
    PRESS-RELEASES.DIR;1     1  11-AUG-1992 20:05 [ANONYMOUS] (RWE,RWE,,)
alternatively:
    [VMSSERV.FILES]ALARM.DIR;1      1/3     5-MAR-1993 18:09
    [VMSSERV.FILES]ALARM.TXT;1      1/3     4-FEB-1993 12:20
Set flags_recursive to '[...]' and get_size_change to false. recurse_hard is not available with vms. See also the vms_keep_versions and vms_xfer_text variables.
Here is the mirror.defaults file from the archive on src.doc.ic.ac.uk:
    # This is the default mirror settings used by my site:
    # src.doc.ic.ac.uk (146.169.2.1)
    # This is home of the UKUUG Software Distribution Service
    package=defaults
    # The LOCAL hostname - if not the same as `hostname`
    # (I advertise the name src.doc.ic.ac.uk but the machine is
    # really puffin.doc.ic.ac.uk)
    hostname=src.doc.ic.ac.uk
    # Keep all local_dirs relative to here
    local_dir=/public/
    remote_password=ukuug-soft@doc.ic.ac.uk
    mail_to=
    # Don't mirror file modes. Set all dirs/files to these
    dir_mode=0755
    file_mode=0444
    # By default, files are owned by root.zero
    user=0
    group=0
    #
    # Keep a log file in each updated directory
    # update_log=.mirror
    update_log=
    # Don't overwrite my mirror log with the remote one.
    # Don't retrieve any of their mirror temporary files.
    # Don't touch anything whose name begins with a space!
    # nor any FSP or gopher files...
    exclude_patt=(^|/)(.mirror$|.in..*.$|MIRROR.LOG|#.*#|.FSP|.cache|.zipped|lost+found/| )
    # Try to compress everything
    compress_patt=.
    compress_prog=compress
    # Don't compress information files, files that don't benefit from
    # being compressed, files that tell ftpd, gopher, wais... to do things,
    # the sources for compression programs...
    # (Note this is the only regexp that is case insensitive.)
    compress_excl+|^.notar$|-z|.taz$|.tar.Z|.arc$|.zip$|.lzh$|.zoo$|.exe$|.lha$|.zom$|.gif$|.jpeg$|.jpg$|.mpeg$|.au$|read.*me|index|.message|info|faq|gzip|compress
    # Don't delete own mirror log or any .notar files (incl in subdirs)
    delete_excl=(^|/).(mirror|notar)$
    # Ignore any local readme files
    local_ignore=README.doc.ic
    # Automatically delete local copies of files that the
    # remote site has zapped
    do_deletes=true
Here are some sample package descriptions:
    package=gnu
    comment=Powerful and free Un*x utilities
    site=prep.ai.mit.edu
    remote_dir=/pub/gnu
    # Local_dir+ causes gnu to be appended to the default local_dir
    # so making /public/gnu
    local_dir+gnu
    exclude_patt+|^ListArchives/|^lost+found/|^scheme-7.0/|^.history
    # I tend to only keep the latest couple of versions of things
    # this stops mirror from retrieving the older versions I've removed
    max_days=30
    do_deletes=false

    package=X11R5
    comment=X Windows (windowing graphics system for Un*x)
    site=export.lcs.mit.edu
    remote_dir=/pub/R5
    local_dir+computing/graphics/systems/X11/pub/R5
    # This is a local symlink to the free-for-all contrib area
    # and is mirrored elsewhere
    local_ignore=^contrib$
    # Don't compress a thing. It is already compressed
    # but doesn't look it.
    compress_patt=

    package=cnews
    comment=The C News system
    site=ftp.cs.toronto.edu
    remote_dir=/pub/c-news
    local_dir+computing/usenet/software/transport/c
    compress_excl+|patches/PATCHDATES|WhereFrom

    # THIS IS JUST A TEST
    package=test vms site
    site=vmsbox.somewhere.ac.uk
    local_dir=/tmp/copy4
    remote_dir=vmsserv/files
    remote_fs=vms
    # Must do these settings for VMS
    flags_recursive=[...]
    get_size_change=false

    # and on, and on ...
On adding a new package, first check it out by turning on the -n option.
If you are adding to an existing archive, then it is usually best to force the time-stamps of the existing local files so time comparisons with the remote files will work.
Try to keep all packages that are being retrieved from the same site together. That way mirror will only have to log in once.
Remember that all regexp's are Perl regular expressions.
If the remote site contains symlinks that you want to "flatten out" into the corresponding files, then do this by adding the L flag to the flags passed to the remote ls.
First test this by trying an ls -lRatL on the remote site under the ftp command to check whether the remote file-store has any symlink loops.
If you are mirroring a very large site that changes infrequently, add max_days=7 to the settings after it is initially mirrored. That way mirror will only have to consider recent files when updating. Then once a week, or whenever necessary, call mirror with -kmax_days=0 to force a full update.
If you don't want to compress anything from the remote site, the easiest way to do this is to set compress_patt to nothing.
If you want to run a command at the end of mirroring a package, a useful trick is to reset the mail_prog variable to be the program name and mail_to to be the arguments.
For netware, dosftp, macos and VMS you should normally set remote_dir to be the home directory of the remote ftp daemon. Connect manually and, before changing directory, use the pwd command to find where home is. If you are only mirroring part of the tree, then give the full path name including this home directory at the start.
macos names can sometimes contain characters that make it hard to pass them through un*x shells. Since compressing files is done via a shell it would be best to turn off compression with compress_patt=
macos files always seem to change size when transferred, in either binary or text mode. So it would be best to set get_size_change=false
Here are, what I hope are, some good general rules:
Only mirror a site well outside the working hours of both the local and remote sites.
It is probably unfriendly to try to mirror a remote site more than once a day.
Before trying to mirror a remote site, try and find the packages you want from local archives, as no one will be pleased if you soak up a lot of network bandwidth needlessly.
If you have a local archive, then tell people about it so they don't have to waste bandwidth and CPU at the remote site.
Do remember to check your config-files from time to time in case the remote archive has changed its access restrictions.
Check the remote site regularly for any new restrictions.
Some of the netiquette guidelines should be enforced.
Should be able to cope with links as well as symlinks.
Suffers from creeping featurism.
Objects in a mirror are closer than you think!
Special thanks to the following people for patches, comments and other suggestions that have helped to improve mirror. If I have omitted anyone, please contact me.
James Revell <revell@uunet.uu.net>
Chris Myers <chris@wugate.wustl.edu>
Amos Shapira <amoss@cs.huji.ac.il>
Paul A Vixie <vixie@pa.dec.com>
Jonathan Kamens <jik@pit-manager.mit.edu>
Christian Andretzky <casys@otto.mb3.tu-chemnitz.de>
Kean Stump <kean@ucs.orst.edu>
Anita Eijs <anita@hermes.bouw.tno.nl>
Simon E Sperro <S.E.Sperro@gdr.bath.ac.uk>
Aaron Wohl <aw0g+@andrew.cmu.edu>
Michael Meissner <meissner@osf.org>
Michael Graff <explorer@iastate.edu>
Bradley Rhoades <us267388@mail.mmmg.com>
Edwards Reed <eer@cinops.xerox.com>
Joachim Schrod <schrod@iti.informatik.th-darmstadt.de>
David Woodgate <David.Woodgate@mel.dit.csiro.au>
Pieter Immelman <pi@itu1.sun.ac.za>
Jost Krieger <x920031@bus072.rz.ruhr-uni-bochum.de>