MIRROR

Section: Misc. Reference Manual Pages (1L)
Updated: Mirror 2.4
 

NAME

mirror - mirror packages on remote sites  

SYNOPSIS

mirror [flags] -gsite:pathname
mirror [flags] [config-files]  

DESCRIPTION

Mirror is a package written in Perl that uses the ftp protocol to duplicate a directory hierarchy between the machine it is run on and a remote host. It avoids copying files unnecessarily by comparing the file time-stamps and sizes before transferring. Amongst other things, it can optionally compress, gzip, and split files.

It was written for use by archive maintainers but can be used by anyone wanting to transfer a lot of files via ftp.

Regardless of how it is called, mirror always performs the same basic steps. It connects to the remote site, internally builds a directory listing of the local target directory, builds one for the remote directory, compares them, creates any subdirectories required, transfers the appropriate files (setting their time-stamps to match those on the remote site), creates any symbolic links, removes any unnecessary objects and finally drops the connection.
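The comparison step above can be illustrated with a minimal Python sketch. It is not mirror's own code (mirror is written in Perl); the Entry structure and the plan function are hypothetical names, and the rule shown is only the time-stamp and size check described in this section.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One file in a directory listing (hypothetical structure)."""
    size: int
    mtime: int  # seconds since the epoch


def plan(local: Dict[str, Entry],
         remote: Dict[str, Entry]) -> Tuple[List[str], List[str]]:
    """Return (paths to transfer, local paths with no remote counterpart)."""
    transfer = [
        path for path, r in sorted(remote.items())
        if path not in local
        or r.mtime > local[path].mtime   # remote copy is newer
        or r.size != local[path].size    # size differs
    ]
    stale = sorted(path for path in local if path not in remote)
    return transfer, stale


local = {"README": Entry(2245, 100), "old.txt": Entry(10, 50)}
remote = {"README": Entry(2245, 100), "mirror-2.1.tar.gz": Entry(61949, 200)}
print(plan(local, remote))
```

Here README is left alone because its size and time-stamp match, the tar file is scheduled for transfer, and old.txt is reported as a candidate for removal.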

Mirror can handle symbolic links but not hard links. It does not duplicate owner or group information. If you require any of these features, use rdist(1) instead.

Mirror is called in one of two ways shown in the synopsis above.

The first method retrieves a remote directory into the current directory. If you are mirroring a directory, it is best either to end the pathname in a slash ('/'), so that the remote recursive listing is smaller, or to use the -r flag to suppress recursion (see -g below). The mirror.defaults file is not used.

In the second method given in the synopsis above, a minimal number of arguments are required and mirror is controlled by the settings read from the configuration files (or standard input). If a file named mirror.defaults can be found in either the directory the mirror executable is in or in the PERLLIB path, then it is loaded first. This is used to provide common defaults for all config-files.

Mirror was written to mirror remote Un*x archives, but has grown (like topsy).  

OPTIONS

-d
Enable debugging. If this argument is given more than once, the debugging level will increase. Currently the maximum useful level is four.
-ppackage
Only mirror the given package. This option may be given multiple times in which case all the given packages will be mirrored. Without this option, all packages will be mirrored. Package is a regexp matched against the package variable.
-Rpackage
Similar to -p but skips all packages until it reaches the given package. Useful for restarting failed mirror runs from where they left off.
-n
Do nothing except compare the local and remote directories; no file transfers are done. Sets the debug level to two, so you are shown a trace of what would be done.
-F
Use temporary dbm files for the information about files. This is useful if you mirror a very large directory. See the variable use_files.
-gsite:path
Get all files on given site. If path matches   .*/.+   then it is the name of the directory and the last part is the pattern of filenames to get. If path matches   .*/   then it is the name of a directory and all its contents are retrieved. Otherwise path is the pattern to be used in '/'. If you use host:/fred, a full directory listing of / on the remote host will be done. If all you wanted was the contents of the directory /fred, specify host:/fred/.
-r
Equivalent to -krecursive=false
-v
Print the version details of mirror and exit.
-T
Force the time stamps of any local files to be reset to match those of the remote files. Normally only used when initialising a mirror area from existing file contents.
-U[filename]
Record all uploads into the given filename. Remember that mirror changes into local_dir to do its work, so it should be a full pathname. If no parameter is given, it defaults to `pwd`/upload_log.day.month.year.
-kkey=value
Override any default key/value.
-m
Equivalent to -kmode_copy=true
-t
Equivalent to -ktext_mode=true
-f
Equivalent to -kforce=true
-ssite
Equivalent to -ksite=site
-uuser
Equivalent to -kremote_user=user. You are then prompted for a password, with echo turned off. The password is used to set remote_password.
-L
Just generate a pretty printed version of the input and exit.
-G
Get files from the remote machine. The local and remote directories have to be given on the command line. (This option is no longer supported.)
-P
Put files onto the remote machine. The local and remote directories have to be given on the command line. (This option is no longer supported.)
-Cfile
Specify config-files. Needed to give config-files with -P and -G options. (This option is no longer supported.)
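
The path rules given for -gsite:path above can be sketched in Python as follows. This is an illustration only, not mirror's Perl code: split_g_path is a hypothetical name, the trailing-slash case is tested first, and [^/]+ is used for the last component so that host:/fred/ is treated as a directory, as the text describes. Returning "." as the match-everything pattern is an assumption borrowed from the get_patt default.

```python
import re
from typing import Tuple


def split_g_path(path: str) -> Tuple[str, str]:
    """Split the path part of -gsite:path into (directory, filename pattern)."""
    if path.endswith("/"):
        # .*/  : a directory whose whole contents are retrieved
        return path, "."
    m = re.fullmatch(r"(.*/)([^/]+)", path)
    if m:
        # .*/.+ : a directory followed by a pattern of filenames
        return m.group(1), m.group(2)
    # otherwise the whole path is a pattern applied in '/'
    return "/", path


for example in ("/fred", "/fred/", "/pub/gnu/emacs.*", "fred"):
    print(example, split_g_path(example))
```

Note that /fred yields the directory / with the pattern fred, which is why a full listing of / is done on the remote host in that case.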

 

CONFIGURATION FILE

Configuration files are parsed as a series of statements. Blank lines and lines beginning with a hash are ignored. Each statement is of the form
keyword=value or
keyword+value

You can add whitespace before the keyword and the equals/plus. Everything immediately following the equals/plus is the value, including any leading or trailing whitespace. The equals version sets the keyword to this value, while the plus version concatenates the value onto the end of the default.

A statement can be continued over multiple lines by ending every line except the last with an ampersand ('&'). The line following the ampersand is appended to the current line with all leading whitespace removed.
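
The statement syntax described above can be sketched as a small Python reader. It is a rough illustration under stated assumptions, not mirror's parser: the statements function is a hypothetical name, keywords are assumed to consist of letters, digits and underscores, and indented comment lines are skipped because the sample files in this manual indent their comments.

```python
import re
from typing import Iterable, Iterator, Tuple

# keyword, then '=' or '+', then everything else as the value
STATEMENT = re.compile(r"\s*(\w+)\s*([=+])(.*)", re.DOTALL)


def statements(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (keyword, operator, value) for each statement."""
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if pending:
            # continuation: append with leading whitespace removed
            line = pending + line.lstrip()
            pending = ""
        if line.endswith("&"):
            pending = line[:-1]
            continue
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment
        m = STATEMENT.fullmatch(line)
        if m:
            yield m.group(1), m.group(2), m.group(3)


sample = [
    "# a comment\n",
    "package=gnu\n",
    "\n",
    "    local_dir+gnu\n",
    "    exclude_patt+|^ListArchives/&\n",
    "        |^lost+found/\n",
]
for statement in statements(sample):
    print(statement)
```

The value is returned untrimmed, since everything after the equals or plus sign, including whitespace, is part of the value.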

Here is a list of the keywords and their values with defaults given inside square brackets. Those options flagged with a star are not yet implemented.

Although there are a lot of keywords that can be set, the built-in defaults will handle most cases. Normally only package, site, remote_dir and local_dir need to be set.

package
Should be a unique name for the package to be mirrored. ['']
comment
Used in reports. ['']
skip
Setting this entry causes this package to be skipped. The value is reported as the reason for skipping. (It is easier than commenting the entry out.) ['']
site
Site-name or IP address of the remote site. ['']
remote_dir
Remote directory to mirror. See also recurse_hard. ['']
local_dir
Local directory. ['']
remote_user
Username to use at remote site. [anonymous]
remote_password
Password to use at remote site. [user@localhostname]
get_patt
Regexp of remote pathnames to retrieve. [.]
exclude_patt
Regexp of remote pathnames to ignore. ['']
update_local
Set get_patt to be local_dir/*. This is useful if you only want to mirror selected subdirectories of a remote archive. [false]
local_ignore
Regexp of local pathnames to ignore. Useful to skip restricted local directories. ['']
do_deletes
Delete destination files if not in source tree. [false]
delete_patt
Regexp of local pathnames to check for deletions. Names that are not matched are not checked. The match by delete_excl is done to all files selected by this pattern. [.]
delete_get_patt
Set delete_patt to be get_patt. [false]
delete_excl
Regexp of local pathnames to never delete. ['']
save_deletes
Save local files into save_dir rather than deleting. [false]
save_dir
Where local files no longer on remote site are transferred to. [Old]
max_delete_files
If there are more than this many files to delete, do not delete, just warn. If the value ends with a percent character, then it is the percentage of files above which deletion is disabled. [10%]
max_delete_dirs
If there are more than this many directories to delete, do not delete, just warn. If the value ends with a percent character, then it is the percentage of directories above which deletion is disabled. [10%]
max_days
If >0, ignore files older than this many days. Any ignored files will not be transferred or deleted. [0]
split_max
If >0 and the size of the file is greater than this, the file is split up to be stored locally (filename must also match split_patt). [0]
split_patt
Regexp of remote pathnames to split up before storing locally. ['']
split_chunk
Size of chunks to split files into. [102400]
ls_lR_file
Remote file containing ls-lR, otherwise run remote ls. ['']
local_ls_lR_file
Local file containing ls-lR, otherwise use remote ls_lR_file. This is useful when first mirroring a large package. ['']
recursive
Do subdirectories as well. [true]
recurse_hard
Have to generate remote ls by doing cwd and ls for each subdirectory. In this case remote_dir must be absolute (begin with a /) not relative. Use the pwd command in ftp to find the path for the start of the remote archive area. (Not available if remote_fs is vms.) [false]
flags_recursive
Flags to send to remote ls to do a recursive listing. ['-lRat']
flags_nonrecursive
Flags to send to remote ls to do a non-recursive listing. ['-lat']
remote_fs
Remote file store type. Handles unix, dls, netware, vms, dosftp, macos, lsparse and infomac. See the FILESTORES section below for more details. [unix]
vms_keep_versions
When mirroring vms files, keep the version numbers. If false, the versions are stripped off and only the base filenames are kept. [true]
vms_xfer_text
Pattern of vms files to transfer in TEXT mode (case insensitive). ['readme$|info$|listing$|.c$']
name_mappings
Remote to local pathname mappings (a perl s command, eg. s:old:new:). ['']
external_mapping
External routine to perform name mappings. ['']
get_newer
Get the remote file if its date is newer than local. [true]
get_size_change
Get the file if the size is different from local. If a file is compressed when fetched, the size is automatically ignored. [true]
compress_patt
Regexp of files to compress before storing locally. See get_size_change. ['']
compress_excl
Regexp of files not to compress (case insensitive). [\.(z|gz)$]
compress_prog
Program to compress files. If set to the word compress or gzip, the full pathname and correct compress_suffix will automatically be set. When using gzip, level -9 is used. Note that compress_suffix can be reset to a non-standard value by setting it after compress_prog. [compress]
compress_suffix
Character(s) the compress program appends to files. If compress_prog is compress, this defaults to .Z. If compress_prog is gzip, this defaults to .gz. ['']
compress_conv_patt
If compress_prog is gzip, files matching this pattern are uncompressed and gzip'ed before storing locally. Compression conversion is only meant to do compress to gzip conversion. [(\.Z|\.taz)$]
compress_conv_expr
Perl expression to convert suffix from compress to gzip style. [s/\.Z$/\.gz/;s/\.taz$/\.tgz/]
compress_size_floor
Only compress files smaller than this size. [0]
force_times
Force local times to match remote times. [yes]
retry_call
If initial connect fails, retry ONCE after ONE minute. This is to handle sites which reverse lookup the incoming host but sometimes timeout on the first attempt. [yes]
update_log
Filename, relative to local_dir, where an update report is to be kept. ['']
mail_to
Mail a log of the work done to this comma separated list of people. ['']
user
User name or uid to give to local pathnames. ['']
group
Group name or gid to give to local pathnames. ['']
file_mode
Mode to give files created locally. [0444]
dir_mode
Mode to give directories created locally. [0755]
timeout
Timeout ftp requests after this many seconds. [40]
ftp_port
Port number of remote ftp daemon. [21]
proxy
Set to 1 to use proxy ftp service. [0]
proxy_ftp_port
Port number of proxy-service ftp daemon. [4514]
proxy_gateway
Name of proxy-service, may also be supplied by the environment variable INTERNET_HOST. [internet-gateway]
mode_copy
Flag indicating if we need to copy the mode bits. [false]
interactive
A non-batch transfer. Implied by -g flag. [false]
text_mode
If true, files are transferred in text mode. Un*x prefers binary so that is the default. [false]
force
If true, all files will be transferred regardless of size or time-stamp. [false]
get_file
Perform get, not put by default. [true]
verbose
Verbose messages. [false]
delete_source
Delete the source files and directories once transferred. (This is no longer supported.) [false]
disconnect
Disconnect from remote site at end of package. [false]
mail_prog
Program called to send to the mail_to list. May be passed the argument mail_subject. Defaults to mailx, Mail, mail or whatever is available on your system. ['']
mail_subject
Argument passed to mail_prog. ['-s 'mirror update'']
hostname
Mirror automatically skips packages whose site variable matches this host. Defaults to the local hostname. ['']
use_files
Put the associative arrays that mirror uses into tmp files. [false]
umask
Do not allow setuid things across by default. [07000]
use_timelocal
Time-stamp files to local time zone. If false, the time zone is set to offset 0 (compatible with older versions of mirror). [true]
remote_group
If present set the remote 'site group'. ['']
remote_gpass
If present set the remote 'site gpass'. ['']
remote_idle
If not null, try and set the remote idle timer to this. ['']
make_bad_symlinks
If true, symlinks will be made to invalid (non-existent) pathnames. Under older versions this defaulted to true. [false]
follow_local_symlinks
Regexp of pathnames that should be followed to the file or directory they point at. This makes local symlinks invisible to mirror. ['']
get_missing
Really get files. When set to false, only deletions and symlinking will be done. Used to delete expired files older than max_days without retrieving older files. [true]
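
As an illustration of how a limit such as max_delete_files or max_delete_dirs might be evaluated, here is a minimal Python sketch. The function name deletes_allowed is hypothetical, and the sketch assumes that a percentage is taken relative to the total number of local files (or directories); the manual does not say what the percentage is measured against.

```python
def deletes_allowed(limit: str, to_delete: int, total: int) -> bool:
    """Return True if the pending deletions are within the configured limit.

    limit     -- value of max_delete_files or max_delete_dirs, e.g. "10%" or "25"
    to_delete -- number of objects that would be deleted
    total     -- number of local objects considered (assumed base for percentages)
    """
    if limit.endswith("%"):
        allowed = total * float(limit[:-1]) / 100.0
    else:
        allowed = float(limit)
    return to_delete <= allowed


print(deletes_allowed("10%", 5, 100))   # within the limit
print(deletes_allowed("10%", 11, 100))  # over the limit: warn, do not delete
```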

Each group of keywords defines how to mirror a particular package and should begin with a unique package line. The package name is used in report generation and by the -p argument, so pick something mnemonic. The minimum needed for each package is the package, site, remote_dir and local_dir. On finding a package line, all the default values are reset.

If the package name is defaults, then no site is contacted, but the default values given for any keywords are changed. Normally all the defaults are in the file mirror.defaults which will be automatically loaded before any package details.

# Sample mirror.defaults
package=defaults
        # The LOCAL hostname - if not the same as `hostname` returns
        # (I advertise the name src.doc.ic.ac.uk but the machine is
        #  really puffin.doc.ic.ac.uk.)
        hostname=src.doc.ic.ac.uk
        # Keep all local_dirs relative to here
        local_dir=/public/
        remote_password=ukuug-soft@doc.ic.ac.uk
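
The interaction between the defaults package, the reset performed on each package line, and the plus form of a statement can be sketched in Python. This is a simplified model, not mirror's implementation: build_packages is a hypothetical name, statements are assumed to arrive as (keyword, operator, value) tuples, and mirror's built-in defaults are not modelled.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]  # (keyword, operator, value)


def build_packages(stmts: List[Statement]) -> Dict[str, Dict[str, str]]:
    """Group statements into packages, applying defaults and '+' values."""
    defaults: Dict[str, str] = {}
    packages: Dict[str, Dict[str, str]] = {}
    current = defaults
    for key, op, value in stmts:
        if key == "package":
            if value == "defaults":
                current = defaults           # change the defaults themselves
            else:
                current = dict(defaults)     # every package starts from the defaults
                current["package"] = value
                packages[value] = current
            continue
        if op == "+":
            # the plus form appends to the default value
            current[key] = defaults.get(key, "") + value
        else:
            current[key] = value
    return packages


stmts = [
    ("package", "=", "defaults"),
    ("local_dir", "=", "/public/"),
    ("package", "=", "gnu"),
    ("site", "=", "prep.ai.mit.edu"),
    ("local_dir", "+", "gnu"),
]
print(build_packages(stmts)["gnu"]["local_dir"])
```

With local_dir=/public/ in the defaults, a package that says local_dir+gnu ends up with /public/gnu.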

If the package is not defaults, then mirror will perform the following steps. Unless an internal failure is detected, any error will cause the current package to be skipped and the next one tried.

If mirror is not already connected to the site, it will disconnect from any site it is already connected to and attempt to connect to the remote site's ftp daemon. It will then log in using the given remote username and password. Once connected, mirror turns on binary mode transfers. Next it changes to the given local directory and scans it to get the details of the local files that already exist. If necessary, the local directory will be created. Once this is completed, the remote directory is scanned in a similar fashion. Mirror does this by changing to the remote directory and running the ftp LIST command, passing the flags_recursive or flags_nonrecursive options depending on the value of recursive. Alternatively a file containing the directory listing may be retrieved. Each remote pathname will have any specified mappings performed on it to create a local pathname. Then any checks specified by the exclude_patt, max_days, get_newer and get_size_change keywords are applied to the names of files or symlinks. Only exclude_patt checking is applied to directories.

The above creates a list of all required remote files and the local pathnames to store them in.
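
The mapping and filtering of remote pathnames can be sketched in Python as below. The names map_name and wanted are hypothetical; a Perl s:old:new: mapping is approximated by a single re.sub substitution, and only the get_patt, exclude_patt and max_days checks are shown (the get_newer and get_size_change checks need the local listing as well).

```python
import re


def map_name(path: str, old: str, new: str) -> str:
    """Approximate a Perl s:old:new: mapping (first match only)."""
    return re.sub(old, new, path, count=1)


def wanted(path: str, age_days: float, get_patt: str = ".",
           exclude_patt: str = "", max_days: int = 0) -> bool:
    """Apply the get_patt, exclude_patt and max_days checks to one pathname."""
    if not re.search(get_patt, path):
        return False
    if exclude_patt and re.search(exclude_patt, path):
        return False
    if max_days > 0 and age_days > max_days:
        return False  # too old: neither transferred nor deleted
    return True


print(map_name("pub/gnu/README", r"^pub/", ""))
print(wanted("lost+found/x", 3, exclude_patt=r"(^|/)lost\+found/"))
```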

Once the directory listing is completed, all required files are fetched from the remote site into their local path names. This is done by retrieving the file into a temporary file in the target directory. If required, the temporary file is compressed, gzip'ed or split (or compressed or gzip'ed and then split). The temporary file is renamed when the transfer is successful.  
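
The retrieve-then-rename approach described above is a common way to avoid leaving half-transferred files under their final names. A minimal Python sketch of the idea follows; the function name store and the ".in." prefix for the temporary file are assumptions made for illustration, and the ftp transfer is replaced by an iterable of byte chunks.

```python
import os
import tempfile
from typing import Iterable


def store(chunks: Iterable[bytes], final_path: str) -> None:
    """Write chunks to a temporary file in the target directory, then rename."""
    target_dir = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=".in.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        # rename only after the whole file has arrived
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave a partial temporary file behind
        raise
```

Because the temporary file lives in the target directory, the final rename stays on one filesystem and a reader never sees a partly written file under the real name.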

FILESTORES

Mirror uses the remote directory listing to work out what files are available. Mirror was originally targeted at connecting to unix ftp daemons using a standard ls command. To use a unix host with a non-standard ls, or a non-unix host, it is necessary to set the remote_fs variable to match the kind of directory listing that will be returned. There is some interaction between remote_fs and other variables, in particular flags_recursive, recurse_hard and get_size_change. The following subsections show examples of the results of running the ftp dir command on the various kinds of archive, together with recommendations for related variables. With some unusually set up archives you may have to vary from the recommended variable settings.  

remote_fs=unix

total 65
-rw-r--r-- 1 ukuug  ukuug   2245 Jun 28 20:06 README
-rw-r--r-- 1 ukuug  ukuug  61949 Jun 29 19:13 mirror-2.1.tar.gz

This is the default and you should not normally have to reset any other variables.  
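
To show what extracting details from such a listing involves, here is a small Python sketch that parses one line of the unix format above. It is not mirror's parser: parse_ls_line and LsEntry are hypothetical names, and the regular expression only covers plain lines like the two in the example (symlink targets and year-style dates are not handled).

```python
import re
from typing import NamedTuple, Optional


class LsEntry(NamedTuple):
    mode: str
    size: int
    date: str
    name: str


LS_LINE = re.compile(
    r"(?P<mode>[-dl][-rwxsStT]{9})\s+\d+\s+\S+\s+\S+\s+"
    r"(?P<size>\d+)\s+(?P<date>\w{3}\s+\d+\s+[\d:]+)\s+(?P<name>.+)"
)


def parse_ls_line(line: str) -> Optional[LsEntry]:
    """Return the parsed entry, or None for lines such as 'total 65'."""
    m = LS_LINE.fullmatch(line.rstrip())
    if m is None:
        return None
    return LsEntry(m["mode"], int(m["size"]), m["date"], m["name"])


print(parse_ls_line("-rw-r--r-- 1 ukuug  ukuug   2245 Jun 28 20:06 README"))
print(parse_ls_line("total 65"))
```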

remote_fs=dls

00index.txt      189916  
0readme            5793  
1_x/                  =  OS/2 1.x-specific files

This is an ls variant used on some unix archives. It provides descriptions of known items in the listing. Set flags_recursive to -dtR.  

remote_fs=netware

- [R----F--] jrd                  1646       May 07 21:43    index
d [R----F--] jrd                   512       Sep 09 10:52    netwire
d [R----F--] jrd                   512       Sep 02 01:31    pktdrvr
d [RWCE-F--] jrd                   512       Sep 04 10:55    incoming

This is used by Novell archives. Set recurse_hard to true and set flags_recursive to be nothing. See also remote_dir.  

remote_fs=dosftp

00-index.txt  6,471 13:54  7/20/93   alabama.txt   1,246 23:29  5/08/92
alaska.txt      873 23:29  5/08/92   alberta.txt   2,162 23:29  5/08/92

dosftp is for an ftp daemon on dos boxes. Set recurse_hard to true and set flags_recursive to nothing. See also remote_dir.  

remote_fs=macos

-------r--      0      127   127 Aug 27 13:53 !Gopher Links
drwxrwxr-x          folder    32 Sep  9 16:30 FAQ
drwxrwx-wx          folder     0 Sep  9 09:59 incoming

macos is for one of the Macintosh ftp daemon variants. Although the output is similar to unix, the unix remote_fs type cannot cope with it because there are three file sizes for each file. Set recurse_hard to true, flags_recursive to nothing, get_size_change to false and compress_patt to nothing (this last setting is due to the unusual file names upsetting the shell used to run compress). See also remote_dir.  

remote_fs=vms

USERS:[ANONYMOUS.PUBLIC]

1-README.FIRST;13     9  14-JUN-1993 13:09 [ANONYMOUS] (RWE,RWE,RE,RE)
PALTER.DIR;1          1  18-JAN-1993 11:56 [ANONYMOUS] (RWE,RWE,RE,RE)
PRESS-RELEASES.DIR;1
                      1  11-AUG-1992 20:05 [ANONYMOUS] (RWE,RWE,,)


alternatively:

[VMSSERV.FILES]ALARM.DIR;1      1/3          5-MAR-1993 18:09
[VMSSERV.FILES]ALARM.TXT;1      1/3          4-FEB-1993 12:20

Set flags_recursive to '[...]' and get_size_change to false. recurse_hard is not available with vms. See also the vms_keep_versions and vms_xfer_text variables.
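
The effect of vms_keep_versions on a filename can be illustrated with a short Python sketch. The function name vms_local_name is hypothetical, and the sketch assumes the version is always a trailing semicolon followed by digits, as in the listings above.

```python
import re


def vms_local_name(name: str, keep_versions: bool = True) -> str:
    """Return the local filename for a VMS file, optionally dropping ';N'."""
    if keep_versions:
        return name
    return re.sub(r";\d+$", "", name)  # strip the trailing version number


print(vms_local_name("1-README.FIRST;13", keep_versions=False))
```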

 

remote_fs=infomac

This is a special type meant only to handle the sumex-aim.stanford.edu info-mac directory listing stored on that archive in help/all-files. recurse_hard should be set to true.  

remote_fs=lsparse

Allow reparsing of the listing generated by mirror with debugging turned to a high level. Meant only for mirror wizards.  

EXAMPLES

Here is the mirror.defaults file from the archive on src.doc.ic.ac.uk:

# This is the default mirror settings used by my site:
# src.doc.ic.ac.uk (146.169.2.1)
# This is home of the UKUUG Software Distribution Service

package=defaults
        # The LOCAL hostname - if not the same as `hostname`
        # (I advertise the name src.doc.ic.ac.uk but the machine is
        #  really puffin.doc.ic.ac.uk)
        hostname=src.doc.ic.ac.uk
        # Keep all local_dirs relative to here
        local_dir=/public/
        remote_password=ukuug-soft@doc.ic.ac.uk
        mail_to=
        # Don't mirror file modes.  Set all dirs/files to these
        dir_mode=0755
        file_mode=0444
        # By default, files are owned by root.zero
        user=0
        group=0
#       # Keep a log file in each updated directory
#       update_log=.mirror
        update_log=
        # Don't overwrite my mirror log with the remote one.
        # Don't retrieve any of their mirror temporary files.
        # Don't touch anything whose name begins with a space!
        # nor any FSP or gopher files...
        exclude_patt=(^|/)(.mirror$|.in..*.$|MIRROR.LOG|#.*#|.FSP|.cache|.zipped|lost+found/| )
        # Try to compress everything
        compress_patt=.
        compress_prog=compress
        # Don't compress information files, files that don't benefit from
        # being compressed, files that tell ftpd, gopher, wais... to do things,
        # the sources for compression programs...
        # (Note this is the only regexp that is case insensitive.)
        compress_excl+|^.notar$|-z|.taz$|.tar.Z|.arc$|.zip$|.lzh$|.zoo$|.exe$|.lha$|.zom$|.gif$|.jpeg$|.jpg$|.mpeg$|.au$|read.*me|index|.message|info|faq|gzip|compress
        # Don't delete own mirror log or any .notar files (incl in subdirs)
        delete_excl=(^|/).(mirror|notar)$
        # Ignore any local readme files
        local_ignore=README.doc.ic
        # Automatically delete local copies of files that the
        # remote site has zapped
        do_deletes=true

Here are some sample package descriptions:

package=gnu
        comment=Powerful and free Un*x utilities
        site=prep.ai.mit.edu
        remote_dir=/pub/gnu
        # Local_dir+ causes gnu to be appended to the default local_dir
        # so making /public/gnu
        local_dir+gnu
        exclude_patt+|^ListArchives/|^lost+found/|^scheme-7.0/|^.history
        # I tend to only keep the latest couple of versions of things
        # this stops mirror from retrieving the older versions I've removed
        max_days=30
        do_deletes=false

package=X11R5
        comment=X Windows (windowing graphics system for Un*x)
        site=export.lcs.mit.edu
        remote_dir=/pub/R5
        local_dir+computing/graphics/systems/X11/pub/R5
        # This is a local symlink to the free-for-all contrib area
        # and is mirrored elsewhere
        local_ignore=^contrib$
        # Don't compress a thing.  It is already compressed 
        # but doesn't look it.
        compress_patt=

package=cnews
        comment=The C News system
        site=ftp.cs.toronto.edu
        remote_dir=/pub/c-news
        local_dir+computing/usenet/software/transport/c
        compress_excl+|patches/PATCHDATES|WhereFrom

# THIS IS JUST A TEST
package=test vms site
        site=vmsbox.somewhere.ac.uk
        local_dir=/tmp/copy4
        remote_dir=vmsserv/files
        remote_fs=vms
        # Must do these settings for VMS
        flags_recursive=[...]
        get_size_change=false

# and on, and on ...

 

HINTS

On adding a new package, first check it out by turning on the -n option.

If you are adding to an existing archive, then it is usually best to force the time-stamps of the existing local files so time comparisons with the remote files will work.

Try to keep all packages that are being retrieved from the same site together. That way mirror will only have to log in once.

Remember that all regexps are Perl regular expressions.

If the remote site contains symlinks that you want to "flatten out" into the corresponding files, then do this by changing the flags passed to the remote ls:

flags_recursive+L or
flags_nonrecursive+L

First test this by trying an ls -lRatL on the remote site under the ftp command to check whether the remote file-store has any symlink loops.

If you are mirroring a very large site that changes infrequently, add max_days=7 to the settings after it is initially mirrored. That way mirror will only have to consider recent files when updating. Then once a week, or whenever necessary, call mirror with -kmax_days=0 to force a full update.

If you don't want to compress anything from the remote site the easiest way to do this is to set the compress_patt to nothing.

If you want to run a command at the end of mirroring a package a useful trick is to reset the mail_prog variable to be the program name and mail_to to be the arguments.

For netware, dosftp, macos and vms you should normally set remote_dir to be the home directory of the remote ftp daemon. Connect manually and, before changing directory, use the pwd command to find where home is. If you are only mirroring part of the tree, then give the full path name including this home directory at the start.

macos names can sometimes contain characters that make it hard to pass them through un*x shells. Since compressing files is done via a shell it would be best to turn off compression with compress_patt=

macos files always seem to change size when transferred, in either binary or text mode, so it would be best to set get_size_change=false.  

NETIQUETTE

If you are going to mirror a remote site, please obey any restrictions that the site administrators place on access. You can generally find the restrictions on connecting to the archive using the standard ftp command. Any restrictions are normally given as a login banner or in a (hopefully) obvious file.

Here are what I hope are some good general rules:

Only mirror a site well outside the working hours of both the local and remote sites.

It is probably unfriendly to try to mirror a remote site more than once a day.

Before trying to mirror a remote site, try and find the packages you want from local archives, as no one will be pleased if you soak up a lot of network bandwidth needlessly.

If you have a local archive, then tell people about it so they don't have to waste bandwidth and CPU at the remote site.

Do remember to check your config-files from time to time in case the remote archive has changed their access restrictions.

Check the remote site regularly for any new restrictions.  

SEE ALSO

perl(l), ftp(1), mm(1)  

BUGS

Some of the netiquette guidelines should be enforced.

Should be able to cope with links as well as symlinks.

Suffers from creeping featurism.  

REMEMBER

Objects in a mirror are closer than you think!  

AUTHOR

Written by Lee McLoughlin <lmjm@doc.ic.ac.uk>. It uses an extended version of the ftp.pl package originally by: Alan R. Martello <al@ee.pitt.edu> which uses the chat2.pl package by: Randal L. Schwartz <merlyn@ora.com>

Special thanks to the following people for patches, comments and other suggestions that have helped to improve mirror. If I have omitted anyone, please contact me.

James Revell <revell@uunet.uu.net>
Chris Myers <chris@wugate.wustl.edu>
Amos Shapira <amoss@cs.huji.ac.il>
Paul A Vixie <vixie@pa.dec.com>
Jonathan Kamens <jik@pit-manager.mit.edu>
Christian Andretzky <casys@otto.mb3.tu-chemnitz.de>
Kean Stump <kean@ucs.orst.edu>
Anita Eijs <anita@hermes.bouw.tno.nl>
Simon E Sperro <S.E.Sperro@gdr.bath.ac.uk>
Aaron Wohl <aw0g+@andrew.cmu.edu>
Michael Meissner <meissner@osf.org>
Michael Graff <explorer@iastate.edu>
Bradley Rhoades <us267388@mail.mmmg.com>
Edwards Reed <eer@cinops.xerox.com>
Joachim Schrod <schrod@iti.informatik.th-darmstadt.de>
David Woodgate <David.Woodgate@mel.dit.csiro.au>
Pieter Immelman <pi@itu1.sun.ac.za>
Jost Krieger <x920031@bus072.rz.ruhr-uni-bochum.de>


 


This document was created by man2html, using the manual pages.
Time: 20:10:39 GMT, September 18, 2024