Tech@TheAtomicMoose.ca: How To Copy Files Across A Network

How To Copy Files Across A Network

There are several ways to copy files across a network using Unix machines. This document aims to outline how to use some of the more common methods, and how to use them effectively.

The Methods

This document covers the following methods of file copying: scp, rsync and tar.

I assume the use of ssh(1) in all situations. Anything else just isn't of much use these days. Hopefully, all these methods will be more secure than current practices involving ftp(1) or rcp(1).

scp

scp is probably the easiest of the three methods. It was designed as a replacement for rcp(1), which was itself a one-night hack to make a networked version of cp(1), so it uses a fairly easy, familiar syntax:

scp [-Cr] /some/file [ more ... ] host.name:/destination/file

Before it does any copying, scp will use the normal ssh authentication to connect. Normally, this consists of asking you for your password or passphrase. If you are having trouble getting scp working, try connecting with ssh -v first. If that works, scp should, too.

With the -r flag, a source may be a directory, which scp then copies recursively. This means you can copy large trees to another computer easily.

scp will encrypt your file as it gets copied across, but by specifying the -C option, you can ask scp to compress your data automatically as it travels. This is especially good with things like large text files (including XML and HTML), as they compress very well. Doing this can save a significant amount of time on a large copy.
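
A minimal sketch of the -r and -C options in use. The helper name copy_to_host, the host backup.example.com and the paths are placeholders invented for illustration. The SCP variable is also an assumption of this sketch, not an scp feature; it is only there so the command can be swapped for a stub such as echo when trying things out.

```shell
# SCP can be overridden (for example SCP=echo) to preview the command
# instead of running it. This is a convention of this sketch only.
SCP=${SCP:-scp}

# Hypothetical helper: copy files or whole directories to a remote host.
# The last argument is the destination, e.g. host.name:/destination
copy_to_host() {
    # -r: recurse into directories, -C: compress data as it travels
    "$SCP" -Cr "$@"
}

# Example call (placeholder host and paths):
#   copy_to_host ~/public_html backup.example.com:/var/backups/
```

Wrapping the command in a function keeps the flags in one place, which helps if you copy to the same host often.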

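One final tip. When this was written, scp used the 3DES encryption algorithm by default. Encryption always costs some CPU time, but you may get a slight speed-up by choosing a different cipher with the -c option. The original suggestion was "-c blowfish"; recent OpenSSH releases no longer include blowfish, so run "ssh -Q cipher" to see which ciphers your installation supports before picking one.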

As useful as it is, there are some things that you shouldn't use scp for.

  1. Mostly, this is where you are copying more than a few files at a time. scp spawns a new process at the remote end for each file you are copying, so it can be quite slow.
  2. When using the -r flag, you must be careful. It does not understand symbolic links. If you have a symbolic link, scp will blindly follow it, even if it points to a directory that's already been copied. This can lead to scp copying an infinite amount of data, or at the very least, one full disk's worth. Be careful!
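
One way to check a tree before handing it to scp -r is to list its symbolic links with find(1). The sketch below builds a throwaway tree containing a link that points back at its own parent, then lists the links. All directory and file names are made up for the demonstration.

```shell
# Build a throwaway tree with a symlink loop (all names are illustrative).
tree=$(mktemp -d)
mkdir -p "$tree/project/docs"
echo "hello" > "$tree/project/docs/readme.txt"
ln -s .. "$tree/project/docs/loop"    # points back at project/

# List every symbolic link in the tree. find does not follow links by
# default, so this terminates even though the tree contains a loop.
links=$(find "$tree/project" -type l)
echo "$links"

# Count them; anything above zero deserves a look before using scp -r.
link_count=$(find "$tree/project" -type l | wc -l)

# Remove the throwaway tree again.
rm -rf "$tree"
```

If the listing is not empty, rsync -a or the tar pipeline described later in this document are probably safer choices for that tree.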

rsync

rsync again has a very similar syntax to rcp:

rsync -e ssh [-avz] /some/file [ more ... ] host.name:/destination/file

rsync's speciality is copying large files or collections of files that have had small changes made to them. It calculates the differences between files and only transfers the parts that have changed. This can lead to enormous improvements in speed when copying a directory tree for a second time.

The flags are:

-a
Archive mode. This should probably always be on. It asks rsync to attempt to preserve permissions, timestamps, ownerships and so on. It also copies symlinks as symlinks instead of following them.
-v
Verbose mode. List the files that are being transferred.
-z
Enable compression. This will compress each file as it gets sent over the pipe. Depending upon the data you are copying, this can be a big win.
-e ssh
Use ssh as the transport. You should always specify this. If you get tired of typing it in each time, you can type in this command to set the default for rsync: export RSYNC_RSH=ssh
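
Putting the flags together, a small sketch might look like this. The helper name sync_to_host and the host and paths in the example are placeholders. The RSYNC variable is an assumption of this sketch (not something rsync itself reads) so the command can be replaced by a stub when experimenting; RSYNC_RSH is the real environment variable mentioned above.

```shell
# Make ssh the default transport, so -e ssh can be left off each command.
export RSYNC_RSH=ssh

# RSYNC can be pointed at a stub for testing; a convention of this sketch.
RSYNC=${RSYNC:-rsync}

# Hypothetical helper: mirror local paths to a remote destination.
sync_to_host() {
    # -a archive mode, -v list files as they go, -z compress in transit
    "$RSYNC" -avz "$@"
}

# Example call (placeholder host and paths). Run it twice: the second
# run should only send what has changed since the first.
#   sync_to_host /some/dir host.name:/destination
```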

What disadvantages does rsync have? Not that many...

  1. It does have picky syntax though. In particular, the use of trailing slashes on source directories can imply different meanings as to how that directory is copied, which can be confusing.
  2. You have to remember to specify that you're using ssh.
  3. rsync isn't installed everywhere.
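
The trailing slash rule is easier to remember with the two cases side by side. Without a trailing slash on the source, rsync copies the directory itself into the destination; with one, it copies only the directory's contents. The two helper names below are hypothetical, as is the RSYNC override variable, which exists only so the sketch can be tried against a stub.

```shell
# RSYNC is overridable for testing; an assumption of this sketch.
RSYNC=${RSYNC:-rsync}

# Copy the directory itself: /some/dir ends up as /destination/dir.
copy_dir() {
    # ${1%/} strips one trailing slash if present
    "$RSYNC" -e ssh -av "${1%/}" "$2"
}

# Copy only the contents: files in /some/dir land directly in /destination.
copy_dir_contents() {
    # Strip any trailing slash, then add exactly one back
    "$RSYNC" -e ssh -av "${1%/}/" "$2"
}

# Example calls (placeholder host and paths):
#   copy_dir          /some/dir host.name:/destination
#   copy_dir_contents /some/dir host.name:/destination
```

Naming the two behaviours means you decide once which one you want, rather than relying on whether tab completion happened to add a slash.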

tar

tar is normally an archiving program for backups. But with the use of ssh, it can be coerced into copying large directory trees with ease. It has the advantage that it copies things correctly, even ACLs on those Unixes which have them, as well as being perfectly comfortable with symlinks.

The syntax, however, is slightly baroque:

tar -cf - /some/file | ssh host.name tar -xf - -C /destination

Whilst it looks complex, at heart it's quite simple: create a tar archive to stdout, send it across the network to another tar on the remote machine for unpacking.

The arguments to the first tar are -c to create a new archive and -f -, which tells tar to send the newly created archive to stdout.

The arguments to the second tar command are the reverse: -x to extract the archive and -f - to take the archive from stdin. The final -C /destination tells tar to change into the /destination directory before extracting anything.
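
To see the pipeline at work without a second machine, the same two tar commands can be joined directly, with the ssh hop left out. This is only a local sketch: the temporary directories and file names are invented for the demonstration, and the source side uses -C so that the archive holds relative paths.

```shell
# Throwaway source and destination directories (illustrative only).
src=$(mktemp -d)
dest=$(mktemp -d)
echo "hello" > "$src/a.txt"
ln -s a.txt "$src/link"

# Same shape as the network version, minus the ssh hop:
#   tar -cf - -C /some/dir . | ssh host.name tar -xf - -C /destination
tar -cf - -C "$src" . | tar -xf - -C "$dest"

# The file arrives intact and the symlink arrives as a symlink.
ls -l "$dest"
```

Once the local version behaves as expected, putting ssh host.name in front of the second tar gives the networked form shown earlier.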

Why should you use this method when the other two are available? For initial copying of large directory trees, this method can be very quick, because it streams. The first tar will send its output as soon as it has found the first file in the source directory tree, and that will be extracted almost immediately afterwards by the second tar. Meanwhile, the first tar is still finding and transmitting files. This pipeline works very well.

As with the other two methods, you can ask for compression of the data stream if your source data is amenable to it. Here, you have to add a -z flag to each tar:

tar -czf - /some/file | ssh host.name tar -xzf - -C /destination

In a similar fashion, you can enable verbose mode by passing a -v flag to the second tar. Don't pass it to the first one as well, or you'll get doubled output!
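
A local sketch of the compressed, verbose form, again with the ssh hop removed so it can be tried on one machine. The directories and the file name are placeholders.

```shell
# Throwaway source and destination directories (illustrative only).
src=$(mktemp -d)
dest=$(mktemp -d)
echo "some text" > "$src/notes.txt"

# -z on both sides, -v only on the extracting side.
# 2>&1 is used because some tar implementations print the verbose
# listing on stderr rather than stdout.
listing=$(tar -czf - -C "$src" . | tar -xzvf - -C "$dest" 2>&1)
echo "$listing"
```

Over the network, the second tar would be run through ssh host.name, exactly as in the compressed command above.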

Why shouldn't you use this method?

  1. The syntax is a pain to remember.
  2. It's not as quick to type as the scp command when you only have a small number of files.
  3. rsync will beat it hands down for a tree of files that already exists on the destination.

original url - http://www.happygiraffe.net/copy-net