Nathan Hoad

Script for uploading files via SSH

August 14, 2011

One of my projects at university requires that I upload 600-1200 files to an external development server every few days. Originally, I did this by hand, using FileZilla. For some reason FileZilla doesn't play well with the particular server(s) I'm using, so out of 600 files, about 30-40 would fail, repeatedly. Then I had issues where FileZilla would skip files, trying to be smart about modification dates. That would be fine and dandy if my server weren't set to a completely different timezone from mine (on the order of 15 hours behind me).

I'd been doing this for months, and it finally reached breaking point on Monday when a demo to the client failed miserably. The new functionality wasn't visible, and the old functionality was completely broken. To say the least, it was embarrassing. Later, in a fury, I investigated what went wrong. Lo and behold, FileZilla had silently failed to upload about 30-40 files. The failures weren't too bad before, because at least FileZilla told you about them. But silent failures? No way. So I propose this wonderful script:

#!/usr/bin/env python3

import argparse
import os
import shlex
import tarfile
import tempfile

parser = argparse.ArgumentParser(
    description='Upload a directory to a server via ssh.')
parser.add_argument('--version', action='version', version='%(prog)s 1.0')
parser.add_argument('-v', '--verbose', action='store_true', default=False,
    dest='verbose', help='Give verbose output')
parser.add_argument('-s', '--source', action='store', default=os.getcwd(),
    dest='source', help='Source directory to archive')
parser.add_argument('-d', '--destination', action='store', dest='destination',
    help='Destination for the archive on the remote machine', required=True)
parser.add_argument('-c', '--connection-string', action='store',
    dest='connection', required=True,
    help=('Connection string to use for connecting to the remote machine. '
          'E.g. "user@domain.com"'))
parser.add_argument('-i', '--ignore', action='store', dest='ignore',
    default='', help='Comma-separated list of extra names to ignore')

args = parser.parse_args()
verbose = args.verbose
host = args.connection
destination = args.destination

if verbose:
    print("Chdir'ing to {}".format(args.source))

os.chdir(args.source)

bad_files = ['.svn', '.hg', '.git']
# --ignore is optional, so only add the non-empty names it contains.
bad_files.extend(name for name in args.ignore.split(',') if name)

def file_filter(tarinfo):
    # tarfile calls this for each member; returning None leaves it out.
    # This is a substring match, so '.git' also excludes '.gitignore'.
    for bad in bad_files:
        if bad in tarinfo.name:
            return None

    return tarinfo

if verbose:
    print('Creating temporary file...')

# Only the name is needed here; tarfile reopens the path below.
with tempfile.NamedTemporaryFile(delete=False, suffix='.tar.gz') as _fileobj:
    filename = _fileobj.name

if verbose:
    print('Archiving directory...')

with tarfile.open(filename, mode='w:gz') as tar:
    for dirpath, dirnames, filenames in os.walk('.'):
        for f in filenames:
            # we don't want the real path, just the path relative to source.
            path = os.path.relpath(os.path.join(dirpath, f))
            tar.add(path, filter=file_filter)

size = os.path.getsize(filename)

if verbose:
    print('Uploading {} bytes to {}...'.format(size, host))

# Quote for the local shell, and quote the remote command a second time
# because ssh hands it to the remote shell as a single string.
remote_command = 'tar xz -C {}'.format(shlex.quote(destination))
execute_string = 'cat {} | ssh {} {}'.format(shlex.quote(filename),
    shlex.quote(host), shlex.quote(remote_command))

if verbose:
    print('executing {}'.format(execute_string))

status = os.system(execute_string)

if verbose:
    print('Removing temporary file...')

os.remove(filename)

# Never fail silently: report a non-zero status from ssh/tar.
if status != 0:
    raise SystemExit('Upload failed (os.system returned {})'.format(status))

Using Python, tar and SSH, it uploads a gzipped tarball of the source directory (the current directory by default), ignoring Subversion, Mercurial and Git metadata. It squashes 1200 files (about 7 MB) into 900 KB, uploads the archive and extracts it on the other end automatically.
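The archiving half can be pulled out into a function and checked locally, without a server. This is a minimal sketch rather than the script above verbatim: `build_archive` and its `ignore` parameter are hypothetical names, and instead of substring matching it prunes directories whose name exactly matches an ignored name, so `os.walk` never descends into `.git` and a file like `.gitignore` is kept.

```python
import os
import tarfile
import tempfile


def build_archive(source, ignore=('.svn', '.hg', '.git')):
    """Write a gzipped tarball of `source` to a temp file and return its path.

    Paths inside the archive are relative to `source`; any file or directory
    whose name exactly matches an entry in `ignore` is skipped.
    """
    ignore = set(ignore)
    fd, archive_path = tempfile.mkstemp(suffix='.tar.gz')
    os.close(fd)  # tarfile reopens the path itself
    with tarfile.open(archive_path, mode='w:gz') as tar:
        for dirpath, dirnames, filenames in os.walk(source):
            # Pruning dirnames in place stops os.walk from descending
            # into ignored directories such as .git.
            dirnames[:] = [d for d in dirnames if d not in ignore]
            for name in filenames:
                if name in ignore:
                    continue
                full_path = os.path.join(dirpath, name)
                # arcname keeps the absolute source path out of the archive.
                tar.add(full_path, arcname=os.path.relpath(full_path, source))
    return archive_path
```

Opening the returned path with `tarfile.open` and calling `getnames()` lists exactly what would be sent to the server, which is a cheap way to confirm nothing was skipped before uploading.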

This has a few benefits for me, namely:

I hope this is useful to anyone who has to do a lot of tedious re-uploading of the same thing! If you think that using Python was silly, and that I could have written this as a shell script, then I agree. But I needed powerful filtering, and I wasn't going to muck around with Bash for a script like this.

The fact that I'm using os.system feels bad, but let's look at the alternative:

Yeah, that's okay, but hey, I'd rather depend on the integrity of my operating system than on a heap of third-party libraries. I've never used a system that didn't have SSH and cat. Adding in the SSH wrapper is a needless dependency.
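If the `os.system` call still feels bad, there is a middle ground that keeps the same single dependency on the system's `ssh` binary: `subprocess` can feed the archive to `ssh` on standard input without a local shell, so `cat` and the local layer of quoting both drop out. A minimal sketch, assuming Python 3.5+ for `subprocess.run`; `remote_extract_argv` and `upload_archive` are hypothetical helper names.

```python
import shlex
import subprocess


def remote_extract_argv(host, destination):
    """Build the ssh argument list that unpacks a gzipped tar from stdin.

    No local shell is involved, but ssh still passes the command to the
    remote shell as one string, so the destination is quoted for that shell.
    """
    return ['ssh', host, 'tar xz -C {}'.format(shlex.quote(destination))]


def upload_archive(archive_path, host, destination):
    """Stream the archive into ssh and fail loudly if extraction fails."""
    with open(archive_path, 'rb') as archive:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed upload cannot pass silently.
        subprocess.run(remote_extract_argv(host, destination),
                       stdin=archive, check=True)
```

Calling `upload_archive(filename, host, destination)` would stand in for the `execute_string`/`os.system` lines; the only external program it needs is still `ssh`.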

One final, completely unrelated note: when I started learning Python, I learnt getopt for my command-line argument parsing. Boy, what a crappy decision. I learnt argparse for this script, and it is awesome. Generated usage statements? Hell yes. Automatic error handling? Double hell yes.
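To illustrate what argparse does for free, here is a cut-down parser using two of the script's own options; `build_parser` and the `prog='upload'` name exist only for this example. `format_usage()` returns the generated usage line, and leaving out a required option makes `parse_args` print an error to stderr and exit with status 2.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog='upload', description='Upload a directory to a server via ssh.')
    parser.add_argument('-d', '--destination', dest='destination',
                        required=True,
                        help='Destination for the archive on the remote machine')
    parser.add_argument('-c', '--connection-string', dest='connection',
                        required=True,
                        help='Connection string, e.g. "user@domain.com"')
    return parser


parser = build_parser()
# The usage line is generated from the add_argument calls above.
print(parser.format_usage())

try:
    # Omitting the required -c/--connection-string option is caught by
    # argparse itself, which reports the error and raises SystemExit.
    parser.parse_args(['-d', '/var/www/site'])
except SystemExit as exc:
    print('argparse exited with status', exc.code)
```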

**Update:** I've uploaded the above script and another useful script, ss, to Google Code as networktools.