There’s a rather well-known project called Murder that was originally written by some folks at Twitter. It uses BitTorrent to push files to a large number of production servers. There’s a video explaining it, with that oh-so-Twitter baby blue everywhere, that you should totally watch if you want more info.
The one downside is that it’s pretty beastly to implement if you’re not already using Capistrano. Our particular need was to integrate Murder into another on-demand system that has to push files around to worker bees. The files were pretty large, so BitTorrent was the natural “speed ’em up” prescription.
There’s a great little project called Herd that boils Murder down to a single Python script. It’s a replacement for Murder that reuses the same design and logic, and I have to say it’s pretty perfect. After I made a few small patches to bubble up errors and provide better help text (committed back upstream, of course), it’s even nicer to use.
After installing the eventlet and argparse modules (they’re probably already there for you), you have to do two things: 1) make sure you can SSH to every server without a password, and 2) put the list of servers, one per line, into a file. The examples all call it hosts.dat, and it looks like this:
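Before the first push, it can be worth checking both prerequisites. Here is a minimal Python sketch, assuming a hosts file in the one-hostname-per-line format described above. The helpers read_hosts and check_passwordless_ssh are hypothetical and are not part of Herd. The sketch tolerates blank lines, which is my own choice and not necessarily something Herd accepts. It relies on OpenSSH’s BatchMode=yes option, which makes ssh fail instead of prompting for a password.

```python
import subprocess
from pathlib import Path


def read_hosts(path):
    """Return hostnames from a file with one host per line (blank lines skipped).

    Hypothetical helper; Herd's own parsing may differ.
    """
    hosts = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:  # assumption: ignore blank lines for convenience
            hosts.append(line)
    return hosts


def check_passwordless_ssh(host, connect_timeout=10):
    """Return True if `ssh host true` succeeds without any password prompt."""
    # BatchMode=yes makes ssh exit with an error instead of asking for a
    # password or passphrase, so a missing key shows up as a failure.
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={connect_timeout}",
        host,
        "true",  # run a no-op command on the remote host
    ]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, [h for h in read_hosts("hosts.dat") if not check_passwordless_ssh(h)] gives the hosts that still need SSH keys set up before Herd can reach them.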
Then you just run this command:
When we were testing this at scale, we found that even with BitTorrent, pushing 2TB files to a few hundred nodes caused some interesting network saturation problems. The BitTorrent client would bail, and the files wouldn’t get pushed. We did find, though, that just retrying a few times was all it took. So I tossed in a --retry flag to do exactly that. If Herd detects a failure from BitTorrent (a connection failure, an md5sum mismatch, etc.), it will try X more times before giving up for good and writing the errors to your --log-dir (one file per host). That makes error remediation much easier at large scale.
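The retry-then-log behavior described above can be sketched roughly as follows. This is not Herd’s actual code. push_with_retry and the transfer callable are hypothetical names, and the <host>.log filename is an assumption. It only illustrates the idea: retry a failed transfer X more times, then write every error to a per-host file under the log directory.

```python
import time
from pathlib import Path


def push_with_retry(host, transfer, retries, log_dir, delay=0.0):
    """Call transfer(host) up to 1 + retries times.

    `transfer` is a hypothetical callable that raises on failure (connection
    error, checksum mismatch, ...). If every attempt fails, all collected
    errors go to <log_dir>/<host>.log (assumed naming) and False is returned.
    """
    errors = []
    for attempt in range(1, retries + 2):  # first try plus `retries` more
        try:
            transfer(host)
            return True
        except Exception as exc:  # sketch: any exception counts as a failed attempt
            errors.append(f"attempt {attempt}: {exc!r}")
            if attempt <= retries and delay:
                time.sleep(delay)  # optional back-off before the next attempt
    # Out of retries: record one log file per host for later remediation.
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)
    (log_path / f"{host}.log").write_text("\n".join(errors) + "\n")
    return False
```

Counting retries as additional attempts (so --retry 2 means up to three tries) matches the “try X more times” wording; a short delay between attempts may give a saturated network a moment to recover.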
So yeah, I can now say we are successfully using Herd to transfer petabytes of data a day, and it was much easier to drop in than embracing the full-blown Murder/Capistrano stack.