spark-user mailing list archives

From Andrew Ash <>
Subject Re: sync master with slaves with bittorrent?
Date Mon, 19 May 2014 06:07:49 GMT
My first thought would be to use libtorrent for this setup, and it turns
out that both Twitter and Facebook do code deploys with a bittorrent setup.
 Twitter even released their code as open source:

On Sun, May 18, 2014 at 10:44 PM, Daniel Mahler <> wrote:

> I am not an expert in this space either. I thought the initial rsync
> during launch is really just a straight copy that did not need the tree
> diff. So it seemed like having the slaves do the copying among each
> other would be better than having the master copy to everyone directly.
> That made me think of bittorrent, though there may well be other systems
> that do this.
> From the launches I did today it seems that it is taking around 1 minute
> per slave to launch a cluster, which can be a problem for clusters with 10s
> or 100s of slaves, particularly since on ec2 that time has to be paid for.
> On Sun, May 18, 2014 at 11:54 PM, Aaron Davidson <> wrote:
>> Out of curiosity, do you have a library in mind that would make it easy
>> to set up a bittorrent network and distribute files in an rsync (i.e.,
>> apply a diff to a tree, ideally) fashion? I'm not familiar with this space,
>> but we do want to minimize the complexity of our standard ec2 launch
>> scripts to reduce the chance of something breaking.
>> On Sun, May 18, 2014 at 9:22 PM, Daniel Mahler <> wrote:
>>> I am launching a rather large cluster on ec2.
>>> It seems like the launch is taking forever on
>>> ....
>>> Setting up spark
>>> RSYNC'ing /root/spark to slaves...
>>> ...
>>> It seems that bittorrent might be a faster way to replicate
>>> the sizeable spark directory to the slaves,
>>> particularly if there are a lot of not very powerful slaves.
>>> Just a thought ...
>>> cheers
>>> Daniel
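
The scaling argument in the thread can be sketched with some rough arithmetic. The snippet below is only an illustration under stated assumptions: it takes the "around 1 minute per slave" figure quoted above as the cost of one full copy, assumes the master pushes to slaves one at a time, and compares that with an idealized peer fan-out in which every node that already holds the directory sends it to one new node per round. The function names are hypothetical, and neither model is claimed to match how the ec2 launch scripts or a real bittorrent swarm actually behave.

```python
# Assumption: one full copy of the spark directory takes about 1 minute,
# based on the "around 1 minute per slave" estimate in the thread.
MINUTES_PER_COPY = 1.0


def sequential_minutes(num_slaves, minutes_per_copy=MINUTES_PER_COPY):
    """Master copies to each slave in turn: time grows linearly."""
    return num_slaves * minutes_per_copy


def fanout_minutes(num_slaves, minutes_per_copy=MINUTES_PER_COPY):
    """Idealized peer fan-out: every holder sends to one new node per round,
    so the number of holders doubles each round."""
    holders = 1  # the master starts with the only copy
    rounds = 0
    while holders - 1 < num_slaves:  # holders - 1 = slaves that have a copy
        holders *= 2
        rounds += 1
    return rounds * minutes_per_copy


if __name__ == "__main__":
    for n in (10, 100):
        print(n, "slaves:",
              sequential_minutes(n), "min sequential vs",
              fanout_minutes(n), "min with idealized fan-out")
```

Under these assumptions the sequential cost grows with the number of slaves while the fan-out cost grows with its logarithm, which is the intuition behind having the slaves copy among each other.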
