spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: sync master with slaves with bittorrent?
Date Mon, 19 May 2014 06:07:49 GMT
My first thought would be to use libtorrent for this setup, and it turns
out that both Twitter and Facebook do code deploys with a BitTorrent setup.
Twitter even released their code as open source:

https://blog.twitter.com/2010/murder-fast-datacenter-code-deploys-using-bittorrent

http://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/


On Sun, May 18, 2014 at 10:44 PM, Daniel Mahler <dmahler@gmail.com> wrote:

> I am not an expert in this space either. I thought the initial rsync
> during launch is really just a straight copy that does not need the tree
> diff. So it seemed like having the slaves copy among themselves
> would be better than having the master copy to everyone directly.
> That made me think of BitTorrent, though there may well be other systems
> that do this.
> From the launches I did today, it seems to take around 1 minute
> per slave to launch a cluster, which can be a problem for clusters with tens
> or hundreds of slaves, particularly since on EC2 that time has to be paid for.
>
>
> On Sun, May 18, 2014 at 11:54 PM, Aaron Davidson <ilikerps@gmail.com> wrote:
>
>> Out of curiosity, do you have a library in mind that would make it easy
>> to set up a BitTorrent network and distribute files in an rsync-like (i.e.,
>> apply a diff to a tree, ideally) fashion? I'm not familiar with this space,
>> but we do want to minimize the complexity of our standard EC2 launch
>> scripts to reduce the chance of something breaking.
>>
>>
>> On Sun, May 18, 2014 at 9:22 PM, Daniel Mahler <dmahler@gmail.com> wrote:
>>
>>> I am launching a rather large cluster on ec2.
>>> It seems like the launch is taking forever on
>>> ....
>>> Setting up spark
>>> RSYNC'ing /root/spark to slaves...
>>> ...
>>>
>>> It seems that BitTorrent might be a faster way to replicate
>>> the sizeable spark directory to the slaves,
>>> particularly if there are a lot of not very powerful slaves.
>>>
>>> Just a thought ...
>>>
>>> cheers
>>> Daniel
>>>
>>>
>>
>
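
The scaling argument in this thread can be sketched with some rough arithmetic. The Python below is only an illustration under stated assumptions, not how spark-ec2, rsync, or BitTorrent actually behave: it assumes each full copy of the directory takes a fixed time (the "around 1 minute per slave" figure quoted above), that the master copies to one slave at a time, and that in the peer-assisted case every node already holding a copy sends it to one new node per round. Real BitTorrent splits files into pieces and overlaps transfers, so this idealized doubling model is a simplification. The function names are hypothetical.

```python
def sequential_minutes(num_slaves: int, minutes_per_copy: float = 1.0) -> float:
    """Master copies the directory to each slave, one after another."""
    return num_slaves * minutes_per_copy


def peer_assisted_minutes(num_slaves: int, minutes_per_copy: float = 1.0) -> float:
    """Idealized peer-to-peer copy: every node that already has the
    directory sends it to one node that does not, so the number of
    holders doubles each round (assumption, not real BitTorrent)."""
    holders = 1  # the master starts with the only copy
    rounds = 0
    while holders < num_slaves + 1:  # master plus all slaves must hold a copy
        holders *= 2
        rounds += 1
    return rounds * minutes_per_copy


if __name__ == "__main__":
    for n in (10, 100):
        print(n, sequential_minutes(n), peer_assisted_minutes(n))
```

Under these assumptions the master-only copy grows linearly with cluster size, while the peer-assisted copy grows roughly with the base-2 logarithm of it, which is the intuition behind letting the slaves copy among themselves.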
