spark-dev mailing list archives

From Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
Subject Re: Downloading Hadoop from s3://spark-related-packages/
Date Sun, 01 Nov 2015 22:32:30 GMT
On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas
<nicholas.chammas@gmail.com> wrote:
> OK, I’ll focus on the Apache mirrors going forward.
>
> The problem with the Apache mirrors, if I am not mistaken, is that you
> cannot use a single URL that automatically redirects you to a working mirror
> to download Hadoop. You have to pick a specific mirror and pray it doesn’t
> disappear tomorrow.
>
> They don’t go away, especially http://mirror.ox.ac.uk, and in the US,
> apache.osuosl.org, OSU being where a lot of the ASF servers are kept.
>
> So does Apache offer no way to query a URL and automatically get the closest
> working mirror? If I’m installing HDFS onto servers in various EC2 regions,
> the best mirror will vary depending on my location.
>
Not sure if this is officially documented anywhere, but if you append
'&asjson=1' to the mirror query URL you get back a JSON response whose
'preferred' field is set to the closest mirror.
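
A rough sketch of how that lookup might be scripted, in Python. The
'asjson=1' flag and the 'preferred' field come from the note above;
everything else is an assumption for illustration: the closer.cgi
endpoint URL, the 'path' query parameter, and the helper names
mirror_query_url, preferred_download_url and resolve are not given in
this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the mirror lookup endpoint is not named in this thread.
CLOSER_URL = "https://www.apache.org/dyn/closer.cgi"


def mirror_query_url(path):
    """Build the lookup URL, adding the asjson=1 flag described above."""
    # 'path' as the parameter name is an assumption; '/' is left unescaped.
    query = urllib.parse.urlencode({"path": path, "asjson": "1"}, safe="/")
    return f"{CLOSER_URL}?{query}"


def preferred_download_url(response_text, path):
    """Join the 'preferred' mirror from the JSON response with the artifact path."""
    info = json.loads(response_text)
    return info["preferred"].rstrip("/") + "/" + path.lstrip("/")


def resolve(path, timeout=10):
    """Query the lookup service over the network and return a download URL."""
    with urllib.request.urlopen(mirror_query_url(path), timeout=timeout) as resp:
        return preferred_download_url(resp.read().decode("utf-8"), path)
```

Calling resolve("hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz") would
make one HTTP request and return a URL on whichever mirror the service
reports as preferred. The parsing step is kept separate from the network
call so it can be checked against a saved response.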

Shivaram
> Nick
>
>
> On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman
> <shivaram@eecs.berkeley.edu> wrote:
>>
>> I think that getting them from the ASF mirrors is a better strategy in
>> general as it'll remove the overhead of keeping the S3 bucket up to
>> date. It works in the spark-ec2 case because we only support a limited
>> number of Hadoop versions from the tool. FWIW I don't have write
>> access to the bucket and also haven't heard of any plans to support
>> newer versions in spark-ec2.
>>
>> Thanks
>> Shivaram
>>
>> On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran <stevel@hortonworks.com>
>> wrote:
>> >
>> > On 1 Nov 2015, at 03:17, Nicholas Chammas <nicholas.chammas@gmail.com>
>> > wrote:
>> >
>> > https://s3.amazonaws.com/spark-related-packages/
>> >
>> > spark-ec2 uses this bucket to download and install HDFS on clusters. Is it
>> > owned by the Spark project or by the AMPLab?
>> >
>> > Anyway, it looks like the latest Hadoop install available on there is
>> > Hadoop 2.4.0.
>> >
>> > Are there plans to add newer versions of Hadoop for use by spark-ec2 and
>> > similar tools, or should we just be getting that stuff via an Apache
>> > mirror? The latest version is 2.7.1, by the way.
>> >
>> >
>> > you should be grabbing the artifacts off the ASF and then verifying their
>> > SHA1 checksums as published on the ASF HTTPS web site
>> >
>> >
>> > The problem with the Apache mirrors, if I am not mistaken, is that you
>> > cannot use a single URL that automatically redirects you to a working
>> > mirror to download Hadoop. You have to pick a specific mirror and pray it
>> > doesn't disappear tomorrow.
>> >
>> >
>> > They don't go away, especially http://mirror.ox.ac.uk, and in the US,
>> > apache.osuosl.org, OSU being where a lot of the ASF servers are kept.
>> >
>> > full list with availability stats
>> >
>> > http://www.apache.org/mirrors/
>> >
>> >
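
The checksum step suggested in the quoted thread (download from a mirror,
then compare against the SHA1 published on the ASF HTTPS site) could look
roughly like the Python sketch below. The helper names sha1_of_file and
matches_published_sha1 are hypothetical, and the sketch assumes the
published value has already been fetched and is passed in as a string of
hex digits.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha1(path, published):
    """Compare a local file against a published SHA1 hex string."""
    # Assumption: the published value may differ in case or contain
    # whitespace between digit groups, so normalize before comparing.
    expected = "".join(published.split()).lower()
    return sha1_of_file(path) == expected
```

The point of fetching the checksum over HTTPS from the ASF site rather
than from the mirror is that a tampered mirror cannot also alter the
value the download is checked against.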

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

