spark-user mailing list archives

From Aureliano Buendia <buendia...@gmail.com>
Subject Re: Using persistent hdfs on spark ec2 instances
Date Thu, 23 Jan 2014 00:50:28 GMT
On Thu, Jan 23, 2014 at 12:41 AM, Patrick Wendell <pwendell@gmail.com> wrote:

> It should work correctly and yes, it starts and stops on port 9010.
> You'll need to use "hdfs://<master-hostname>:9010/path/to/whatever" to
> access files from Spark. Is that what you are asking about?
>

Actually, when I tried:

myRdd.saveAsTextFile("hdfs://<master-hostname>:9000/path/to/whatever")

It threw an error, as Spark already tries to add the
"hdfs://<master-hostname>:9000/" prefix to the path.

So I use:

myRdd.saveAsTextFile("/path/to/whatever")

and it ends up in the ephemeral HDFS. That's why I asked whether Spark needs
extra configuration to work with the persistent HDFS.

Of course, as you mentioned, the other way is to change the persistent HDFS
port to 9000.
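
For reference, here is a minimal sketch of building the fully qualified URI
explicitly, so the target HDFS instance is chosen by port rather than by
whatever default prefix Spark applies. The helper name hdfs_uri and the host
master.example.com are made up for illustration and are not part of Spark;
the ports are the ones reported in this thread (9000 ephemeral, 9010
persistent). Whether saveAsTextFile accepts the result depends on the
cluster configuration, so treat this as an assumption to verify.

```python
# Ports as reported in this thread for a spark-ec2 cluster.
EPHEMERAL_HDFS_PORT = 9000
PERSISTENT_HDFS_PORT = 9010


def hdfs_uri(master_hostname, path, persistent=True):
    """Build a fully qualified HDFS URI (hypothetical helper, not a Spark API)."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /path/to/whatever")
    port = PERSISTENT_HDFS_PORT if persistent else EPHEMERAL_HDFS_PORT
    return f"hdfs://{master_hostname}:{port}{path}"


# Possible usage from PySpark (my_rdd is assumed to be an existing RDD):
# my_rdd.saveAsTextFile(hdfs_uri("master.example.com", "/path/to/whatever"))
```

Passing persistent=False yields the port 9000 form, which makes it easy to
compare how the two instances behave with the same path.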


>
> On Wed, Jan 22, 2014 at 4:36 PM, Aureliano Buendia <buendia360@gmail.com>
> wrote:
> > The persistent-hdfs server is set to port 9010, instead of 9000. Does Spark
> > need more config for this?
> >
> >
> > On Thu, Jan 23, 2014 at 12:26 AM, Patrick Wendell <pwendell@gmail.com>
> > wrote:
> >>
> >> > 1. It seems by default spark ec2 uses ephemeral hdfs, how to switch
> >> > this to persistent hdfs?
> >> You can stop the ephemeral one using
> >>
> >> /root/ephemeral-hdfs/bin/stop-dfs.sh
> >>
> >> and start the persistent one using
> >>
> >>  /root/persistent-hdfs/bin/start-dfs.sh
> >>
> >> > 2. By default persistent hdfs server is not up, is this meant to be
> >> > like this?
> >>
> >> Yes - it starts only an ephemeral one:
> >>
> >> "The spark-ec2 script already sets up a HDFS instance for you. It’s
> >> installed in /root/ephemeral-hdfs"
> >
> >
>
