spark-user mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Using persistent hdfs on spark ec2 instances
Date Thu, 23 Jan 2014 00:43:38 GMT
You can change this behavior by editing core-site.xml in Spark's conf
directory to make port 9010 the default filesystem. This is something
the docs could probably mention; if you have interest in submitting a
PR, I'd be happy to review it.

If you look, the default filesystem uses port 9000:
https://github.com/mesos/spark-ec2/blob/v2/templates/root/spark/conf/core-site.xml
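
Something roughly like this in /root/spark/conf/core-site.xml should work
(assuming the fs.default.name key used in the linked template; substitute
your actual master hostname):

  <!-- Point the default filesystem at the persistent-hdfs namenode (9010).
       Property name assumed from the Hadoop 1.x-era template linked above. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<master-hostname>:9010</value>
  </property>

You'd also need to restart any running Spark shells or jobs so the new
default is picked up.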

On Wed, Jan 22, 2014 at 4:41 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> It should work correctly and yes, it starts and stops on port 9010.
> You'll need to use "hdfs://<master-hostname>:9010/path/to/whatever" to
> access files from Spark. Is that what you are asking about?
>
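For example, from the Scala shell this would look roughly like the
following (assuming an existing SparkContext sc; the hostname and path
are placeholders):

  // Read a file from the persistent-hdfs namenode, which serves on port 9010
  val lines = sc.textFile("hdfs://<master-hostname>:9010/path/to/whatever")
  println(lines.count())
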
> On Wed, Jan 22, 2014 at 4:36 PM, Aureliano Buendia <buendia360@gmail.com> wrote:
>> The persistent-hdfs server is set to port 9010, instead of 9000. Does Spark
>> need more config for this?
>>
>>
>> On Thu, Jan 23, 2014 at 12:26 AM, Patrick Wendell <pwendell@gmail.com>
>> wrote:
>>>
>>> > 1. It seems that by default spark-ec2 uses ephemeral HDFS; how do I switch
>>> > this to persistent HDFS?
>>> You can stop the ephemeral one using
>>>
>>> /root/ephemeral-hdfs/bin/stop-dfs.sh
>>>
>>> and start the persistent one using
>>>
>>>  /root/persistent-hdfs/bin/start-dfs.sh
>>>
>>> > 2. By default the persistent HDFS server is not up; is it meant to be
>>> > like this?
>>>
>>> Yes - it starts only an ephemeral one:
>>>
>>> "The spark-ec2 script already sets up a HDFS instance for you. It’s
>>> installed in /root/ephemeral-hdfs"
>>
>>
