spark-user mailing list archives

From Huizhe Wang <wang.h...@husky.neu.edu>
Subject Re: [spark on yarn] spark on yarn without DFS
Date Wed, 22 May 2019 02:00:44 GMT
Hi Hari,
Thanks :) I tried it as you said, and it works ;)


On Mon, May 20, 2019 at 3:54 PM, Hariharan <hariharan022@gmail.com> wrote:

> Hi Huizhe,
>
> You can set the "fs.defaultFS" field in core-site.xml to a path on S3.
> That way your Spark job will use S3 for all operations that need HDFS.
> Intermediate data will still be stored on local disk though.
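>
> For reference, a minimal core-site.xml sketch of that change might look like
> the following (the bucket name and the credential values are just
> placeholders for illustration, not settings from a real cluster):
>
>   <configuration>
>     <!-- Point the default filesystem at S3 via the s3a connector instead of HDFS -->
>     <property>
>       <name>fs.defaultFS</name>
>       <value>s3a://my-example-bucket</value>
>     </property>
>     <!-- s3a also needs credentials; access/secret keys are one option,
>          instance profiles on EC2/EMR are another -->
>     <property>
>       <name>fs.s3a.access.key</name>
>       <value>YOUR_ACCESS_KEY</value>
>     </property>
>     <property>
>       <name>fs.s3a.secret.key</name>
>       <value>YOUR_SECRET_KEY</value>
>     </property>
>   </configuration>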
>
> Thanks,
> Hari
>
> On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari <abdealikothari@gmail.com>
> wrote:
>
>> While Spark can read from S3 directly in EMR, I believe it still needs
>> HDFS to perform shuffles and to write intermediate data to disk when
>> running jobs (i.e. when in-memory data needs to spill over to disk).
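>>
>> For what it's worth, on YARN the local-disk part of that (shuffle and spill
>> files) ends up in the NodeManager's local directories, controlled by a
>> setting roughly like the following in yarn-site.xml (the paths here are just
>> placeholders):
>>
>>   <property>
>>     <!-- Local directories on each node where containers, and hence Spark
>>          executors, write their shuffle and spill files -->
>>     <name>yarn.nodemanager.local-dirs</name>
>>     <value>/mnt/yarn/local,/mnt1/yarn/local</value>
>>   </property>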
>>
>> For these operations, Spark does need a distributed file system - you
>> could use something like EMRFS (which is like an HDFS backed by S3) on
>> Amazon.
>>
>> The issue could be something else too - so a stacktrace or error message
>> could help in understanding the problem.
>>
>>
>>
>> On Mon, May 20, 2019, 07:20 Huizhe Wang <wang.huiz@husky.neu.edu> wrote:
>>
>>> Hi,
>>>
>>> I want to use Spark on YARN without HDFS. I store my resources on AWS and
>>> use s3a to fetch them. However, after I ran stop-dfs.sh to stop the
>>> NameNode and DataNode, I got an error when using yarn cluster mode. Can I
>>> use YARN without starting DFS, and if so, how do I use this mode?
>>>
>>> Yours,
>>> Jane
>>>
>>
