spark-user mailing list archives

From Abdeali Kothari <>
Subject Re: [spark on yarn] spark on yarn without DFS
Date Mon, 20 May 2019 04:44:32 GMT
While Spark can read from S3 directly in EMR, I believe it still needs
HDFS to perform shuffles and to write intermediate data to disk when
running jobs (i.e. when in-memory data needs to spill over to disk).

For these operations, Spark does need a distributed file system. You could
use something like EMRFS (an HDFS-compatible file system backed by S3) on
Amazon.
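
As a rough, untested sketch of what pointing Spark's file system settings at
S3 instead of HDFS might look like, the Python snippet below assembles a
spark-submit command line. The property names `spark.yarn.stagingDir` and
`fs.defaultFS` (passed through the `spark.hadoop.` prefix) are real settings,
but the bucket name, the staging path, and the job file are hypothetical, and
whether YARN itself is happy with an S3-backed default file system on your
cluster is an assumption you would need to verify.

```python
# Hypothetical bucket; replace with one the cluster can reach through s3a.
BUCKET = "s3a://my-example-bucket"


def build_submit_args(bucket, app):
    """Return a spark-submit argument list that avoids HDFS paths.

    Assumption: an S3-backed file system is acceptable as the default
    file system and as the YARN staging directory on this cluster.
    """
    conf = {
        # Default file system used to resolve unqualified paths.
        "spark.hadoop.fs.defaultFS": bucket,
        # Where Spark uploads jars and files for YARN containers.
        "spark.yarn.stagingDir": bucket + "/spark-staging",
    }
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in sorted(conf.items()):
        args += ["--conf", key + "=" + value]
    args.append(app)
    return args


if __name__ == "__main__":
    # my_job.py is a placeholder application file.
    print(" ".join(build_submit_args(BUCKET, "my_job.py")))
```

Running the script only prints the command; it does not contact the cluster,
so it is safe to use for checking the flags before submitting anything.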

The issue could be something else too - so a stacktrace or error message
could help in understanding the problem.

On Mon, May 20, 2019, 07:20 Huizhe Wang <> wrote:

> Hi,
> I want to use Spark on YARN without HDFS. I store my resources in AWS and
> use s3a to access them. However, when I stopped the NameNode and
> DataNode, I got an error in yarn cluster mode. Can I use YARN
> without starting DFS, and if so, how?
> Yours,
> Jane
