spark-user mailing list archives

From Ognen Duzlevski <og...@nengoiksvelzud.com>
Subject Re: Quality of documentation (rant)
Date Sun, 19 Jan 2014 14:53:18 GMT
On Sun, Jan 19, 2014 at 2:49 PM, Ognen Duzlevski
<ognen@nengoiksvelzud.com> wrote:

>
> My basic requirement is to set everything up myself and understand it. For
> testing purposes my cluster has 15 xlarge instances, and I guess I will
> just set up a Hadoop cluster running over these instances to get the
> benefits of HDFS. I would then set up HDFS over S3 using the S3 block
> filesystem.
>

By this I mean I would set up a Hadoop cluster running in parallel on the
same instances, just for the purpose of running Spark over HDFS. Is this a
reasonable approach? What kind of performance penalty (memory, CPU cycles)
am I going to incur from the Hadoop daemons running just for this purpose?
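
To make that concrete, here is roughly what the Spark side would look like
(just a sketch; the master URL, NameNode host/port, bucket and paths below
are placeholders, not my actual setup):

    import org.apache.spark.SparkContext

    object HdfsVsS3Sketch {
      def main(args: Array[String]) {
        // Placeholder master URL and application name.
        val sc = new SparkContext("spark://master:7077", "HdfsVsS3Sketch")

        // Reading from the co-located HDFS cluster: Spark only needs the
        // NameNode address ("namenode:9000" is a placeholder); the blocks
        // themselves are served by the DataNodes running next to the Spark
        // workers, which is where the locality benefit would come from.
        val fromHdfs = sc.textFile("hdfs://namenode:9000/data/events.log")

        // The same call can read straight from S3 with no HDFS daemons at
        // all, given fs.s3n.awsAccessKeyId/fs.s3n.awsSecretAccessKey in the
        // Hadoop configuration.
        val fromS3 = sc.textFile("s3n://my-bucket/data/events.log")

        println("hdfs: " + fromHdfs.count() + ", s3n: " + fromS3.count())
        sc.stop()
      }
    }

Either way the application code is identical and only the URL scheme
changes, so the question is really whether the HDFS daemons pay for
themselves.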

Thanks!
Ognen
