spark-user mailing list archives

From Mayur Rustagi <>
Subject Re: Quality of documentation (rant)
Date Sun, 19 Jan 2014 16:56:00 GMT
Here's what I would suggest: in order to protect against human error, start
EC2 instances with the spark-ec2 script. Copy over the folders as they are;
they come well integrated with HDFS, with compatible drivers and versions.
Then change the configuration to set up your slaves and masters.
Sorry if it's offensive :) .. I found dealing with various Hadoop
incompatibilities very taxing.
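[For reference, the spark-ec2 launch described above looks roughly like the
following sketch. The key pair, identity file, and cluster name are
placeholders, and exact flags may differ between Spark versions:]

```shell
# Run from the ec2/ directory of a Spark distribution. AWS credentials
# must be exported first; my-keypair / my-key.pem / my-cluster are
# placeholders, not real names.
export AWS_ACCESS_KEY_ID=...      # your AWS access key
export AWS_SECRET_ACCESS_KEY=...  # your AWS secret key

# Launch a master plus 15 xlarge slaves (matching the cluster size
# discussed below); the script installs Spark and HDFS together.
./spark-ec2 -k my-keypair -i my-key.pem \
  -s 15 -t m1.xlarge launch my-cluster

# Log in to the master, and destroy the cluster when done.
./spark-ec2 -k my-keypair -i my-key.pem login my-cluster
./spark-ec2 destroy my-cluster
```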

Mayur Rustagi
Ph: +919632149971

On Sun, Jan 19, 2014 at 8:23 PM, Ognen Duzlevski wrote:

> On Sun, Jan 19, 2014 at 2:49 PM, Ognen Duzlevski <> wrote:
>> My basic requirement is to set everything up myself and understand it.
>> For testing purposes my cluster has 15 xlarge instances, and I guess I will
>> just set up a Hadoop cluster running over these instances for the purpose
>> of getting the benefits of HDFS. I would then set up HDFS over S3 with
>> blocks.
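[For what it's worth, "HDFS over S3 with blocks" corresponds to Hadoop's
block-based `s3://` filesystem, selected in core-site.xml. A minimal sketch;
the bucket name and credentials are placeholders, and property names vary
slightly across Hadoop versions:]

```xml
<!-- core-site.xml: use S3 as the default filesystem.
     s3:// is Hadoop's block-based S3 filesystem ("HDFS over S3 with
     blocks"); s3n:// instead stores plain files readable by other tools. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>s3://my-bucket</value>  <!-- placeholder bucket -->
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```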
> By this I mean I would set up a Hadoop cluster running in parallel on the
> same instances just for the purposes of running Spark over HDFS. Is this a
> reasonable approach? What kind of a performance penalty (memory, CPU
> cycles) am I going to incur by the Hadoop daemons running just for this
> purpose?
> Thanks!
> Ognen
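[On the memory question above: the Hadoop daemons' footprint is mostly
their JVM heaps, which can be capped in hadoop-env.sh. A hedged sketch,
with variable names from stock Hadoop 1.x and purely illustrative values:]

```shell
# hadoop-env.sh: cap daemon heap sizes so HDFS-only daemons stay small.
# The default max heap per daemon is HADOOP_HEAPSIZE (in MB); the values
# below are illustrative, not tuned recommendations.
export HADOOP_HEAPSIZE=1000  # default max heap (MB) for all Hadoop daemons

# Per-daemon overrides: a DataNode serving Spark reads rarely needs a
# large heap; the NameNode's requirement grows with the number of blocks.
export HADOOP_NAMENODE_OPTS="-Xmx1g $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx512m $HADOOP_DATANODE_OPTS"
```

[Since no MapReduce jobs run in this setup, the JobTracker and TaskTrackers
can be left stopped entirely (start only HDFS with start-dfs.sh), so the
steady-state cost is roughly one DataNode heap per worker plus idle I/O
threads.]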
