spark-user mailing list archives

From Soumya Simanta <soumya.sima...@gmail.com>
Subject Re: Comparative study
Date Tue, 08 Jul 2014 00:14:30 GMT


Daniel, 

Do you mind sharing the size of your cluster and your production data volumes?

Thanks
Soumya 

> On Jul 7, 2014, at 3:39 PM, Daniel Siegmann <daniel.siegmann@velos.io> wrote:
> 
> From a development perspective, I vastly prefer Spark to MapReduce. The MapReduce API
is very constrained; Spark's API feels much more natural to me. Testing and local development
are also very easy - creating a local Spark context is trivial and it reads local files. For
your unit tests you can just have them create a local context and execute your flow with some
test data. Even better, you can do ad-hoc work in the Spark shell, and if you want that in
your production code it will look exactly the same.
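
The local-context testing approach described above might look roughly like the following. This is a minimal sketch, not code from the original message: it assumes PySpark is installed, and the names `tokenize`, `word_counts`, and `run_local_test` are hypothetical, chosen only for illustration.

```python
# Sketch only: assumes PySpark is installed; all function names are hypothetical.
from operator import add


def tokenize(line):
    """Split a line into lowercase words."""
    return line.lower().split()


def word_counts(lines_rdd):
    """The 'flow' under test: RDD of lines in, RDD of (word, count) pairs out."""
    return lines_rdd.flatMap(tokenize).map(lambda w: (w, 1)).reduceByKey(add)


def run_local_test():
    # Imported here so this module still loads where PySpark is absent.
    from pyspark import SparkContext

    # "local[2]" runs Spark in-process with two worker threads; no cluster needed.
    sc = SparkContext("local[2]", "word-count-test")
    try:
        rdd = sc.parallelize(["a b a", "b c"])
        result = dict(word_counts(rdd).collect())
        assert result == {"a": 2, "b": 2, "c": 1}
    finally:
        # Always stop the context so later tests can create their own.
        sc.stop()


if __name__ == "__main__":
    run_local_test()
```

Because `word_counts` only depends on the RDD it is given, the same function could be called from production code with an RDD built by `sc.textFile(...)` instead of `sc.parallelize(...)`.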
> 
> Unfortunately, the picture isn't so rosy when it gets to production. In my experience,
Spark simply doesn't scale to the volumes that MapReduce will handle. Not with a Standalone
cluster anyway - maybe Mesos or YARN would be better, but I haven't had the opportunity to
try them. I find jobs tend to just hang forever for no apparent reason on large data sets
(but smaller than what I push through MapReduce).
> 
> I am hopeful the situation will improve - Spark is developing quickly - but if you have
large amounts of data you should proceed with caution.
> 
> Keep in mind there are some frameworks for Hadoop which can hide the ugly MapReduce with
something very similar in form to Spark's API; e.g. Apache Crunch. So you might consider those
as well.
> 
> (Note: the above is with Spark 1.0.0.)
> 
> 
> 
>> On Mon, Jul 7, 2014 at 11:07 AM, <santosh.viswanathan@accenture.com> wrote:
>> Hello Experts,
>> 
>>  
>> 
>> I am doing a comparative study of the following:
>> 
>>  
>> 
>> Spark vs Impala
>> 
>> Spark vs MapReduce. Is it worth migrating from an existing MR implementation to Spark?
>> 
>>  
>> 
>>  
>> 
>> Please share your thoughts and expertise.
>> 
>>  
>> 
>>  
>> 
>> Thanks,
>> Santosh
>> 
>> 
>> 
> 
> 
> 
> -- 
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
> 
> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
> E: daniel.siegmann@velos.io W: www.velos.io
