spark-user mailing list archives

From "Valdes, Pablo" <pval...@comscore.com>
Subject Minimum cluster size for empirical testing
Date Mon, 01 Dec 2014 19:54:25 GMT
Hi everyone,

I’m interested in empirically measuring how much faster Spark runs compared to Hadoop for
certain problems and an input corpus I currently work with (I’ve read Matei Zaharia’s “Resilient
Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing” paper
and I want to perform a similar test). I don’t think measuring the speed difference on a
single-node cluster is enough, so I was wondering what you would recommend for this task
in terms of number of nodes, hardware specs, etc.
I was thinking it might be possible to launch a couple of CDH5 VMs across a few machines, or
do you think it would be easier to do it on Amazon EC2?
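
For illustration, this is roughly the kind of timed Spark job I had in mind (just a sketch;
the input path and the word-count job itself are placeholders, and in practice I would run
our own corpus and problems against an equivalent MapReduce job):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._  // pair-RDD implicits, needed for reduceByKey on Spark 1.x

  object WordCountTiming {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("WordCountTiming"))

      // Placeholder path; I would point this at our corpus on HDFS
      val input = sc.textFile("hdfs:///path/to/corpus")

      val start = System.nanoTime()
      val counts = input.flatMap(_.split("\\s+"))
                        .map(word => (word, 1))
                        .reduceByKey(_ + _)
      counts.count()  // force evaluation so the whole job is measured
      val elapsedSec = (System.nanoTime() - start) / 1e9
      println(s"Word count finished in $elapsedSec seconds")

      sc.stop()
    }
  }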

I’m particularly interested in hearing about your overall experience in this area and your
recommendations (what other common problems to test and what kinds of benchmarks to use).

Have a great start of the week.
Cheers



Pablo Valdes | Software Engineer | comScore, Inc. (NASDAQ:SCOR)

pvaldes@comscore.com



Av. Del Cóndor N° 520, oficina 202, Ciudad Empresarial, Comuna de Huechuraba, Santiago, CL

...........................................................................................................

comScore is a global leader in digital media analytics. We make audiences and advertising
more valuable. To learn more, visit www.comscore.com

