spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: Interconnect benchmarking
Date Sat, 28 Jun 2014 01:23:28 GMT
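Another simple throughput test is to repartition() a large RDD, which forces
a full shuffle across the cluster. This also stresses the disks, though,
because shuffle files are written to local storage, so you might try mounting
your Spark temporary directory (spark.local.dir) as a ramfs.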
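
A rough sketch of such a test in PySpark follows. It assumes an existing
SparkContext named sc; the helper names (time_repartition,
throughput_mb_per_s) and all sizes are made up for illustration. The payload
is random bytes so that shuffle compression does not shrink it much. The
figure it reports is only an estimate: some records stay on the node where
they started, and serialization overhead is ignored.

```python
import os
import time


def throughput_mb_per_s(num_bytes, seconds):
    """Convert a byte count and elapsed wall-clock time to MB/s (1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / 1e6 / seconds


def time_repartition(sc, num_records=10_000_000, record_bytes=100,
                     in_partitions=64, out_partitions=64):
    """Time a full shuffle of roughly num_records * record_bytes of payload.

    `sc` is an existing pyspark SparkContext. All sizes are placeholders;
    scale them so the data set is large relative to the cluster.
    """
    data = (sc.parallelize(range(num_records), in_partitions)
              .map(lambda i: (i, os.urandom(record_bytes)))  # incompressible payload
              .cache())
    data.count()  # materialize the input so data generation is not timed

    start = time.time()
    data.repartition(out_partitions).count()  # the action forces the shuffle
    elapsed = time.time() - start

    approx_bytes = num_records * record_bytes  # ignores keys and serialization overhead
    return elapsed, throughput_mb_per_s(approx_bytes, elapsed)
```

Calling elapsed, mbps = time_repartition(sc) once with the temporary
directory on disk and once with it on a ramfs should give a feel for how much
of the time is disk rather than network.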


On Fri, Jun 27, 2014 at 5:57 PM, danilopds <danilobits@gmail.com> wrote:

> Hi,
> According to the research paper below by Matei Zaharia, Spark's
> creator,
> http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf
>
> He says on page 10 that:
> Grep is network-bound due to the cost to replicate the input data to
> multiple nodes.
>
> So,
> I guess Grep can be a good initial recommendation.
>
> But I would like to know about other workloads too.
> Best Regards.
