spark-dev mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: Surprising Spark SQL benchmark
Date Thu, 06 Nov 2014 01:03:04 GMT
Steve,

Your original comment was about the *reproducibility* of the benchmark,
which I was responding to. No one is suggesting you doubt the authenticity
or results of the benchmark.

> For which no details or code have been released to allow others to
> reproduce it. I would encourage anyone doing a Spark benchmark in future
> to avoid the stigma of vendor reported benchmarks and publish enough
> information and code to let others repeat the exercise easily.


So to reiterate, the results and paper that Databricks published should let
other people reproduce their submission to the Daytona Gray benchmark. This
addresses your original concern quoted above.

Nick


On Wed, Nov 5, 2014 at 7:10 PM, Reynold Xin <rxin@databricks.com> wrote:

> Steve,
>
> I wouldn't say Hadoop MR is a 2001 Toyota Celica :) In either case, I
> updated the blog post to actually include CPU / disk / network measures.
> You should see that in any measure that matters to this benchmark, the old
> 2100 node cluster is vastly superior. The data even fit in memory!
>
>
>
> On Wed, Nov 5, 2014 at 4:07 PM, Steve Nunez <snunez@hortonworks.com>
> wrote:
>
>> Nicholas,
>>
>> I never doubted the authenticity of the benchmark, nor the results. What I
>> think could be better is an objective analysis of the results. That post
>> neglected to point out the significant differences in hardware those two
>> benchmarks were run on. It is a bit like bragging you broke the world record
>> at the Nürburgring in a 2014 1000hp LaFerrari and somehow forgetting to
>> mention that the last record was held by a 2001 Toyota Celica.
>>
>> - Steve
>>
>>
>> From:  Nicholas Chammas <nicholas.chammas@gmail.com>
>> Date:  Wednesday, November 5, 2014 at 15:56
>> To:  Steve Nunez <snunez@hortonworks.com>
>> Cc:  Patrick Wendell <pwendell@gmail.com>, dev <dev@spark.apache.org>
>> Subject:  Re: Surprising Spark SQL benchmark
>>
>> > Steve Nunez, I believe the information behind the links below should
>> > address your concerns earlier about Databricks's submission to the
>> > Daytona Gray benchmark.
>> >
>> > On Wed, Nov 5, 2014 at 6:43 PM, Nicholas Chammas
>> > <nicholas.chammas@gmail.com> wrote:
>> >> On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas
>> >> <nicholas.chammas@gmail.com> wrote:
>> >>
>> >>> I believe that benchmark has a pending certification on it. See
>> >>> http://sortbenchmark.org under "Process".
>> >> Regarding this comment, Reynold has just announced that this benchmark
>> >> is now certified.
>> >> * Announcement:
>> >> http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
>> >> * Updated benchmark results page: http://sortbenchmark.org/
>> >> * Paper detailing Spark cluster configuration for the benchmark:
>> >> http://sortbenchmark.org/ApacheSpark2014.pdf
>> >> Nick
>> >>
>> >
>>
>>
>>
>>
>
>
