spark-dev mailing list archives

From "Dinesh J. Weerakkody" <dineshjweerakk...@gmail.com>
Subject Re: Breaking the previous large-scale sort record with Spark
Date Fri, 10 Oct 2014 15:38:51 GMT
Wow.. Cool.. Congratulations.. :)

On Fri, Oct 10, 2014 at 8:51 PM, Ted Malaska <ted.malaska@cloudera.com>
wrote:

> This is a big deal, great job.
>
> On Fri, Oct 10, 2014 at 11:19 AM, Mridul Muralidharan <mridul@gmail.com>
> wrote:
>
> > Brilliant stuff ! Congrats all :-)
> > This is indeed really heartening news !
> >
> > Regards,
> > Mridul
> >
> >
> > On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia <matei.zaharia@gmail.com>
> > wrote:
> > > Hi folks,
> > >
> > > I interrupt your regularly scheduled user / dev list to bring you some
> > pretty cool news for the project, which is that we've been able to use
> > Spark to break MapReduce's 100 TB and 1 PB sort records, sorting data 3x
> > faster on 10x fewer nodes. There's a detailed writeup at
> >
> http://databricks.com/blog/2014/10/10/spark-breaks-previous-large-scale-sort-record.html
> .
> > Summary: while Hadoop MapReduce held last year's 100 TB world record by
> > sorting 100 TB in 72 minutes on 2100 nodes, we sorted it in 23 minutes on
> > 206 nodes; and we also scaled up to sort 1 PB in 234 minutes.
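
[Editor's aside, not part of the original message: the quoted figures imply an even larger per-node efficiency gap than the headline "3x faster on 10x fewer nodes" suggests. A quick back-of-the-envelope check in plain Python, using only the numbers quoted above:]

```python
# Back-of-the-envelope comparison of the two 100 TB sort runs,
# using only the figures quoted in the announcement:
# Hadoop MapReduce: 100 TB in 72 minutes on 2100 nodes
# Spark:            100 TB in 23 minutes on 206 nodes

def per_node_throughput(data_tb, minutes, nodes):
    """Terabytes sorted per node per minute."""
    return data_tb / minutes / nodes

hadoop = per_node_throughput(100, 72, 2100)
spark = per_node_throughput(100, 23, 206)

print(f"Wall-clock speed-up:  {72 / 23:.1f}x")         # ~3.1x faster
print(f"Node reduction:       {2100 / 206:.1f}x")      # ~10.2x fewer nodes
print(f"Per-node throughput:  {spark / hadoop:.0f}x")  # ~32x per node
```

Multiplying the two headline ratios together is what yields the roughly 32x per-node throughput figure.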
> > >
> > > I want to thank Reynold Xin for leading this effort over the past few
> > weeks, along with Parviz Deyhim, Xiangrui Meng, Aaron Davidson and Ali
> > Ghodsi. In addition, we'd really like to thank Amazon's EC2 team for
> > providing the machines to make this possible. Finally, this result would
> of
> > course not be possible without the many many other contributions, testing
> > and feature requests from throughout the community.
> > >
> > > For an engine to scale from these multi-hour petabyte batch jobs down
> to
> > 100-millisecond streaming and interactive queries is quite uncommon, and
> > it's thanks to all of you folks that we are able to make this happen.
> > >
> > > Matei
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > > For additional commands, e-mail: dev-help@spark.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > For additional commands, e-mail: dev-help@spark.apache.org
> >
> >
>



-- 
Thanks & Best Regards,

*Dinesh J. Weerakkody*
*www.dineshjweerakkody.com <http://www.dineshjweerakkody.com>*
