spark-dev mailing list archives

From Christopher Nguyen <...@adatao.com>
Subject Re: The performance of group operation on SSD
Date Fri, 17 Jan 2014 21:27:22 GMT
Chen, I would also look at the actual I/O patterns of these operations. SSD
write performance can vary significantly depending on the exact scenario, and
SSDs can easily underperform HDDs under the "right" conditions. Generically
quoted IOPS numbers are not reliable across the variety of workloads that
commonly occur in practice.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/i2-instances.html
http://www.diffen.com/difference/HDD_vs_SSD
http://www.storagesearch.com/problem-write-iops.html
http://www.dhtusa.com/media/SSDBench.pdf
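One way to look at the actual I/O pattern is to sample the kernel's cumulative disk counters before and after a job. The Python sketch below assumes a Linux host where /proc/diskstats is available; the device name passed to `sample` (for example `xvdb`) is a hypothetical placeholder for whichever disk is under test. It reports IOPS and average request size, which indicates whether the workload issues many small writes or fewer large ones.

```python
import time
from typing import Dict, NamedTuple


class DiskCounters(NamedTuple):
    reads: int            # reads completed
    sectors_read: int     # 512-byte sectors read
    writes: int           # writes completed
    sectors_written: int  # 512-byte sectors written


def parse_diskstats(text: str) -> Dict[str, DiskCounters]:
    """Parse the contents of /proc/diskstats into per-device counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip malformed or short lines
        stats[fields[2]] = DiskCounters(
            reads=int(fields[3]),
            sectors_read=int(fields[5]),
            writes=int(fields[7]),
            sectors_written=int(fields[9]),
        )
    return stats


def summarize(before: DiskCounters, after: DiskCounters, seconds: float) -> dict:
    """Turn two snapshots into IOPS and average request size in KiB."""
    reads = after.reads - before.reads
    writes = after.writes - before.writes
    # /proc/diskstats always counts sectors in 512-byte units.
    read_kib = (after.sectors_read - before.sectors_read) * 512 / 1024
    write_kib = (after.sectors_written - before.sectors_written) * 512 / 1024
    return {
        "read_iops": reads / seconds,
        "write_iops": writes / seconds,
        "avg_read_kib": read_kib / reads if reads else 0.0,
        "avg_write_kib": write_kib / writes if writes else 0.0,
    }


def sample(device: str, seconds: float = 10.0) -> dict:
    """Snapshot /proc/diskstats twice, `seconds` apart (Linux only)."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())[device]
    time.sleep(seconds)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())[device]
    return summarize(before, after, seconds)
```

For example, calling `sample("xvdb", 30)` on a worker while the group job runs, then repeating on an HDD-backed instance, gives a like-for-like comparison of request sizes and rates rather than relying on quoted IOPS figures.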



--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Fri, Jan 17, 2014 at 11:23 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> I should also add that 1.0 will stay on Scala 2.10, and more generally I
> think we want to keep putting out Scala 2.10 releases at least through this
> year. It should be easier to cross-build future releases for both 2.10 and
> 2.11 than it was with the 2.9 -> 2.10 jump.
>
> Matei
>
> On Jan 17, 2014, at 11:22 AM, Matei Zaharia <matei.zaharia@gmail.com>
> wrote:
>
> > What file system do you have? One thing we’ve observed is that ext3,
> which is the default on ephemeral disks on EC2, scales very poorly to
> multicore workloads. We recommend reformatting those as XFS (which is very
> fast to format) or ext4 (which unfortunately takes a few hours to
> finalize). Maybe there are also other FS options that affect SSDs.
> >
> > Matei
> >
> > On Jan 17, 2014, at 8:01 AM, Andrew Ash <andrew@andrewash.com> wrote:
> >
> >> Are there different amounts of RAM on the SSD machines vs. the spinning-disk
> >> machines?
> >>
> >> Sent from my mobile phone
> >> On Jan 17, 2014 5:22 AM, "Jay" <hjayin@gmail.com> wrote:
> >>
> >>> OS memory cache??
> >>>
> >>> Sent from my iPad.
> >>>
> >>> On Jan 16, 2014, at 6:04 AM, Chen Jin <karen.cj@gmail.com> wrote:
> >>>>
> >>>> Dear Spark developers:
> >>>>
> >>>> We are benchmarking Spark operations such as filter, group, and join on
> >>>> SSD instances (i2.2xlarge) on EC2. Most operations perform similarly to
> >>>> or slightly better than EC2's regular ephemeral disks; however, the group
> >>>> operation is much slower on SSD than on regular disks, at least 2x to 3x
> >>>> worse. Could any of you shed some light on this behavior?
> >>>>
> >>>> Thanks a lot,
> >>>>
> >>>> -chen
> >>>
> >
>
>

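The environment questions raised in the thread (which file system backs the disks, how much RAM each machine has, and how much memory the OS page cache is using) can be checked on each node. The Python sketch below assumes Linux, where /proc/mounts lists device, mount point, and file-system type, and /proc/meminfo reports sizes in kB; the mount points in the usage note are assumptions about where the ephemeral disks are mounted.

```python
from typing import Dict, Iterable


def filesystem_types(mounts_text: str, mount_points: Iterable[str]) -> Dict[str, str]:
    """Map each requested mount point to its file-system type.

    mounts_text is the contents of /proc/mounts, whose space-separated
    fields are: device, mount point, fs type, options, dump, pass.
    """
    wanted = set(mount_points)
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] in wanted:
            # A later entry for the same mount point overwrites an earlier
            # one, since a later mount hides the one beneath it.
            result[fields[1]] = fields[2]
    return result


def memory_kib(meminfo_text: str) -> Dict[str, int]:
    """Return MemTotal and Cached (page cache) from /proc/meminfo, in KiB."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "Cached"):
            values[key] = int(rest.split()[0])
    return values
```

For example, passing the contents of /proc/mounts together with the hypothetical mount points `["/mnt", "/mnt2"]` shows whether those disks are still ext3, and comparing `memory_kib` output across the SSD and HDD machines answers the RAM and page-cache questions directly.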