spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: The performance of group operation on SSD
Date Fri, 17 Jan 2014 19:23:31 GMT
Also I should say with this that 1.0 will stay on Scala 2.10, and more generally I think we
want to keep having releases for Scala 2.10 at least for this year. It should be easier to
cross-build future releases for both 2.10 and 2.11 than it was with the 2.9 -> 2.10 jump.

Matei

On Jan 17, 2014, at 11:22 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> What file system do you have? One thing we’ve observed is that ext3, which is the default
> on ephemeral disks on EC2, scales very poorly to multicore workloads. We recommend reformatting
> those as XFS (which is very fast to format) or ext4 (which unfortunately takes a few hours
> to finalize). Maybe there are also other FS options that affect SSDs.
> 
> Matei
> 
> On Jan 17, 2014, at 8:01 AM, Andrew Ash <andrew@andrewash.com> wrote:
> 
>> Are there different amounts of RAM on the SSD machines vs. the spinning-disk
>> machines?
>> 
>> Sent from my mobile phone
>> On Jan 17, 2014 5:22 AM, "Jay" <hjayin@gmail.com> wrote:
>> 
>>> OS memory cache??
>>> 
>>> Sent from my iPad.
>>> 
>>>> On Jan 16, 2014, at 6:04 AM, Chen Jin <karen.cj@gmail.com> wrote:
>>>> 
>>>> Dear Spark developers:
>>>> 
>>>> We are benchmarking Spark operations such as filter, group, and join on
>>>> SSD instances (i2.2xlarge) on EC2. Most operations perform similarly to or
>>>> slightly better than on EC2 ephemeral disks; however, the performance
>>>> of the group operation on SSD is much worse than on regular disks, at least
>>>> 2x to 3x worse. Could any of you shed some light on this behavior?
>>>> 
>>>> Thanks a lot,
>>>> 
>>>> -chen
>>> 
> 
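
For anyone who wants to answer the "what file system do you have?" question on their own workers, here is a minimal sketch. It assumes a Linux host that exposes /proc/mounts. The helper names (parse_mounts, fs_type_for_path) and the /mnt/spark path are hypothetical and only for illustration; substitute whatever directory your spark.local.dir setting points at.

```python
import os


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line is: device mount_point fs_type options dump pass
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for_path(path, mounts_text):
    """Return the file system type of the mount that contains `path`.

    The longest matching mount point wins; returns None if nothing matches.
    """
    path = os.path.abspath(path)
    best = None
    for mount_point, fs_type in parse_mounts(mounts_text):
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # ">=" lets a later entry for the same mount point override
            # an earlier one, which matches how over-mounting behaves.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None


if __name__ == "__main__":
    # Hypothetical scratch directory; replace with your spark.local.dir.
    local_dir = "/mnt/spark"
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(local_dir, "is on", fs_type_for_path(local_dir, f.read()))
```

If this reports ext3 for the shuffle/scratch directory, the reformatting suggestion quoted above would be the first thing to try before comparing SSD and spinning-disk numbers again.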

