spark-dev mailing list archives

From "Kk.gmail" <khanderao.k...@gmail.com>
Subject Re: The performance of group operation on SSD
Date Fri, 17 Jan 2014 20:22:57 GMT
This would be good to support backward and forward versions.




> On Jan 17, 2014, at 11:23 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> 
> Also I should say with this that 1.0 will stay on Scala 2.10, and more generally I think
> we want to keep having releases for Scala 2.10 at least for this year. It should be easier
> to cross-build future releases for both 2.10 and 2.11 than it was with the 2.9 -> 2.10
> jump.
> 
> Matei
> 
>> On Jan 17, 2014, at 11:22 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
>> 
>> What file system do you have? One thing we’ve observed is that ext3, which is the
>> default on ephemeral disks on EC2, scales very poorly to multicore workloads. We recommend
>> reformatting those as XFS (which is very fast to format) or ext4 (which unfortunately takes
>> a few hours to finalize). Maybe there are also other FS options that affect SSDs.
>> 
>> Matei
>> 
>>> On Jan 17, 2014, at 8:01 AM, Andrew Ash <andrew@andrewash.com> wrote:
>>> 
>>> Are there different amounts of RAM on the SSD machines vs the Spinny disk
>>> machines?
>>> 
>>> Sent from my mobile phone
>>>> On Jan 17, 2014 5:22 AM, "Jay" <hjayin@gmail.com> wrote:
>>>> 
>>>> OS memory cache??
>>>> 
>>>> Sent from my iPad.
>>>> 
>>>>> On Jan 16, 2014, at 6:04 AM, Chen Jin <karen.cj@gmail.com> wrote:
>>>>> 
>>>>> Dear Spark developers:
>>>>> 
>>>>> We are benchmarking Spark operations such as filter, group, and join on
>>>>> the SSD instance type i2.2xlarge on EC2. Most operations perform similarly to or
>>>>> slightly better than on ephemeral disks on EC2. However, the performance
>>>>> of the group operation on SSD is much worse than on regular disks, at least
>>>>> 2x to 3x worse. Could any of you shed some light on this behavior?
>>>>> 
>>>>> Thanks a lot,
>>>>> 
>>>>> -chen
> 
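
Editor's sketch, not part of the original messages: Matei's question about the file system can be answered on a Linux host by reading /proc/mounts. The snippet below is a minimal illustration under stated assumptions. The helper names parse_mounts and fs_type_for and the path /mnt/spark are hypothetical, and mount points containing escaped spaces are not handled.

```python
import os


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Each line looks like: device mount_point fs_type options dump pass
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, entries):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best = None
    for mount_point, fs_type in entries:
        mp = os.path.normpath(mount_point)
        contains = path == mp or mp == "/" or path.startswith(mp + os.sep)
        # ">=" lets a later entry win when the same mount point is listed twice.
        if contains and (best is None or len(mp) >= len(best[0])):
            best = (mp, fs_type)
    return best[1] if best else None


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    # "/mnt/spark" is a hypothetical location for Spark's local directory.
    print(fs_type_for("/mnt/spark", mounts))
```

If the directory Spark uses for shuffle files reports ext3, that would match the situation Matei describes above; comparing the value on the SSD and regular-disk instances is a quick first check.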
