spark-user mailing list archives

From Michal Haris <michal.ha...@visualdna.com>
Subject large volume spark job spends most of the time in AppendOnlyMap.changeValue
Date Wed, 06 May 2015 09:45:14 GMT
Just wanted to check if somebody has seen similar behaviour or knows what
we might be doing wrong. We have a relatively complex spark application
which processes half a terabyte of data at various stages. We have profiled
it in several ways and everything seems to point to one place where 90% of
the time is spent: AppendOnlyMap.changeValue. The job scales and is faster than its
map-reduce alternative, but it still feels slower than it should be. I suspect too
much spill, but I haven't seen any improvement from increasing the number of
partitions to 10k. Any ideas would be appreciated.
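For context, AppendOnlyMap.changeValue is the per-record merge path for map-side
aggregation in Spark 1.x: every combineByKey/reduceByKey-style stage folds each
record into an in-memory hash map (AppendOnlyMap, or ExternalAppendOnlyMap when
spilling) through changeValue, so it naturally dominates profiles of
aggregation-heavy jobs. The sketch below is a minimal, assumed example of that
kind of workload; the input path, key extraction, and the use of reduceByKey are
illustrative, not details taken from the actual application.

    import org.apache.spark.{SparkConf, SparkContext}

    object ChangeValueSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("changeValue-sketch"))

        // Map-side combine builds a hash map per task and calls
        // AppendOnlyMap.changeValue once per record to merge it into the
        // running aggregate for its key; that call is what shows up in
        // profiles of aggregation-heavy stages.
        val counts = sc.textFile("hdfs:///data/events")      // illustrative path
          .map(line => (line.split("\t")(0), 1L))            // illustrative key extraction
          .reduceByKey(_ + _, 10000)                         // 10k partitions, as tried in the post

        counts.saveAsTextFile("hdfs:///data/event-counts")   // illustrative output path
        sc.stop()
      }
    }

Note that with this shape the number of changeValue calls is roughly one per input
record regardless of partitioning, which would explain why raising the partition
count to 10k (which mainly affects spill behaviour and per-task memory) did not
move the profile.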

-- 
Michal Haris
Technical Architect
direct line: +44 (0) 207 749 0229
www.visualdna.com | t: +44 (0) 207 734 7033
