spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject computation slows down 10x because of cached RDDs
Date Mon, 10 Mar 2014 22:18:41 GMT
Hello all,
I am observing a strange result. I have a computation that I run on a
cached RDD in Spark standalone mode. It typically takes about 4 seconds.

But when other RDDs that are not relevant to the computation at hand are
cached in memory (in the same SparkContext), the computation takes 40 seconds
or more.

The problem seems to be GC time, which goes from milliseconds to tens of
seconds.

Note that my issue is not that memory is full: I have cached about 14G in
RDDs, with 66G available across the workers for the application. Also, my
computation did not push any cached RDD out of memory.
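A rough analogy, not Spark code: a full tracing collection must visit every live tracked object, so data that a job never touches can still lengthen its GC pauses even while plenty of memory is free. The Spark tuning guide states the JVM version of this: the cost of garbage collection is proportional to the number of Java objects. The sketch below uses CPython's cycle collector as a stand-in for the JVM collector (an assumption of the analogy) and times a full collection before and after allocating a hypothetical `unrelated_cache` that plays the role of the unrelated cached RDDs.

```python
import gc
import time


def full_gc_ms(runs: int = 3) -> float:
    """Best-of-`runs` wall time, in milliseconds, of one full gc.collect()."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        gc.collect()  # no argument: full collection of every generation
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


# GC cost before any "cached" data exists.
baseline_ms = full_gc_ms()

# Hypothetical stand-in for the unrelated cached RDDs: one million small,
# long-lived lists that no later computation ever reads. Lists are always
# tracked by the cycle collector, so a full collection must visit each one.
unrelated_cache = [[i] for i in range(1_000_000)]

with_cache_ms = full_gc_ms()

print(f"full GC, no cache:   {baseline_ms:.1f} ms")
print(f"full GC, with cache: {with_cache_ms:.1f} ms")
```

The second timing should be noticeably larger even though nothing is reclaimed; the exact numbers depend on the interpreter and hardware. In the Spark case the analogous objects would be deserialized cached data living on the executor JVM heaps.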

Any ideas?
