spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: computation slows down 10x because of cached RDDs
Date Mon, 10 Mar 2014 23:00:21 GMT
hey matei,
it happens repeatedly.

we are currently running on java 6 with spark 0.9.

i will add -XX:+PrintGCDetails and collect details, and also look into java
7 G1. thanks
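
for reference, a minimal sketch of how the flags from this thread could be passed to the JVMs. it assumes spark 0.9 reads SPARK_JAVA_OPTS from conf/spark-env.sh, as the 0.9 tuning guide describes. the extra -verbose:gc and -XX:+PrintGCTimeStamps flags and the choice of CMS (G1 needs java 7) are illustrative assumptions, not a tested configuration:

```shell
# conf/spark-env.sh (assumed location; needs to be present on every worker)
# Append GC logging flags to whatever options are already set.
# -XX:+PrintGCDetails shows per-generation occupancy for each collection.
# -XX:+UseConcMarkSweepGC is the concurrent collector available on java 6.
export SPARK_JAVA_OPTS="${SPARK_JAVA_OPTS:-} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC"
```

the gc output should then show up in the stdout of each executor rather than in the driver log.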

On Mon, Mar 10, 2014 at 6:27 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> Does this happen repeatedly if you keep running the computation, or just
> the first time? It may take time to move these Java objects to the old
> generation the first time you run queries, which could lead to a GC pause
> that also slows down the small queries.
>
> If you can run with -XX:+PrintGCDetails in your Java options, it would
> also be good to see what percent of each GC generation is used.
>
> The concurrent mark-and-sweep GC -XX:+UseConcMarkSweepGC or the G1 GC in
> Java 7 (-XX:+UseG1GC) might also avoid these pauses by GCing concurrently
> with your application threads.
>
> Matei
>
> On Mar 10, 2014, at 3:18 PM, Koert Kuipers <koert@tresata.com> wrote:
>
> hello all,
> i am observing a strange result. i have a computation that i run on a
> cached RDD in spark-standalone. it typically takes about 4 seconds.
>
> but when other RDDs that are not relevant to the computation at hand are
> cached in memory (in same spark context), the computation takes 40 seconds
> or more.
>
> the problem seems to be GC time, which goes from milliseconds to tens of
> seconds.
>
> note that my issue is not that memory is full. i have cached about 14G in
> RDDs with 66G available across workers for the application. also my
> computation did not push any cached RDD out of memory.
>
> any ideas?
