spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Koert Kuipers <ko...@tresata.com>
Subject Re: computation slows down 10x because of cached RDDs
Date Tue, 11 Mar 2014 22:09:36 GMT
hey matei,
ok when i switch to java 7 with G1 the GC time for all the "quick" tasks
goes from 150ms to 10ms, but the slow ones stay just as slow. all i did was
add -XX:+UseG1GC so maybe thats wrong, i still have to read up on G1.

an example of GC in a slow task is below.
best, koert


[GC pause (young), 0.0100070 secs]
   [Parallel Time: 7.3 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 2889329.4, Avg: 2889329.8, Max:
2889330.0, Diff: 0.6]
      [Ext Root Scanning (ms): Min: 2.0, Avg: 2.5, Max: 3.2, Diff: 1.2,
Sum: 44.2]
      [Update RS (ms): Min: 0.0, Avg: 0.3, Max: 3.5, Diff: 3.5, Sum: 5.8]
         [Processed Buffers: Min: 0, Avg: 3.3, Max: 14, Diff: 14, Sum: 59]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 0.2, Avg: 0.7, Max: 1.0, Diff: 0.8, Sum: 12.0]
      [Termination (ms): Min: 0.0, Avg: 2.4, Max: 2.6, Diff: 2.6, Sum: 43.8]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum:
0.5]
      [GC Worker Total (ms): Min: 5.7, Avg: 5.9, Max: 6.3, Diff: 0.6, Sum:
106.4]
      [GC Worker End (ms): Min: 2889335.7, Avg: 2889335.7, Max: 2889335.7,
Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 1.1 ms]
   [Other: 1.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.0 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.3 ms]
   [Eden: 20.0M(752.0M)->0.0B(304.0M) Survivors: 2048.0K->2048.0K Heap:
5917.7M(6144.0M)->5898.8M(6144.0M)]
 [Times: user=0.12 sys=0.00, real=0.01 secs]
[Full GC 5898M->2057M(6144M), 5.7978580 secs]
   [Eden: 2048.0K(304.0M)->0.0B(1064.0M) Survivors: 2048.0K->0.0B Heap:
5898.9M(6144.0M)->2057.5M(6144.0M)]
 [Times: user=9.24 sys=0.00, real=5.80 secs]



On Tue, Mar 11, 2014 at 4:35 PM, Matei Zaharia <matei.zaharia@gmail.com>wrote:

> Yeah, System.gc() is a suggestion but in practice it does invoke full GCs
> on the Sun JVM.
>
> Matei
>
> On Mar 11, 2014, at 12:35 PM, Koert Kuipers <koert@tresata.com> wrote:
>
> hey matei,
> ha i will definitely that one! looks like a total hack... i might just
> schedule it after the precaching of rdds defensively.
>
> also trying java 7 with g1
>
>
> On Tue, Mar 11, 2014 at 3:17 PM, Matei Zaharia <matei.zaharia@gmail.com>wrote:
>
>> Right, that's it. I think what happened is the following: all the nodes
>> generated some garbage that put them very close to the threshold for a full
>> GC in the first few runs of the program (when you cached the RDDs), but on
>> the subsequent queries, only a few nodes are hitting full GC per query, so
>> every query sees a slowdown but the problem persists for a whille. You can
>> try manually forcing a GC on the nodes like this after you do your loading:
>>
>> sc.parallelize(1 to numNodes, numNodes).foreach(x => System.gc())
>>
>> Where numNodes is your number of nodes. (Actually it's also okay to just
>> make this higher, System.gc() returns fast when there's no GC to run.)
>>
>> Matei
>>
>> On Mar 11, 2014, at 7:12 AM, Koert Kuipers <koert@tresata.com> wrote:
>>
>> hey matei,
>> most tasks have GC times of 200ms or less, and then a few tasks take many
>> seconds. example GC activity for a slow one:
>>
>> [GC [PSYoungGen: 1051814K->262624K(1398144K)]
>> 3789259K->3524429K(5592448K), 0.0986800 secs] [Times: user=1.53 sys=0.01,
>> real=0.10 secs]
>> [GC [PSYoungGen: 786935K->524512K(1398144K)]
>> 4048741K->4048762K(5592448K), 0.1132490 secs] [Times: user=1.70 sys=0.01,
>> real=0.11 secs]
>> [Full GC [PSYoungGen: 524512K->0K(1398144K)] [PSOldGen:
>> 3524250K->2207344K(4194304K)] 4048762K->2207344K(5592448K) [PSPermGen:
>> 56545K->54639K(83968K)], 7.7059350 secs] [Times:\
>>  user=7.71 sys=0.00, real=7.70 secs]
>>
>>
>> so looks like i am hit by stop-the-world gc?
>>
>>
>> On Mon, Mar 10, 2014 at 7:00 PM, Koert Kuipers <koert@tresata.com> wrote:
>>
>>> hey matei,
>>> it happens repeatedly.
>>>
>>> we are currently runnning on java 6 with spark 0.9.
>>>
>>> i will add -XX:+PrintGCDetails and collect details, and also look into
>>> java 7 G1. thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Mar 10, 2014 at 6:27 PM, Matei Zaharia <matei.zaharia@gmail.com>wrote:
>>>
>>>> Does this happen repeatedly if you keep running the computation, or
>>>> just the first time? It may take time to move these Java objects to the old
>>>> generation the first time you run queries, which could lead to a GC pause
>>>> that also slows down the small queries.
>>>>
>>>> If you can run with -XX:+PrintGCDetails in your Java options, it would
>>>> also be good to see what percent of each GC generation is used.
>>>>
>>>> The concurrent mark-and-sweep GC -XX:+UseConcMarkSweepGC or the G1 GC
>>>> in Java 7 (-XX:+UseG1GC) might also avoid these pauses by GCing
>>>> concurrently with your application threads.
>>>>
>>>> Matei
>>>>
>>>> On Mar 10, 2014, at 3:18 PM, Koert Kuipers <koert@tresata.com> wrote:
>>>>
>>>> hello all,
>>>> i am observing a strange result. i have a computation that i run on a
>>>> cached RDD in spark-standalone. it typically takes about 4 seconds.
>>>>
>>>> but when other RDDs that are not relevant to the computation at hand
>>>> are cached in memory (in same spark context), the computation takes 40
>>>> seconds or more.
>>>>
>>>> the problem seems to be GC time, which goes from milliseconds to tens
>>>> of seconds.
>>>>
>>>> note that my issue is not that memory is full. i have cached about 14G
>>>> in RDDs with 66G available across workers for the application. also my
>>>> computation did not push any cached RDD out of memory.
>>>>
>>>> any ideas?
>>>>
>>>>
>>>>
>>>
>>
>>
>
>

Mime
View raw message