spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kant kodali <kanth...@gmail.com>
Subject Re: why spark driver program is creating so many threads? How can I limit this number?
Date Tue, 01 Nov 2016 09:52:41 GMT
Hi Sean,

The comments seem very relevant although I am not sure if this pull request
https://github.com/apache/spark/pull/14985 would fix my issue? I am not
sure what unionRDD.scala has anything to do with my error (I don't know
much about spark code base). Do I ever use unionRDD.scala when I call
mapToPair or ReduceByKey or forEachRDD?  This error is very easy to
reproduce you actually don't need to ingest any data to spark streaming
job. Just have one simple transformation consists of mapToPair, reduceByKey
and forEachRDD and have the window interval of 1min and batch interval of
one one second and simple call ssc.awaitTermination() and watch the Thread
Count go up significantly.

I do think that using a fixed size executor service would probably be a
safer approach. One could leverage ForJoinPool if they think they could
benefit a lot from the work-steal algorithm and doubly ended queues in the
ForkJoinPool.

Thanks!




On Tue, Nov 1, 2016 at 2:19 AM, Sean Owen <sowen@cloudera.com> wrote:

> Possibly https://issues.apache.org/jira/browse/SPARK-17396 ?
>
> On Tue, Nov 1, 2016 at 2:11 AM kant kodali <kanth909@gmail.com> wrote:
>
>> Hi Ryan,
>>
>> I think you are right. This may not be related to the Receiver. I have
>> attached jstack dump here. I do a simple MapToPair and reduceByKey and  I
>> have a window Interval of 1 minute (60000ms) and batch interval of 1s (
>> 1000) This is generating lot of threads atleast 5 to 8 threads per
>> second and the total number of threads is monotonically increasing. So just
>> for tweaking purpose I changed my window interval to 1min (60000ms) and
>> batch interval of 10s (10000) this looked lot better but still not ideal
>> at very least it is not monotonic anymore (It goes up and down). Now my
>> question  really is how do I tune such that my number of threads are
>> optimal while satisfying the window Interval of 1 minute (60000ms) and
>> batch interval of 1s (1000) ?
>>
>> This jstack dump is taken after running my spark driver program for 2
>> mins and there are about 1000 threads.
>>
>> Thanks!
>>
>

Mime
View raw message