spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kant kodali <kanth...@gmail.com>
Subject Re: why spark driver program is creating so many threads? How can I limit this number?
Date Tue, 01 Nov 2016 09:58:41 GMT
@Sean It looks like this problem can happen with other RDD's as well. Not
just unionRDD

On Tue, Nov 1, 2016 at 2:52 AM, kant kodali <kanth909@gmail.com> wrote:

> Hi Sean,
>
> The comments seem very relevant although I am not sure if this pull
> request https://github.com/apache/spark/pull/14985 would fix my issue? I
> am not sure what unionRDD.scala has anything to do with my error (I don't
> know much about spark code base). Do I ever use unionRDD.scala when I call
> mapToPair or ReduceByKey or forEachRDD?  This error is very easy to
> reproduce you actually don't need to ingest any data to spark streaming
> job. Just have one simple transformation consists of mapToPair, reduceByKey
> and forEachRDD and have the window interval of 1min and batch interval of
> one one second and simple call ssc.awaitTermination() and watch the
> Thread Count go up significantly.
>
> I do think that using a fixed size executor service would probably be a
> safer approach. One could leverage ForJoinPool if they think they could
> benefit a lot from the work-steal algorithm and doubly ended queues in the
> ForkJoinPool.
>
> Thanks!
>
>
>
>
> On Tue, Nov 1, 2016 at 2:19 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> Possibly https://issues.apache.org/jira/browse/SPARK-17396 ?
>>
>> On Tue, Nov 1, 2016 at 2:11 AM kant kodali <kanth909@gmail.com> wrote:
>>
>>> Hi Ryan,
>>>
>>> I think you are right. This may not be related to the Receiver. I have
>>> attached jstack dump here. I do a simple MapToPair and reduceByKey and  I
>>> have a window Interval of 1 minute (60000ms) and batch interval of 1s (
>>> 1000) This is generating lot of threads atleast 5 to 8 threads per
>>> second and the total number of threads is monotonically increasing. So just
>>> for tweaking purpose I changed my window interval to 1min (60000ms) and
>>> batch interval of 10s (10000) this looked lot better but still not
>>> ideal at very least it is not monotonic anymore (It goes up and down). Now
>>> my question  really is how do I tune such that my number of threads are
>>> optimal while satisfying the window Interval of 1 minute (60000ms) and
>>> batch interval of 1s (1000) ?
>>>
>>> This jstack dump is taken after running my spark driver program for 2
>>> mins and there are about 1000 threads.
>>>
>>> Thanks!
>>>
>>
>

Mime
View raw message