helix-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-613) TaskStateModel generates significant amount of threads and causing thread leaking problem
Date Mon, 09 Nov 2015 04:55:10 GMT

    [ https://issues.apache.org/jira/browse/HELIX-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996036#comment-14996036
] 

ASF GitHub Bot commented on HELIX-613:
--------------------------------------

Github user lei-xia commented on the pull request:

    https://github.com/apache/helix/pull/38#issuecomment-154924074
  
    It is a little tricky to add a unit test that verifies there is no leak. It also takes
some time to generate and detect a leak, which would make the unit tests take even longer
(the full suite already takes quite a long time to run).
    
    That said, although we do not have a unit test for this, we have already verified
this fix in our testing environments, with Helix running for two days, starting and finishing
roughly 10000 jobs. We did not see the large number of threads being created that we saw before,
and the thread count stayed fairly stable over time.


> TaskStateModel generates significant amount of threads and causing thread leaking problem
> -----------------------------------------------------------------------------------------
>
>                 Key: HELIX-613
>                 URL: https://issues.apache.org/jira/browse/HELIX-613
>             Project: Apache Helix
>          Issue Type: Bug
>    Affects Versions: 0.6.x
>            Reporter: Lei Xia
>            Assignee: Lei Xia
>
> Currently, TaskStateModel creates a thread pool containing 40 threads for each instance
of TaskStateModel, so it creates 40 threads for each task (partition). Since jobs are dynamic
resources, the thread pool is not properly shut down when a task has completed (or timed out,
failed, etc.). We saw ~10000 threads created on our production machines.
> Also, the timeout timer in each TaskStateModel is not properly cancelled even after
the task has completed or failed. The timer consumes a thread even though it is no longer used.

> The proposed solution is to use a shared thread pool for all TaskStateModel instances in a single
TaskStateModelFactory, serving both regular tasks and timeout tasks.
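
For illustration only, the shared-pool idea described in the issue might look roughly like the
sketch below. This is not the code from the pull request. The class names (SketchTaskFactory,
SketchTaskModel, SharedPoolDemo), the pool size of 4, and the counting thread factory are all
hypothetical; only standard java.util.concurrent calls are used. The factory owns one task pool
and one timeout scheduler, every model it creates borrows them, and each model cancels its
timeout as soon as its task finishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a state model factory; not the Helix API.
class SketchTaskFactory {
    // Counts every thread the two shared executors create (for the demo only).
    final AtomicInteger threadsCreated = new AtomicInteger();
    private final ExecutorService taskExecutor;
    private final ScheduledThreadPoolExecutor timeoutExecutor;

    SketchTaskFactory(int poolSize) {
        ThreadFactory counting = r -> {
            threadsCreated.incrementAndGet();
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        };
        // One pool for all tasks and one scheduler for all timeouts.
        taskExecutor = Executors.newFixedThreadPool(poolSize, counting);
        timeoutExecutor = new ScheduledThreadPoolExecutor(1, counting);
        // Drop cancelled timeouts from the queue immediately instead of
        // keeping them until their delay would have elapsed.
        timeoutExecutor.setRemoveOnCancelPolicy(true);
    }

    // Every model shares the factory's executors; none creates its own.
    SketchTaskModel createModel() {
        return new SketchTaskModel(taskExecutor, timeoutExecutor);
    }

    void shutdown() {
        taskExecutor.shutdownNow();
        timeoutExecutor.shutdownNow();
    }
}

// Hypothetical stand-in for a per-task state model.
class SketchTaskModel {
    private final ExecutorService taskExecutor;
    private final ScheduledThreadPoolExecutor timeoutExecutor;

    SketchTaskModel(ExecutorService taskExecutor, ScheduledThreadPoolExecutor timeoutExecutor) {
        this.taskExecutor = taskExecutor;
        this.timeoutExecutor = timeoutExecutor;
    }

    // Runs the task on the shared pool and cancels it if it exceeds timeoutMs.
    Future<?> start(Runnable task, long timeoutMs) {
        FutureTask<Void> running = new FutureTask<>(task, null);
        ScheduledFuture<?> timeout = timeoutExecutor.schedule(
            () -> { running.cancel(true); }, timeoutMs, TimeUnit.MILLISECONDS);
        taskExecutor.execute(() -> {
            try {
                running.run();
            } finally {
                // Cancel the timeout once the task completes, fails, or is cancelled.
                timeout.cancel(false);
            }
        });
        return running;
    }
}

class SharedPoolDemo {
    // Runs `jobs` short tasks, one model per task, and returns the number
    // of threads the shared executors created in total.
    static int threadsUsedFor(int jobs) {
        SketchTaskFactory factory = new SketchTaskFactory(4);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < jobs; i++) {
                futures.add(factory.createModel().start(() -> { }, 10_000));
            }
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            factory.shutdown();
        }
        return factory.threadsCreated.get();
    }

    public static void main(String[] args) {
        System.out.println("threads created: " + threadsUsedFor(1000));
    }
}
```

In this sketch the number of threads is bounded by the pool size plus the single timeout
thread, regardless of how many models are created, whereas a pool per model would grow with
the number of tasks.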



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
