beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (BEAM-690) Backoff in the DirectRunner Monitor if no work is Available
Date Thu, 06 Sep 2018 20:20:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-690?focusedWorklogId=141928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141928 ]

ASF GitHub Bot logged work on BEAM-690:
---------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Sep/18 20:19
            Start Date: 06/Sep/18 20:19
    Worklog Time Spent: 10m 
      Work Description: janotav commented on issue #6303: [BEAM-690] Backoff in the DirectRunner if no work is available
URL: https://github.com/apache/beam/pull/6303#issuecomment-419227607
 
 
   Thanks for the feedback, guys. To be honest, I'm no longer convinced this is the right thing
to do. It does decrease CPU consumption significantly; however, at least in our
case, it is not enough. It turns out that even if the pipeline is completely empty, the driver
cycles through the following states:
   
   THROTTLE, THROTTLE, CONTINUE, THROTTLE, THROTTLE, CONTINUE, ... and so on ...
   
   So effectively the active loop becomes a loop with a 15 ms sleep (the average of 10 and 20 ms).
Because the code executed in the active phase is itself non-trivial, this still puts an easily
measurable load on the CPU. I was able to achieve some further minor improvements by making
some low-level changes to how the driver works with collections, but it became obvious that
(at least in my quite specific use case) this leads nowhere.
   
   I was able to come up with an alternative (application-level) solution that simply blocks the
DirectRunner threads when the pipeline is empty and resumes the DirectRunner loop only when
new data enters the pipeline.
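   
   The comment does not show how the blocking is done, so the following is only a minimal sketch of the general idea, assuming a lock and condition variable from java.util.concurrent. The class WorkGate and its methods are hypothetical names, not part of Beam or of this PR: a consumer thread parks in awaitWork() while nothing is pending, and a producer wakes it through offerWork().

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate (not a Beam class): blocks a worker until work is offered. */
final class WorkGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition workAvailable = lock.newCondition();
    private int pending; // number of work items offered but not yet taken

    /** Called by the producer when new data enters the pipeline. */
    void offerWork() {
        lock.lock();
        try {
            pending++;
            workAvailable.signal(); // wake one blocked worker, if any
        } finally {
            lock.unlock();
        }
    }

    /** Blocks the calling thread, without spinning, until work is pending. */
    void awaitWork() throws InterruptedException {
        lock.lock();
        try {
            while (pending == 0) { // loop guards against spurious wakeups
                workAvailable.await();
            }
            pending--;
        } finally {
            lock.unlock();
        }
    }

    /** Returns how many offered items have not been taken yet. */
    int pending() {
        lock.lock();
        try {
            return pending;
        } finally {
            lock.unlock();
        }
    }
}
```

   A thread blocked in await() consumes no CPU while it waits, which is the property the throttled loop above cannot provide.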
   
   I'll keep on thinking about this for a while yet and then probably close this PR unless
I figure out how to make it really useful...
   
   
   



Issue Time Tracking
-------------------

    Worklog Id:     (was: 141928)
    Time Spent: 0.5h  (was: 20m)

> Backoff in the DirectRunner Monitor if no work is Available
> -----------------------------------------------------------
>
>                 Key: BEAM-690
>                 URL: https://issues.apache.org/jira/browse/BEAM-690
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-direct
>            Reporter: Thomas Groh
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a Pipeline has no elements available to process, the Monitor Runnable will be repeatedly
> scheduled. Given that there is no work to be done, this will loop over the steps in the transform
> looking for timers, and prompt the sources to perform additional work, even though there is
> no work to be done. This consumes the entirety of a single core.
> Add a bounded backoff to rescheduling the monitor runnable if no work has been done since
> it last ran. This will reduce resource consumption on low-throughput Pipelines.
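
The bounded backoff requested above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the class name BoundedBackoff, the doubling policy, and the delay values are all hypothetical. The delay grows each time the monitor finds no work, never exceeds a fixed cap, and returns to the initial value once work is done.

```java
/** Hypothetical helper (not a Beam class): doubling backoff with an upper bound. */
final class BoundedBackoff {
    private final long initialMillis;
    private final long maxMillis;
    private long currentMillis;

    BoundedBackoff(long initialMillis, long maxMillis) {
        if (initialMillis <= 0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("require 0 < initialMillis <= maxMillis");
        }
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.currentMillis = initialMillis;
    }

    /** Returns the delay to use now, then doubles the next delay up to the cap. */
    long nextDelayMillis() {
        long delay = currentMillis;
        // Compare against maxMillis / 2 so the doubling cannot overflow.
        currentMillis = (currentMillis > maxMillis / 2) ? maxMillis : currentMillis * 2;
        return delay;
    }

    /** Call when the monitor found work, so the next idle period starts small again. */
    void reset() {
        currentMillis = initialMillis;
    }
}
```

A monitor loop would call nextDelayMillis() before rescheduling itself when no work was done, and reset() whenever work was done.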



