nifi-dev mailing list archives

From Oleg Zhurakousky <ozhurakou...@hortonworks.com>
Subject Re: Common scheduler and ad-hoc thread creation
Date Tue, 17 Nov 2015 01:50:19 GMT
Tony, thanks for your input. At least we have some discussion going. See inline for the rest.

> On Nov 16, 2015, at 8:22 PM, Tony Kurc <trkurc@gmail.com> wrote:
> 
> so, I believe threads in a processor in nifi are much, much easier than
> general threading in many other applications. There are defined boundaries
> on when a processor is built and torn down. Pretty much any state in the
> middle is up to the processor. you know when resources need to be stood up.
> you know when they need to be torn down.
Generally true, and I’d agree there is not much one can do to stop users from doing what they
want to do, regardless of how damaging it may be to the rest of the system.
> 
> Because threads have a localized scope, I'm not sure a global pool would be
> a help. If a processor needs higher throughput or shorter latency, now, the
> problem is generally isolated and there is a nice little cream center to
> optimize. If you're blocked on a global pool of threads because some other
> processor consumed all the threads in a pool, well, suddenly, your
> performance is no longer a localized problem.
> 
This argument is argumentative ;)
1. What if I’ve saturated all my cores in my localized Processor’s thread pool with things
like while (true){}? Then it really doesn’t matter what the rest of the framework does;
the system is hosed. So the blockage in this case comes from, let’s just call it, a malicious
processor and not the global thread pool. So, in the end it’s a bit of a general discipline question ;)
2. So in this case one of the best practices could be taken right from Brian’s book (Java
Concurrency in Practice), which states that tasks should be as short-lived as possible. Any
repeats and retries should be handled by rerunning/rescheduling the task instead of spinning
in a loop inside the task. So with a global Scheduler exposed via the context (or something
similar) that each Processor, Service, etc. sees, we can have a shared thread pool; see the
sketch below. We could even have ControllerServices act as thread pools.
Yes, that would take some serious code review and general discipline from the developers, but
the benefit would be proportional as well.
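
To make point 2 concrete, here is a minimal sketch of what rescheduling instead of spinning
could look like. The shared ScheduledExecutorService is assumed to be handed out by the
framework (a hypothetical accessor, not an existing NiFi API):

import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a short-lived task that retries by rescheduling itself on a
// shared pool instead of looping with Thread.sleep(..) inside the task.
class RetryingTask implements Runnable {

    private final ScheduledExecutorService sharedScheduler; // assumed framework-provided
    private int attemptsLeft; // only one execution at a time, since the task reschedules itself

    RetryingTask(ScheduledExecutorService sharedScheduler, int maxAttempts) {
        this.sharedScheduler = sharedScheduler;
        this.attemptsLeft = maxAttempts;
    }

    @Override
    public void run() {
        if (doWorkOnce()) {
            return; // done; the thread goes straight back to the pool
        }
        if (--attemptsLeft > 0) {
            // instead of while (true) { work(); Thread.sleep(..); } we hand the thread
            // back and ask the shared scheduler to run this task again later
            sharedScheduler.schedule(this, 5, TimeUnit.SECONDS);
        }
    }

    private boolean doWorkOnce() {
        return false; // placeholder for the actual unit of work
    }
}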

> because the common case is "don't use threads" (not everyone is going to
> build a complex service, contribute to the core framework or need threads
> in their processor) I actually think code review is a good way to shake out
> some poor decisions. because optimizing the threads in a processor for a
> use case a specialized task (the processor writer knows the critical
> sections and bottlenecks), I'm not sure whether there are massive strides
> that can be made, but I could be wrong. And we'll always have a weird edge
> case of some library that wants to do threads its own way that we're trying
> to integrate.
> 
> My guess is a lot of the behavior you mention above are because at the
> moment, performance isn't needed in that part of code and it was simpler
> for the author. Or its a bug!
I would probably go with the "performance isn’t needed” argument, but in a hypothetical world
of thousands of processors each creating Threads, the so-called ’simplicity' could manifest
itself as a bug.
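
As a rough illustration of the difference (the shared pool below is just an assumption about
something the framework could own and size, not an existing NiFi API):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only, not NiFi code.
class AdHocVsShared {

    // ad-hoc: every processor doing this pays for a fresh OS thread per unit of work
    static void adHoc(Runnable work) {
        new Thread(work).start();
    }

    // shared: the same work submitted to a single pool the framework owns and sizes
    private static final ExecutorService SHARED_POOL = Executors.newFixedThreadPool(8);

    static void viaSharedPool(Runnable work) {
        SHARED_POOL.submit(work);
    }
}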

I don’t want to generalize too much at the moment, as it is much easier to discuss a concrete
case (and we have plenty). But I really wanted to get a discussion going on this as I am still
studying the code base.

Cheers
Oleg

> 
> 
> 
> On Mon, Nov 16, 2015 at 8:01 PM, Oleg Zhurakousky <
> ozhurakousky@hortonworks.com> wrote:
> 
>> Taking liberties - so let me throw few example. I am sure you’d agree that
>> Thread creation and management is an expensive and hard and error prone,
>> hence new java.util.concurrent and all the goodies in it.
>> - There is a patch currently in the queue where there is a creation of new
>> Thread() and then starting it. Is it necessary? Could we reuse the thread
>> from the common pool?
>> - We have many places where we have Thread.sleep(..) and in fact do sleep
>> considerable amount of time. That thread lays dormant where it could
>> actually be doing something. Is it necessary?
>> 
>> Cheers
>> Oleg
>> 
>> 
>>> On Nov 16, 2015, at 7:52 PM, Tony Kurc <trkurc@gmail.com> wrote:
>>> 
>>> the issue with a best practices guide on this subject is it will be
>>> dominated by edge cases. The common case should be "don't produce any
>>> threads".
>>> 
>>> That being said, I commented on a jira somewhere about LinkedBlockingQueues
>>> used in so many producer/consumer style processors and possibly needing a
>>> library to have some consistency in using those queues in a consistent
>>> thread safe manner.
>>> 
>>> Also, I'm not quite sure of what you mean by taking liberties?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Nov 16, 2015 at 7:39 PM, Oleg Zhurakousky <
>>> ozhurakousky@hortonworks.com> wrote:
>>> 
>>>> Guys
>>>> 
>>>> I am noticing many modules where we have things like "new
>>>> Thread(..).start()”, creation of new executors and schedulers,
>>>> Thread.sleep(..)  etc.,. I am sure many would agree that taking such
>>>> liberties with Threads will have consequences (not IF but WHEN)
>>>> On several threads several of us mentioned a “must read” for anyone who is
>>>> getting into concurrent code -
>>>> http://ptgmedia.pearsoncmg.com/images/9780321349606/samplepages/9780321349606.pdf
>>>> and indeed we can/should definitely grab some best practices from this book.
>>>> 
>>>> At least we can start from what’s our strategy around thread management
>>>> for NAR developers? Basically should/should not a user create Threads,
>>>> Executors, Schedulers etc.
>>>> 
>>>> Cheers
>>>> Oleg
>>>> 
>> 
>> 
