storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "P. Taylor Goetz" <ptgo...@gmail.com>
Subject Re: thread safe output collector
Date Thu, 28 Apr 2016 19:39:59 GMT
I was partly right. In earlier versions (pre-1.0) was not thread safe (due to shuffle grouping).
1.0 introduced a new shuffle grouping implementation that is thread-safe, in addition to a
load-aware shuffle grouping. I had missed the fact that the new shuffle grouping is thread
safe.

-Taylor

> On Apr 28, 2016, at 1:48 PM, Steven Lewis <Steven.Lewis@walmart.com> wrote:
> 
> We tried implementing our own thread safe output collector but it seems ridiculous for
such a concurrent system. Why don’t they implement it in core Storm? They can have the current
one, and then build a separate one called Concurrent_OutputCollector or some such. I was hoping
they fixed that in version 1.0 and I missed it.
> 
> From: Stephen Powis <spowis@salesforce.com <mailto:spowis@salesforce.com>>
> Reply-To: "user@storm.apache.org <mailto:user@storm.apache.org>" <user@storm.apache.org
<mailto:user@storm.apache.org>>
> Date: Thursday, April 28, 2016 at 10:58 AM
> To: "user@storm.apache.org <mailto:user@storm.apache.org>" <user@storm.apache.org
<mailto:user@storm.apache.org>>
> Subject: Re: thread safe output collector
> 
> So the Spout documentation (assuming its correct...) here (http://storm.apache.org/releases/current/Concepts.html#spouts
<http://storm.apache.org/releases/current/Concepts.html#spouts>) mentions this:
> 
> "The main method on spouts is nextTuple. nextTuple either emits a new tuple into the
topology or simply returns if there are no new tuples to emit. It is imperative that nextTuple
does not block for any spout implementation, because Storm calls all the spout methods on
the same thread."
> 
> When developing a custom spout we interpreted it to mean that any "real work" done by
a spout should be done in a separate thread, and decided on the following pattern which seems
some what relevant to what you are trying to do in your bolts.
> 
> On Spout prepare, we create a concurrent/thread safe queue.  We then create a new Thread
passing it a reference to our thread safe queue.  This thread handles finding new data that
needs to be emitted.  When that thread finds data, it adds it to the shared queue.  When the
spout's nextTuple() method is called, it looks for data on the shared queue and emits it.
> 
> I imagine doing async processing in a bolt using one or more threads could work with
a similar pattern.  On prepare you setup your thread(s) with references to a shared queue.
 The bolt passes work to be completed to the thread(s), the thread(s) communicate back to
the bolt the result via a shared queue.  Add in the concept of tick tuples to ensure your
bolt checks for completed work on a regular basis?
> 
> Is there a better way to do this?
> 
> On Thu, Apr 28, 2016 at 11:22 AM, Julien Nioche <lists.digitalpebble@gmail.com <mailto:lists.digitalpebble@gmail.com>>
wrote:
> Thanks for the clarification
> 
> On 28 April 2016 at 15:12, P. Taylor Goetz <ptgoetz@gmail.com <mailto:ptgoetz@gmail.com>>
wrote:
> The documentation is wrong. See:
> 
> https://issues.apache.org/jira/browse/STORM-841 <https://issues.apache.org/jira/browse/STORM-841>
> 
> At some point it looks like the change made there got reverted. I will reopen it to make
sure the documentation is corrected.
> 
> OutputCollector is NOT thread-safe.
> 
> -Taylor
> 
>> On Apr 28, 2016, at 9:06 AM, Stephen Powis <spowis@salesforce.com <mailto:spowis@salesforce.com>>
wrote:
>> 
>> "Its perfectly fine to launch new threads in bolts that do processing asynchronously.
OutputCollector <http://storm.apache.org/releases/current/javadocs/org/apache/storm/task/OutputCollector.html>
is thread-safe and can be called at any time."
>> 
>> 
>> From the docs for 0.9.6: http://storm.apache.org/releases/0.9.6/Concepts.html#bolts
<http://storm.apache.org/releases/0.9.6/Concepts.html#bolts>
>> 
>> On Thu, Apr 28, 2016 at 9:03 AM, P. Taylor Goetz <ptgoetz@gmail.com <mailto:ptgoetz@gmail.com>>
wrote:
>> IIRC there was discussion about making it thread safe, but I don't believe it was
implemented.
>> 
>> -Taylor
>> 
>> On Apr 28, 2016, at 3:52 AM, Julien Nioche <lists.digitalpebble@gmail.com <mailto:lists.digitalpebble@gmail.com>>
wrote:
>> 
>>> Hi Stephen
>>> 
>>> I asked the same question in February but did not get a reply
>>> 
>>> https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3CCA+-fM0urPf3FUERoZywPzmxu-KDbGF-Zj3wbYr8evsAQJC6u_g@mail.gmail.com%3E
<https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3CCA+-fM0urPf3FUERoZywPzmxu-KDbGF-Zj3wbYr8evsAQJC6u_g@mail.gmail.com%3E>
>>> 
>>> Anyone who could confirm this?
>>> 
>>> Thanks
>>> 
>>> On 27 April 2016 at 14:05, Steven Lewis <Steven.Lewis@walmart.com <mailto:Steven.Lewis@walmart.com>>
wrote:
>>> I have conflicting information, and have not checked personally but has the output
collector finally been made thread safe for emitting in version 1.0 or 0.10? I know it was
a huge problem in 0.9.5 when trying to do threading in a bolt for async future calls and emitting
once it returns.
>>> 
>>> This email and any files transmitted with it are confidential and intended solely
for the individual or entity to whom they are addressed. If you have received this email in
error destroy it immediately. *** Walmart Confidential ***
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Open Source Solutions for Text Engineering
>>> 
>>> http://www.digitalpebble.com <http://www.digitalpebble.com/>
>>> http://digitalpebble.blogspot.com/ <http://digitalpebble.blogspot.com/>
>>> #digitalpebble <http://twitter.com/digitalpebble>
>> 
> 
> 
> 
> 
> --
> 
> Open Source Solutions for Text Engineering
> 
> http://www.digitalpebble.com <http://www.digitalpebble.com/>
> http://digitalpebble.blogspot.com/ <http://digitalpebble.blogspot.com/>
> #digitalpebble <http://twitter.com/digitalpebble>
> 
> This email and any files transmitted with it are confidential and intended solely for
the individual or entity to whom they are addressed. If you have received this email in error
destroy it immediately. *** Walmart Confidential ***


Mime
View raw message