storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Bush <johntylerb...@gmail.com>
Subject Re: thread safe output collector
Date Sat, 30 Apr 2016 06:11:01 GMT
I ran into this issue a few weeks ago, in a bolt, using Futures in Scala.
Basically the acks I was doing in new threads never got back (well at least
not all of them).  The solution I ended up with was to use a thread safe
queue and then flush the acks out from a tick (as was described earlier in
this thread).  It definitely works, I don't know if there is a better way.

My solution is documented here:
https://scalalala.wordpress.com/2016/04/09/async-stormy-weather/

On Fri, Apr 29, 2016 at 6:12 AM, Stephen Powis <spowis@salesforce.com>
wrote:

> You're probably right, if its an expensive operation to package your data
> into a formatted tuple, it may make more sense for your spout to emit
> something simple, and have a downstream bolt package it up.
>
> In the situation I was describing our spout is executing a SQL statement
> to gather rows that should be emitted as tuples, so the "processing time"
> of the spout is more around how fast or slow that query statement ends up
> being, and less about converting them to tuples -- we're actually querying
> against somewhere around 100 different databases to find the data.  Doing
> that in a single thread with the other spouts seemed not ideal, so thats
> why we kicked it off to separate threads.
>
> On Fri, Apr 29, 2016 at 8:53 AM, Hart, James W. <jwhart@seic.com> wrote:
>
>> I’m working on a topology that will be similar to this application so I
>> was thinking about this yesterday.
>>
>>
>>
>> I’m thinking that if there is any significant work to do on messages in
>> making them into tuples, shouldn’t the message be emitted and the work be
>> in a bolt?  I don’t think that bolt execute functions have the same
>> limitations as spout nextTuple functions.  Now with that said, bolt
>> executes should not be long running computations either, but can be longer
>> than the spouts nextTuple function.
>>
>>
>>
>> *From:* Stephen Powis [mailto:spowis@salesforce.com]
>> *Sent:* Thursday, April 28, 2016 11:59 AM
>> *To:* user@storm.apache.org
>> *Subject:* Re: thread safe output collector
>>
>>
>>
>> So the Spout documentation (assuming its correct...) here (
>> http://storm.apache.org/releases/current/Concepts.html#spouts) mentions
>> this:
>>
>>
>> "The main method on spouts is nextTuple. nextTuple either emits a new
>> tuple into the topology or simply returns if there are no new tuples to
>> emit. *It is imperative that **nextTuple** does not block for any spout
>> implementation, because Storm calls all the spout methods on the same
>> thread.*"
>>
>> When developing a custom spout we interpreted it to mean that any "real
>> work" done by a spout should be done in a separate thread, and decided on
>> the following pattern which seems some what relevant to what you are trying
>> to do in your bolts.
>>
>> On Spout prepare, we create a concurrent/thread safe queue.  We then
>> create a new Thread passing it a reference to our thread safe queue.  This
>> thread handles finding new data that needs to be emitted.  When that thread
>> finds data, it adds it to the shared queue.  When the spout's nextTuple()
>> method is called, it looks for data on the shared queue and emits it.
>>
>> I imagine doing async processing in a bolt using one or more threads
>> could work with a similar pattern.  On prepare you setup your thread(s)
>> with references to a shared queue.  The bolt passes work to be completed to
>> the thread(s), the thread(s) communicate back to the bolt the result via a
>> shared queue.  Add in the concept of tick tuples to ensure your bolt checks
>> for completed work on a regular basis?
>>
>> Is there a better way to do this?
>>
>>
>>
>> On Thu, Apr 28, 2016 at 11:22 AM, Julien Nioche <
>> lists.digitalpebble@gmail.com> wrote:
>>
>> Thanks for the clarification
>>
>>
>>
>> On 28 April 2016 at 15:12, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
>>
>> The documentation is wrong. See:
>>
>>
>>
>> https://issues.apache.org/jira/browse/STORM-841
>>
>>
>>
>> At some point it looks like the change made there got reverted. I will
>> reopen it to make sure the documentation is corrected.
>>
>>
>>
>> OutputCollector is NOT thread-safe.
>>
>>
>>
>> -Taylor
>>
>>
>>
>> On Apr 28, 2016, at 9:06 AM, Stephen Powis <spowis@salesforce.com> wrote:
>>
>>
>>
>> "Its perfectly fine to launch new threads in bolts that do processing
>> asynchronously. OutputCollector
>> <http://storm.apache.org/releases/current/javadocs/org/apache/storm/task/OutputCollector.html>
>> is thread-safe and can be called at any time."
>>
>>
>>
>> From the docs for 0.9.6:
>> http://storm.apache.org/releases/0.9.6/Concepts.html#bolts
>>
>>
>>
>> On Thu, Apr 28, 2016 at 9:03 AM, P. Taylor Goetz <ptgoetz@gmail.com>
>> wrote:
>>
>> IIRC there was discussion about making it thread safe, but I don't
>> believe it was implemented.
>>
>>
>>
>> -Taylor
>>
>>
>> On Apr 28, 2016, at 3:52 AM, Julien Nioche <lists.digitalpebble@gmail.com>
>> wrote:
>>
>> Hi Stephen
>>
>>
>>
>> I asked the same question in February but did not get a reply
>>
>>
>>
>>
>> https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3CCA+-fM0urPf3FUERoZywPzmxu-KDbGF-Zj3wbYr8evsAQJC6u_g@mail.gmail.com%3E
>>
>>
>>
>> Anyone who could confirm this?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On 27 April 2016 at 14:05, Steven Lewis <Steven.Lewis@walmart.com> wrote:
>>
>> I have conflicting information, and have not checked personally but has
>> the output collector finally been made thread safe for emitting in version
>> 1.0 or 0.10? I know it was a huge problem in 0.9.5 when trying to do
>> threading in a bolt for async future calls and emitting once it returns.
>>
>>
>>
>> This email and any files transmitted with it are confidential and
>> intended solely for the individual or entity to whom they are addressed. If
>> you have received this email in error destroy it immediately. *** Walmart
>> Confidential ***
>>
>>
>>
>>
>>
>> --
>>
>>
>> *Open Source Solutions for Text Engineering*
>>
>>
>> http://www.digitalpebble.com
>> http://digitalpebble.blogspot.com/
>> #digitalpebble <http://twitter.com/digitalpebble>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>>
>> *Open Source Solutions for Text Engineering*
>>
>>
>> http://www.digitalpebble.com
>> http://digitalpebble.blogspot.com/
>> #digitalpebble <http://twitter.com/digitalpebble>
>>
>>
>>
>
>

Mime
View raw message