storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Powis <spo...@salesforce.com>
Subject Re: thread safe output collector
Date Fri, 29 Apr 2016 13:12:40 GMT
You're probably right, if its an expensive operation to package your data
into a formatted tuple, it may make more sense for your spout to emit
something simple, and have a downstream bolt package it up.

In the situation I was describing our spout is executing a SQL statement to
gather rows that should be emitted as tuples, so the "processing time" of
the spout is more around how fast or slow that query statement ends up
being, and less about converting them to tuples -- we're actually querying
against somewhere around 100 different databases to find the data.  Doing
that in a single thread with the other spouts seemed not ideal, so thats
why we kicked it off to separate threads.

On Fri, Apr 29, 2016 at 8:53 AM, Hart, James W. <jwhart@seic.com> wrote:

> I’m working on a topology that will be similar to this application so I
> was thinking about this yesterday.
>
>
>
> I’m thinking that if there is any significant work to do on messages in
> making them into tuples, shouldn’t the message be emitted and the work be
> in a bolt?  I don’t think that bolt execute functions have the same
> limitations as spout nextTuple functions.  Now with that said, bolt
> executes should not be long running computations either, but can be longer
> than the spouts nextTuple function.
>
>
>
> *From:* Stephen Powis [mailto:spowis@salesforce.com]
> *Sent:* Thursday, April 28, 2016 11:59 AM
> *To:* user@storm.apache.org
> *Subject:* Re: thread safe output collector
>
>
>
> So the Spout documentation (assuming its correct...) here (
> http://storm.apache.org/releases/current/Concepts.html#spouts) mentions
> this:
>
>
> "The main method on spouts is nextTuple. nextTuple either emits a new
> tuple into the topology or simply returns if there are no new tuples to
> emit. *It is imperative that **nextTuple** does not block for any spout
> implementation, because Storm calls all the spout methods on the same
> thread.*"
>
> When developing a custom spout we interpreted it to mean that any "real
> work" done by a spout should be done in a separate thread, and decided on
> the following pattern which seems some what relevant to what you are trying
> to do in your bolts.
>
> On Spout prepare, we create a concurrent/thread safe queue.  We then
> create a new Thread passing it a reference to our thread safe queue.  This
> thread handles finding new data that needs to be emitted.  When that thread
> finds data, it adds it to the shared queue.  When the spout's nextTuple()
> method is called, it looks for data on the shared queue and emits it.
>
> I imagine doing async processing in a bolt using one or more threads could
> work with a similar pattern.  On prepare you setup your thread(s) with
> references to a shared queue.  The bolt passes work to be completed to the
> thread(s), the thread(s) communicate back to the bolt the result via a
> shared queue.  Add in the concept of tick tuples to ensure your bolt checks
> for completed work on a regular basis?
>
> Is there a better way to do this?
>
>
>
> On Thu, Apr 28, 2016 at 11:22 AM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
>
> Thanks for the clarification
>
>
>
> On 28 April 2016 at 15:12, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
>
> The documentation is wrong. See:
>
>
>
> https://issues.apache.org/jira/browse/STORM-841
>
>
>
> At some point it looks like the change made there got reverted. I will
> reopen it to make sure the documentation is corrected.
>
>
>
> OutputCollector is NOT thread-safe.
>
>
>
> -Taylor
>
>
>
> On Apr 28, 2016, at 9:06 AM, Stephen Powis <spowis@salesforce.com> wrote:
>
>
>
> "Its perfectly fine to launch new threads in bolts that do processing
> asynchronously. OutputCollector
> <http://storm.apache.org/releases/current/javadocs/org/apache/storm/task/OutputCollector.html>
> is thread-safe and can be called at any time."
>
>
>
> From the docs for 0.9.6:
> http://storm.apache.org/releases/0.9.6/Concepts.html#bolts
>
>
>
> On Thu, Apr 28, 2016 at 9:03 AM, P. Taylor Goetz <ptgoetz@gmail.com>
> wrote:
>
> IIRC there was discussion about making it thread safe, but I don't believe
> it was implemented.
>
>
>
> -Taylor
>
>
> On Apr 28, 2016, at 3:52 AM, Julien Nioche <lists.digitalpebble@gmail.com>
> wrote:
>
> Hi Stephen
>
>
>
> I asked the same question in February but did not get a reply
>
>
>
>
> https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3CCA+-fM0urPf3FUERoZywPzmxu-KDbGF-Zj3wbYr8evsAQJC6u_g@mail.gmail.com%3E
>
>
>
> Anyone who could confirm this?
>
>
>
> Thanks
>
>
>
> On 27 April 2016 at 14:05, Steven Lewis <Steven.Lewis@walmart.com> wrote:
>
> I have conflicting information, and have not checked personally but has
> the output collector finally been made thread safe for emitting in version
> 1.0 or 0.10? I know it was a huge problem in 0.9.5 when trying to do
> threading in a bolt for async future calls and emitting once it returns.
>
>
>
> This email and any files transmitted with it are confidential and intended
> solely for the individual or entity to whom they are addressed. If you have
> received this email in error destroy it immediately. *** Walmart
> Confidential ***
>
>
>
>
>
> --
>
>
> *Open Source Solutions for Text Engineering*
>
>
> http://www.digitalpebble.com
> http://digitalpebble.blogspot.com/
> #digitalpebble <http://twitter.com/digitalpebble>
>
>
>
>
>
>
>
>
>
> --
>
>
> *Open Source Solutions for Text Engineering*
>
>
> http://www.digitalpebble.com
> http://digitalpebble.blogspot.com/
> #digitalpebble <http://twitter.com/digitalpebble>
>
>
>

Mime
View raw message