From user-return-10597-apmail-storm-user-archive=storm.apache.org@storm.apache.org Thu Apr 28 15:58:59 2016 Return-Path: X-Original-To: apmail-storm-user-archive@minotaur.apache.org Delivered-To: apmail-storm-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E27AD19578 for ; Thu, 28 Apr 2016 15:58:58 +0000 (UTC) Received: (qmail 84782 invoked by uid 500); 28 Apr 2016 15:58:58 -0000 Delivered-To: apmail-storm-user-archive@storm.apache.org Received: (qmail 84742 invoked by uid 500); 28 Apr 2016 15:58:57 -0000 Mailing-List: contact user-help@storm.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@storm.apache.org Delivered-To: mailing list user@storm.apache.org Received: (qmail 84732 invoked by uid 99); 28 Apr 2016 15:58:57 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Apr 2016 15:58:57 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id ED2DEC77C4 for ; Thu, 28 Apr 2016 15:58:56 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.198 X-Spam-Level: ** X-Spam-Status: No, score=2.198 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, KAM_LIVE=1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=salesforce.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id MLUZuHTN18hm for ; Thu, 28 Apr 2016 15:58:54 +0000 (UTC) Received: from mail-qg0-f51.google.com (mail-qg0-f51.google.com [209.85.192.51]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id C34EE5F297 for ; Thu, 28 Apr 2016 15:58:54 +0000 (UTC) Received: by mail-qg0-f51.google.com with SMTP id f92so31578139qgf.0 for ; Thu, 28 Apr 2016 08:58:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=salesforce.com; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to; bh=ByWUZCWSTyPbXRt1U1gJIwHI8LKt37OblD/TiHHNT50=; b=gv7D4mkUa9ehnP1SKr1Py9K3ZDQxNCEKFNBgIqIK5TR04cAy3ANewxEYoKgUo1Br5d 8U+ACXUZ25GRwTaMAAL8GnkTREC1YHtuIZUesW1lfHYIvpfvfQ2ew0ylBdbqM7DZwRM6 LH2UauEVGpaxlO7HWA1c5pDYcC8UWz1Nw/jNM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to; bh=ByWUZCWSTyPbXRt1U1gJIwHI8LKt37OblD/TiHHNT50=; b=Bf6rpG5piFKQ5GIZ/6urezttD/7Grk4iNUcYIG2nHsBeHrcnF2VEzo3Nk/4cKBALZx wjL/4MXAEsqJEJ54vVrfm4U4xKrQe0hqU40Q9KAbED44X/Y91LKsJd/raHUULsIpebeM 2WkGOzb2Ipp8LRGDW5x3hl3IXHfDpijtU5g76b8/Z7Z4+lDF/Y/mIH2rLFPE6GTBr2Re I2SKZP4tstADNSDXR8VfFV0k7TMZaGRy/u9o9RPMTPY4gHyNYW/PP2zJpfeQnm5ffbG5 bZG63HRXZf9dEW6taO68b6m8cYslWoUvjK7mZymYnEqAgS3VYhT6wxqciaRAX7b/3bqB duGA== X-Gm-Message-State: AOPr4FWq+898/ZFCRUmajTwTVYpCQ8PnhzSXgiD6TGvgxmmy8tL8FY7nrQHo/blen1ahUpuRWvsk+rBZgc648WXk MIME-Version: 1.0 X-Received: by 10.140.28.8 with SMTP id 8mr14314716qgy.91.1461859134073; Thu, 28 Apr 2016 08:58:54 -0700 (PDT) Received: by 10.140.98.8 with HTTP; Thu, 28 Apr 2016 08:58:53 -0700 (PDT) In-Reply-To: References: Date: Thu, 28 Apr 2016 11:58:53 -0400 Message-ID: Subject: Re: thread safe output collector From: Stephen Powis To: user@storm.apache.org Content-Type: multipart/alternative; boundary=001a113a9b9466576905318d9a7a --001a113a9b9466576905318d9a7a Content-Type: text/plain; charset=UTF-8 So the Spout documentation (assuming its correct...) here ( http://storm.apache.org/releases/current/Concepts.html#spouts) mentions this: "The main method on spouts is nextTuple. nextTuple either emits a new tuple into the topology or simply returns if there are no new tuples to emit. *It is imperative that nextTuple does not block for any spout implementation, because Storm calls all the spout methods on the same thread.*" When developing a custom spout we interpreted it to mean that any "real work" done by a spout should be done in a separate thread, and decided on the following pattern which seems some what relevant to what you are trying to do in your bolts. On Spout prepare, we create a concurrent/thread safe queue. We then create a new Thread passing it a reference to our thread safe queue. This thread handles finding new data that needs to be emitted. When that thread finds data, it adds it to the shared queue. When the spout's nextTuple() method is called, it looks for data on the shared queue and emits it. I imagine doing async processing in a bolt using one or more threads could work with a similar pattern. On prepare you setup your thread(s) with references to a shared queue. The bolt passes work to be completed to the thread(s), the thread(s) communicate back to the bolt the result via a shared queue. Add in the concept of tick tuples to ensure your bolt checks for completed work on a regular basis? Is there a better way to do this? On Thu, Apr 28, 2016 at 11:22 AM, Julien Nioche < lists.digitalpebble@gmail.com> wrote: > Thanks for the clarification > > On 28 April 2016 at 15:12, P. Taylor Goetz wrote: > >> The documentation is wrong. See: >> >> https://issues.apache.org/jira/browse/STORM-841 >> >> At some point it looks like the change made there got reverted. I will >> reopen it to make sure the documentation is corrected. >> >> OutputCollector is NOT thread-safe. >> >> -Taylor >> >> On Apr 28, 2016, at 9:06 AM, Stephen Powis wrote: >> >> "Its perfectly fine to launch new threads in bolts that do processing >> asynchronously. OutputCollector >> >> is thread-safe and can be called at any time." >> >> >> From the docs for 0.9.6: >> http://storm.apache.org/releases/0.9.6/Concepts.html#bolts >> >> On Thu, Apr 28, 2016 at 9:03 AM, P. Taylor Goetz >> wrote: >> >>> IIRC there was discussion about making it thread safe, but I don't >>> believe it was implemented. >>> >>> -Taylor >>> >>> On Apr 28, 2016, at 3:52 AM, Julien Nioche < >>> lists.digitalpebble@gmail.com> wrote: >>> >>> Hi Stephen >>> >>> I asked the same question in February but did not get a reply >>> >>> >>> https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3CCA+-fM0urPf3FUERoZywPzmxu-KDbGF-Zj3wbYr8evsAQJC6u_g@mail.gmail.com%3E >>> >>> Anyone who could confirm this? >>> >>> Thanks >>> >>> On 27 April 2016 at 14:05, Steven Lewis >>> wrote: >>> >>>> I have conflicting information, and have not checked personally but has >>>> the output collector finally been made thread safe for emitting in version >>>> 1.0 or 0.10? I know it was a huge problem in 0.9.5 when trying to do >>>> threading in a bolt for async future calls and emitting once it returns. >>>> >>>> This email and any files transmitted with it are confidential and >>>> intended solely for the individual or entity to whom they are addressed. If >>>> you have received this email in error destroy it immediately. *** Walmart >>>> Confidential *** >>>> >>> >>> >>> >>> -- >>> >>> *Open Source Solutions for Text Engineering* >>> >>> http://www.digitalpebble.com >>> http://digitalpebble.blogspot.com/ >>> #digitalpebble >>> >>> >> >> > > > -- > > *Open Source Solutions for Text Engineering* > > http://www.digitalpebble.com > http://digitalpebble.blogspot.com/ > #digitalpebble > --001a113a9b9466576905318d9a7a Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
So the Spout documentation (assuming its co= rrect...) here (http://storm.apache.org/releases/current/Concepts.html#spout= s) mentions this:

"The main method on spouts is nextT= uple. nextTuple either emits a new tuple into the topol= ogy or simply returns if there are no new tuples to emit. It is imperati= ve that nextTuple does not block for any spout implementation,= because Storm calls all the spout methods on the same thread."
When developing a custom spout we interpreted it to mean that an= y "real work" done by a spout should be done in a separate thread= , and decided on the following pattern which seems some what relevant to wh= at you are trying to do in your bolts.

On Spout prepare, we cr= eate a concurrent/thread safe queue.=C2=A0 We then create a new Thread pass= ing it a reference to our thread safe queue.=C2=A0 This thread handles find= ing new data that needs to be emitted.=C2=A0 When that thread finds data, i= t adds it to the shared queue.=C2=A0 When the spout's nextTuple() metho= d is called, it looks for data on the shared queue and emits it.

I imagine doing async processing in a bolt using one or more threa= ds could work with a similar pattern.=C2=A0 On prepare you setup your threa= d(s) with references to a shared queue.=C2=A0 The bolt passes work to be co= mpleted to the thread(s), the thread(s) communicate back to the bolt the re= sult via a shared queue.=C2=A0 Add in the concept of tick tuples to ensure = your bolt checks for completed work on a regular basis?

I= s there a better way to do this?
=
On Thu, Apr 28, 2016 at 11:22 AM, Julien Nio= che <lists.digitalpebble@gmail.com> wrote:
Thanks for the clarification<= /div>
On 28 April 2016 at 15:12, P. Taylor Goetz <pt= goetz@gmail.com> wrote:
The documentation is wrong. See:

<= /div>
=
At some point it looks like the change made there got revert= ed. I will reopen it to make sure the documentation is corrected.

OutputCollector is NOT thread-safe.

-Taylor
=
On Apr 28, 2016, at 9:06 AM, Stephe= n Powis <spow= is@salesforce.com> wrote:

"It= s perfectly fine to launch new threads in bolts that do processing asynchro= nously. OutputCollector= is thread-safe and can be called at any time."


From = the docs for 0.9.6: http://storm.apache.org/releases/0.9.6/Co= ncepts.html#bolts


On Thu, Apr 28, 2016 at 9:03 AM, P. Taylor Goetz <ptgoet= z@gmail.com> wrote:
IIRC there was discussion about making it thread= safe, but I don't believe it was implemented.

-Taylor
=
On Apr 28, 2016, at 3:52 AM, Julien Nioche <lists.digitalpebble@gmail.com> wrote:

H= i Stephen

I asked the same question in February but did = not get a reply


=
Anyone who could confirm this?

Thanks

On 27 April 2= 016 at 14:05, Steven Lewis <Steven.Lewis@walmart.com>= wrote:
I have conflicting information, and have not checked personally but ha= s the output collector finally been made thread safe for emitting in versio= n 1.0 or 0.10? I know it was a huge problem in 0.9.5 when trying to do thre= ading in a bolt for async future calls and emitting once it returns.

This email and any files transmitted with it are confidential and intended = solely for the individual or entity to whom they are addressed. If you have= received this email in error destroy it immediately. *** Walmart Confident= ial ***



--

Open Source Solutions = for Text Engineering




--

Open Source= Solutions for Text Engineering
<= span style=3D"border-collapse:separate;font-family:'Times New Roman'= ;;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:= normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:= 0px;font-size:medium">
http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble


--001a113a9b9466576905318d9a7a--