storm-user mailing list archives

From "Zhang, Edward (GDI Hadoop)" <yonzh...@ebay.com>
Subject Re: [Discussion] storm local-mode event object reuse bug
Date Tue, 01 Dec 2015 17:50:17 GMT
In my opinion, this is not about the immutability of an object. It is about the contract between
the Storm framework and the Storm application. In this case, it looks like application code has to
deep-copy every object it receives as input, because that object cannot safely be modified in place.

I think that is also fine, as long as the contract states that the application should assume any
event object it receives may be shared. But ImmutableMap would not solve the problem.
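To illustrate why ImmutableMap alone would not help, here is a minimal sketch that uses the JDK's
`Collections.unmodifiableMap` as a stand-in for Guava's `ImmutableMap` (an assumption made so the
example has no dependencies; both are shallow: they block changes to the map itself, not to mutable
objects stored inside it). The class name and the two "bolt" variables are hypothetical; both
variables hold the same reference, as they would in local mode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShallowImmutabilityDemo {

    // Builds an "immutable" event: the outer map is unmodifiable, but the
    // List stored inside it is an ordinary mutable ArrayList.
    static Map<String, List<String>> buildEvent() {
        Map<String, List<String>> inner = new HashMap<>();
        inner.put("tags", new ArrayList<>(List.of("a", "b")));
        return Collections.unmodifiableMap(inner);
    }

    // Returns true if adding a key to the outer map is rejected.
    static boolean outerMapRejectsPut(Map<String, List<String>> event) {
        try {
            event.put("other", new ArrayList<>());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> event = buildEvent();
        // Both hypothetical "bolts" hold the same reference, as in local mode.
        Map<String, List<String>> seenByBoltA = event;
        Map<String, List<String>> seenByBoltB = event;

        System.out.println("outer put rejected: " + outerMapRejectsPut(seenByBoltA));

        // Bolt A mutates the nested list; the unmodifiable wrapper does not stop it.
        seenByBoltA.get("tags").add("c");

        // Bolt B observes the change.
        System.out.println("bolt B sees: " + seenByBoltB.get("tags"));
    }
}
```

Because the nested list is still mutable, the application would still need to deep-copy it (or
store only immutable values) to be safe.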

Thanks
Edward

From: "Grant Overby (groverby)" <groverby@cisco.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Tuesday, December 1, 2015 at 8:25
To: user <user@storm.apache.org>
Subject: Re: [Discussion] storm local-mode event object reuse bug

Serialization isn't free. Skipping it wherever possible, even in a cluster, is worthwhile because
it conserves CPU resources.

Using immutable objects is cheaper. Assuming you're coding in Java, consider using ImmutableMap,
ImmutableMap.Builder, and similar classes in Google's Guava library: http://docs.guava-libraries.googlecode.com/git-history/v18.0/javadoc/com/google/common/collect/ImmutableMap.html
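A minimal sketch of the immutable-object approach, assuming Java 10+ and using the JDK's
`Map.of`/`Map.copyOf` in place of Guava's `ImmutableMap` so that it needs no extra dependency.
The hypothetical `withEntry` helper shows a bolt deriving a new map from its input instead of
mutating the shared one; only a shallow copy is made, which is typically much cheaper than a
serialize/deserialize round trip.

```java
import java.util.HashMap;
import java.util.Map;

public class CopyOnWriteDemo {

    // Hypothetical bolt logic: instead of mutating the shared input map,
    // derive a new map that adds one entry, then freeze it with Map.copyOf.
    static Map<String, Integer> withEntry(Map<String, Integer> input, String key, int value) {
        Map<String, Integer> working = new HashMap<>(input); // private, mutable shallow copy
        working.put(key, value);
        return Map.copyOf(working); // unmodifiable snapshot (Java 10+)
    }

    public static void main(String[] args) {
        Map<String, Integer> shared = Map.of("count", 1); // what the spout emitted
        Map<String, Integer> derived = withEntry(shared, "score", 42);

        System.out.println("shared:  " + shared);                  // unchanged, one entry
        System.out.println("derived: " + derived.size() + " entries");

        try {
            derived.put("x", 0);
        } catch (UnsupportedOperationException e) {
            System.out.println("derived map is unmodifiable");
        }
    }
}
```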


Grant Overby
Software Engineer
Cisco.com <http://www.cisco.com/>
groverby@cisco.com
Mobile: 865 724 4910

From: Nathan Leung <ncleung@gmail.com>
Reply-To: user <user@storm.apache.org>
Date: Tuesday, December 1, 2015 at 9:30 AM
To: user <user@storm.apache.org>
Subject: Re: [Discussion] storm local-mode event object reuse bug

Serialization is bypassed by design. As noted in https://storm.apache.org/apidocs/backtype/storm/task/OutputCollector.html,
emitted objects must be immutable. If you intend to modify them anyway, be very careful.
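One way to honor this contract is to make the emitted type itself immutable. The sketch below is a
hypothetical event class (not part of the Storm API) with final fields and a defensive
`Map.copyOf` snapshot (Java 10+) taken in the constructor. It is only deeply immutable because its
`String` and `Long` values are themselves immutable; a map holding mutable values would need those
copied as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event type: all fields final, and the map is copied into an
// unmodifiable snapshot at construction, so later changes to the caller's map
// cannot leak into an emitted event.
public final class ImmutableEvent {
    private final String id;
    private final Map<String, Long> counters;

    public ImmutableEvent(String id, Map<String, Long> counters) {
        this.id = id;
        this.counters = Map.copyOf(counters); // defensive, unmodifiable copy
    }

    public String id() { return id; }

    // Returning the unmodifiable map is safe: callers cannot modify it.
    public Map<String, Long> counters() { return counters; }

    public static void main(String[] args) {
        Map<String, Long> source = new HashMap<>();
        source.put("clicks", 3L);
        ImmutableEvent event = new ImmutableEvent("e-1", source);

        source.put("clicks", 99L); // mutating the source after construction...
        System.out.println(event.counters().get("clicks")); // ...does not affect the event: 3
    }
}
```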

On Tue, Dec 1, 2015 at 4:28 AM, Stephen Powis <spowis@salesforce.com> wrote:
I believe that any time tuples are passed between bolts in the same JVM (either in local mode, or
in remote mode when the upstream and downstream bolts reside on the same worker), serialization
is bypassed by design.
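The difference between the two paths can be sketched without Storm at all. This example uses
Java's built-in serialization as a stand-in for Storm's own Kryo-based tuple serialization (an
assumption for illustration only): a consumer that receives the reference sees later mutations,
while a consumer that receives a serialized copy does not. The class and method names are
hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

public class SerializationBypassDemo {

    // Round-trips an object through Java serialization, yielding an
    // independent deep copy (a stand-in for what a cross-worker hop does).
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> emitted = new HashMap<>();
        emitted.put("n", 1);

        HashMap<String, Integer> sameJvmView = emitted;           // reference passed through
        HashMap<String, Integer> remoteView = roundTrip(emitted); // serialized copy

        emitted.put("n", 2); // the producer (or another bolt) mutates the object

        System.out.println("same-JVM consumer sees n=" + sameJvmView.get("n")); // 2
        System.out.println("remote consumer sees n=" + remoteView.get("n"));    // 1
    }
}
```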

On Tue, Dec 1, 2015 at 1:46 PM, Edward Zhang <yonzhang2012@apache.org> wrote:
Hi Storm developers,

Today I hit a possible Storm issue that occurs in local mode. In local mode, an event object
emitted by a spout does not appear to go through serialization/deserialization; instead, the event
object, including its members, is referenced directly by the downstream bolts. So when one bolt
modifies the event object, another bolt sees the change immediately.

For example, suppose the event object sent by the spout contains a Java Map, and two bolts consume
from this spout. If one bolt modifies the Map, the other bolt will see the change, or will throw a
ConcurrentModificationException if it is iterating over the Map at the time.
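The failure described above can be reproduced in plain Java. In this hypothetical sketch, "bolt B"
iterates over a shared `HashMap` while "bolt A" adds a key to the same instance. The interleaving
is forced in a single thread so the result is deterministic; with real executor threads,
`HashMap`'s fail-fast check is best-effort, so a bolt may instead silently observe inconsistent
data.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class SharedEventRepro {

    // Simulates two bolts that received the SAME Map reference (local mode,
    // no serialization). Bolt B iterates; partway through, bolt A adds a key.
    static boolean iterationFails(Map<String, Integer> shared) {
        try {
            boolean first = true;
            for (String key : shared.keySet()) {      // "bolt B" iterating
                if (first) {
                    shared.put("added-by-bolt-a", 0); // "bolt A" modifies the shared map
                    first = false;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> event = new HashMap<>();
        event.put("a", 1);
        event.put("b", 2);
        event.put("c", 3);
        System.out.println("ConcurrentModificationException thrown: " + iterationFails(event));
    }
}
```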

Please let us know whether this behavior should be handled by the Storm framework or by the Storm
application. In the application, we could deep-copy the object when running in local mode; in the
framework, serialization/deserialization should probably always be executed.

Let me know your thoughts.

Thanks
Edward Zhang


