storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Navin Ipe <navin....@searchlighthealth.com>
Subject Re: Does Storm use Netty to access a class reference across worker JVM's?
Date Fri, 22 Apr 2016 10:41:35 GMT
Thank you very much for your time and help, Nathan and John.
Followed up on serialization, and it looks like everything can be
serialized: http://stackoverflow.com/a/16851174/453673
Will verify the database connection serialization also during
implementation.

On Thu, Apr 21, 2016 at 5:22 PM, Nathan Leung <ncleung@gmail.com> wrote:

> mongoManager is serialized and sent to your spout.  If it's not something
> that's easily serializable (e.g. a database connection) then you will need
> to initialize it in spout prepare() instead of the constructor.
>
> On Thu, Apr 21, 2016 at 7:34 AM, Navin Ipe <
> navin.ipe@searchlighthealth.com> wrote:
>
>> Thanks John, but that's odd...in the code I shared, there's a reference
>> to mongoManager being used in the Spout (the MongoSpout internally stores a
>> reference to mongoManager). If there are no object references shared
>> between executors, then when the topology I created is submitted to Storm,
>> would Storm serialize or clone and store an instance of mongoManager (and
>> all initialized values inside it) inside the Spout? Storm would surely have
>> to do *something *to ensure that references aren't cut off when workers
>> operate in different JVM's...
>>
>> On Thu, Apr 21, 2016 at 4:45 PM, <hokiegeek2@gmail.com> wrote:
>>
>>> Netty is used for communication between workers and then the LMAX
>>> disruptor queue is used to route messages between Netty and the individual
>>> executors such as the MongoSpout and KafkaBolt. AFAIK, there are not direct
>>> object references shared between executors because all executors
>>> communicate via Netty/LMAX (shuffle/fieldsGrouping) or LMAX
>>> (localIrShuffleGrouping).
>>>
>>> --John
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 21, 2016, at 1:29 AM, Navin Ipe <navin.ipe@searchlighthealth.com>
>>> wrote:
>>>
>>> In the below code,
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *public static void main(String[] cmdArgs) {Config config = new
>>> Config();config.setNumWorkers(5);        MongoManager mongoManager = new
>>> MongoManager();TopologyBuilder builder = new
>>> TopologyBuilder();builder.setSpout("someSpout", new
>>> MongoSpout(mongoManger)));}*
>>>
>>> Assuming there are many more spouts and blots created, I understand that each
>>> worker will run in its own JVM
>>> <http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/>,
>>> which means that it will have its own memory space.
>>>
>>> *Questions:*
>>> *1.* So when the mongoManager reference is passed to MongoSpout, will
>>> MongoSpout always be able to access the initialized members of mongoManager?
>>> *2.* Isn't it likely that main() runs in a different JVM and a
>>> MongoSpout will be in another JVM? How would Storm access mongoManager?
>>> Using Netty?
>>> *3.* (optional help) I have the Storm source code. Could anyone point
>>> me to the part that Storm does the inter-worker communication for accessing
>>> class references?
>>>
>>> --
>>> Regards,
>>> Navin
>>>
>>>
>>
>>
>> --
>> Regards,
>> Navin
>>
>
>


-- 
Regards,
Navin

Mime
View raw message