storm-user mailing list archives

From Erik Weathers <eweath...@groupon.com>
Subject Re: How long until fields grouping gets overwhelmed with data?
Date Thu, 11 Aug 2016 10:22:07 GMT
I think these are the appropriate code pointers:

Original Clojure-based storm-core:

https://github.com/apache/storm/blob/v0.9.6/storm-core/src/clj/backtype/storm/daemon/executor.clj#L36-L39


New Java-based storm-core:

https://github.com/apache/storm/blob/3b1ab3d8a7da7ed35adc448d24f1f1ccb6c5ff27/storm-core/src/jvm/org/apache/storm/daemon/GrouperFactory.java#L157-L161
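
As far as I can tell, both pointers come down to the scheme Nathan described: hash the values of the grouping fields, then take that hash modulo the number of target tasks. Below is a minimal Java sketch of the idea, not Storm's actual code. The class name FieldsGroupingSketch, the method chooseTask, and the task ids are made up for illustration, and List.hashCode stands in for whatever hash Storm applies to the selected field values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Storm.
final class FieldsGroupingSketch {

    // Picks a target task from the grouping-field values alone.
    // No per-key map is stored, so nothing grows as new keys appear.
    static int chooseTask(List<Object> groupingValues, List<Integer> targetTasks) {
        // Assumption: List.hashCode stands in for Storm's hash of the selected values.
        int hash = groupingValues.hashCode();
        // floorMod keeps the index non-negative even when the hash is negative.
        int index = Math.floorMod(hash, targetTasks.size());
        return targetTasks.get(index);
    }

    public static void main(String[] args) {
        // Made-up task ids for the 10 tasks of bolt B.
        List<Integer> boltTasks = Arrays.asList(11, 12, 13, 14, 15, 16, 17, 18, 19, 20);

        // The same value x emitted by two different spouts.
        List<Object> fromS1 = Arrays.<Object>asList("x");
        List<Object> fromS2 = Arrays.<Object>asList("x");

        // Both emits resolve to the same task because only the value is hashed.
        System.out.println(chooseTask(fromS1, boltTasks) == chooseTask(fromS2, boltTasks));
    }
}
```

Under these assumptions the result depends only on the field values and the list of target tasks, so S1 and S2 emitting the same x would pick the same task of B, and no state accumulates per key.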


On Thu, Aug 11, 2016 at 2:57 AM, Navin Ipe <navin.ipe@searchlighthealth.com>
wrote:

> True, but that's what I wanted to confirm by mentioning spout S1 and S2.
> Will S1 and S2 use their own n mod hash functions or is it a common
> function decided by Storm? (If anyone could offer a pointer on where I
> could find this in the Storm source code, I could try finding it myself too)
>
> On Thu, Aug 11, 2016 at 2:36 PM, Gireesh Ramji <gireeshramji@yahoo.com>
> wrote:
>
>> It does not matter who hashes it. As long as they all use the same hash
>> function, it will go to the same bolt.
>>
>>
>> ------------------------------
>> *From:* Navin Ipe <navin.ipe@searchlighthealth.com>
>> *To:* user@storm.apache.org
>> *Sent:* Thursday, August 11, 2016 4:56 PM
>> *Subject:* Re: How long until fields grouping gets overwhelmed with data?
>>
>> If the hash is dynamically computed and is stateless, then that brings up
>> one more question.
>>
>> Let's say there are two spout classes S1 and S2. I create 10 tasks of S1
>> and 10 tasks of S2.
>> There are 10 tasks of a bolt B.
>>
>> S1 and S2 are fieldsGrouped with B.
>>
>> I receive a value x in S1 and the same value x in S2.
>>
>> If S1's emit of x goes to task1 of B, then will S2's emit of x also go to
>> task1 of B?
>>
>> *Basically the question is:* Is the hash value decided by the Spout or
>> by Storm? Because if it is decided by the spout, then S1's emit of x can go
>> to task 1 but S2's emit of x might go to some other task of the bolt, and
>> that won't serve the purpose of someone who wants all x'es to go to one
>> bolt.
>>
>>
>>
>>
>> On Wed, Aug 10, 2016 at 8:58 PM, Navin Ipe <navin.ipe@searchlighthealth.com>
>> wrote:
>>
>> Oh that's good to know. I assume it works like this:
>> https://en.wikipedia.org/wiki/Hash_function#Hashing_uniformly_distributed_data
>>
>> On Wed, Aug 10, 2016 at 6:23 PM, Nathan Leung <ncleung@gmail.com> wrote:
>>
>> It's based on a modulo of a hash of the field. The fields grouping is
>> stateless.
>>
>> On Aug 10, 2016 8:18 AM, "Navin Ipe" <navin.ipe@searchlighthealth.com>
>> wrote:
>>
>> Hi,
>>
>> For a spout to keep sending fields-grouped tuples with the same field
>> value to the same bolt, it would have to store a key-value map something
>> like this, right?
>>
>> field1023 ---> Bolt1
>> field1343 ---> Bolt3
>> field1629 ---> Bolt5
>> field1726 ---> Bolt1
>> field1481 ---> Bolt3
>>
>> So if my topology runs for a very long time and the spout generates many
>> unique field values, won't this key value map run out of memory eventually?
>>
>> OR is there a failsafe or a map limit that Storm has to handle this
>> without crashing?
>>
>> If memory problems could happen, what would be an alternative way to
>> solve this problem where many unique fields could get generated over time?
>>
>> --
>> Regards,
>> Navin
