storm-user mailing list archives

From Ravi Sharma <ping2r...@gmail.com>
Subject Re: Approach to parallelism
Date Tue, 06 Oct 2015 08:00:09 GMT
Nick,
Look into your queue sizing, both network-bound and in-memory.

I also try to use this pattern:

Say I have spout S1 and two bolts B1 and B2 downstream of it (S1 ->
B1 -> B2).

Let's say I have to run the bolts in parallel (i.e., 2 instances of B1 and 2
instances of B2), and assume I have 2 workers available on two different
nodes.

Then I try to run S1 2 times as well (if my spout's source allows
concurrency), so that one S1 -> B1 -> B2 pipeline is in one JVM
and the other S1 -> B1 -> B2 pipeline is in the other JVM. That reduces the
network-bound messages from one JVM to the other.

So if you have 6 JVMs running, I'd say start with 6 spouts.

This is a simple scenario; topologies can be complex, and you can tweak this
rule a little to fit your use case.
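As a rough illustration of why this lines up pipelines per JVM, here is a toy model of spreading executors round-robin across workers. This is an approximation for illustration only, not Storm's actual scheduler:

```python
# Toy round-robin assignment of executors to workers, loosely
# mimicking how an even scheduler spreads load (illustrative only).
def assign(executors, num_workers):
    workers = [[] for _ in range(num_workers)]
    for i, executor in enumerate(executors):
        workers[i % num_workers].append(executor)
    return workers

# Spout S1 and bolts B1, B2, each with parallelism 2, on 2 workers:
executors = ["S1#0", "S1#1", "B1#0", "B1#1", "B2#0", "B2#1"]
print(assign(executors, 2))
# -> [['S1#0', 'B1#0', 'B2#0'], ['S1#1', 'B1#1', 'B2#1']]
```

Each worker ends up hosting a complete S1 -> B1 -> B2 chain, so tuples can stay inside one JVM instead of crossing the network.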

Thanks
Ravi.


On Mon, Oct 5, 2015 at 4:29 PM, Nick R. Katsipoulakis <nick.katsip@gmail.com> wrote:

> Hello guys,
>
> This is a really interesting discussion. I am also trying to fine-tune the
> performance of my cluster, especially my end-to-end latency, which ranges
> from 200 to 1200 msec for a topology with 2 spouts (each with a 2k tuples
> per second input rate) and 3 bolts. My cluster consists of 3 ZooKeeper
> nodes (1 shared with Nimbus) and 6 supervisor nodes, all of them AWS
> m4.xlarge instances.
>
> I am pretty sure that the latency I am experiencing is ridiculously high,
> and I currently have no idea what to do to improve it. I have 3 workers per
> node, which I will drop to one worker per node after this discussion and
> see if I get better results.
>
> Thanks,
> Nick
>
> On Mon, Oct 5, 2015 at 10:40 AM, Kashyap Mhaisekar <kashyap.m@gmail.com>
> wrote:
>
>> Anshu,
>> My methodology was as follows. Since the true parallelism of a machine is
>> the no. of cores, I set the number of workers equal to the no. of cores (5
>> in my case). That said, since we have 32 GB per box, we usually leave 50%
>> off, leaving us 16 GB spread across the 5 workers. Hence we set the worker
>> heap at 3g.
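For reference, the arithmetic behind that sizing works out as follows (a quick sketch using the figures quoted above):

```python
# Worker heap sizing from the figures above: 32 GB per box,
# 50% reserved for the OS and page cache, remainder split
# across 5 workers per box.
total_gb = 32
usable_gb = total_gb * 0.5          # 16 GB left for Storm workers
workers_per_box = 5
heap_per_worker_gb = usable_gb / workers_per_box
print(heap_per_worker_gb)           # 3.2 -> rounded down to a 3g heap
```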
>>
>> This was before Javier's and Michael's suggestion of keeping one JVM per
>> node...
>>
>> Ours is a single topology running on the boxes, so I will change it to one
>> JVM (worker) per box and rerun.
>>
>> Thanks
>> Kashyap
>>
>> On Mon, Oct 5, 2015 at 9:18 AM, anshu shukla <anshushukla0@gmail.com>
>> wrote:
>>
>>> Sorry for reposting! Any suggestions, please?
>>>
>>> Just one query: how can we map -
>>> *1 - no. of workers to the number of cores*
>>> *2 - no. of slots on one machine to the number of cores on that machine*
>>>
>>> On Mon, Oct 5, 2015 at 7:32 PM, John Yost <soozandjohnyost@gmail.com>
>>> wrote:
>>>
>>>> Hi Javier,
>>>>
>>>> Gotcha, I am seeing the same thing, and I see a ton of worker restarts
>>>> as well.
>>>>
>>>> Thanks
>>>>
>>>> --John
>>>>
>>>> On Mon, Oct 5, 2015 at 9:01 AM, Javier Gonzalez <jagonzal@gmail.com>
>>>> wrote:
>>>>
>>>>> I don't have numbers, but I did see a very noticeable degradation of
>>>>> throughput and latency when using multiple workers per node with the same
>>>>> topology.
>>>>> On Oct 5, 2015 7:25 AM, "John Yost" <soozandjohnyost@gmail.com> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I am curious--are there any benchmark numbers that demonstrate how
>>>>>> much better one worker per node is?  The reason I ask is that I may need
>>>>>> to double up the workers on my cluster and I was wondering how much of a
>>>>>> throughput hit I may take from having two workers per node.
>>>>>>
>>>>>> Any info would be very much appreciated--thanks! :)
>>>>>>
>>>>>> --John
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <jagonzal@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I would suggest sticking with a single worker per machine. It makes
>>>>>>> memory allocation easier and it makes inter-component communication much
>>>>>>> more efficient. Configure the executors with your parallelism hints to
>>>>>>> take advantage of all your available CPU cores.
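In configuration terms, that one-worker-per-node setup can be expressed per supervisor node roughly like this (the port number and heap size are placeholders, not recommendations):

```yaml
# storm.yaml on each supervisor node: a single slot means at most
# one worker (JVM) per box
supervisor.slots.ports:
    - 6700

# give that single worker the memory the multiple workers shared before
worker.childopts: "-Xmx12g"
```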
>>>>>>>
>>>>>>> Regards,
>>>>>>> JG
>>>>>>>
>>>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <kashyap.m@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I was trying to come up with an approach to evaluate the
>>>>>>>> parallelism needed for a topology.
>>>>>>>>
>>>>>>>> Assuming I have 5 machines with 8 cores and 32 GB each, and my
>>>>>>>> topology has one spout and 5 bolts:
>>>>>>>>
>>>>>>>> 1. Define one worker port per CPU core to start off (= 8 workers per
>>>>>>>> machine, i.e. 40 workers overall).
>>>>>>>> 2. Each worker spawns one executor per component; that translates
>>>>>>>> to 6 executors per worker, which is 40x6 = 240 executors.
>>>>>>>> 3. Of this, if the bolt logic is CPU intensive, then leave the
>>>>>>>> parallelism hint at 40 (total workers); else increase the parallelism
>>>>>>>> hint beyond 40 till you hit a number beyond which there is no further
>>>>>>>> visible performance gain.
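The numbers in the steps above can be checked quickly:

```python
# Executor count from the steps above: 5 machines x 8 cores,
# one worker per core, 6 components (1 spout + 5 bolts).
machines, cores_per_machine = 5, 8
workers = machines * cores_per_machine     # 40 workers overall
components = 1 + 5                         # one spout + five bolts
executors = workers * components
print(workers, executors)                  # 40 240
```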
>>>>>>>>
>>>>>>>> Does this look right?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Kashyap
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Javier González Nicolini
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Anshu Shukla
>>>
>>
>>
>
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD student
>
