storm-user mailing list archives

From Kashyap Mhaisekar <kashya...@gmail.com>
Subject Re: Approach to parallelism
Date Mon, 05 Oct 2015 16:05:33 GMT
I changed it back before I could look at capacity. Will check again.
My use case is compute-bound for sure: one tuple into my spout creates 20
more tuples downstream, and each of those iterates 500 times. And I get
tuples at high TPS rates.

On Mon, Oct 5, 2015 at 10:34 AM, Nick R. Katsipoulakis <
nick.katsip@gmail.com> wrote:

> Well, did you also see the capacity of your bolts increasing? Because if
> that is the case, then your use case is compute-bound and you end up with
> increased latency for your workers.
>
> Nick
>
> On Mon, Oct 5, 2015 at 11:32 AM, Kashyap Mhaisekar <kashyap.m@gmail.com>
> wrote:
>
>> After dropping to 1 worker per node, my latency was bad.... Probably my
>> use case was different.
>>
>> On Mon, Oct 5, 2015 at 10:29 AM, Nick R. Katsipoulakis <
>> nick.katsip@gmail.com> wrote:
>>
>>> Hello guys,
>>>
>>> This is a really interesting discussion. I am also trying to fine-tune
>>> the performance of my cluster, and especially my end-to-end latency, which
>>> ranges from 200 to 1200 ms for a topology with 2 spouts (each with a 2k
>>> tuples-per-second input rate) and 3 bolts. My cluster consists of 3
>>> ZooKeeper nodes (1 shared with Nimbus) and 6 supervisor nodes, all of them
>>> AWS m4.xlarge instances.
>>>
>>> I am pretty sure that the latency I am experiencing is ridiculous, and I
>>> currently have no idea how to improve it. I have 3 workers per node, which
>>> I will drop to one worker per node after this discussion to see if I get
>>> better results.
>>>
>>> Thanks,
>>> Nick
>>>
>>> On Mon, Oct 5, 2015 at 10:40 AM, Kashyap Mhaisekar <kashyap.m@gmail.com>
>>> wrote:
>>>
>>>> Anshu,
>>>> My methodology was as follows. Since the true parallelism of a machine
>>>> is the number of cores, I set the workers equal to the number of cores
>>>> (5 in my case). That said, since we have 32 GB per box, we usually leave
>>>> 50% off, leaving us 16 GB per box spread across 5 workers. Hence we set
>>>> the worker heap at 3 GB.
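>>>>
>>>> For reference, a minimal sketch of how that sizing maps to config
>>>> (assuming the heap is set per topology via TOPOLOGY_WORKER_CHILDOPTS
>>>> rather than in storm.yaml; the numbers are illustrative only):
>>>>
>>>>     import backtype.storm.Config;
>>>>
>>>>     Config conf = new Config();
>>>>     conf.setNumWorkers(25);  // 5 workers on each of the 5 boxes
>>>>     // ~3 GB heap per worker keeps 5 workers within the ~16 GB budget per box
>>>>     conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx3g");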
>>>>
>>>> This was before Javier's and Michael's suggestion of keeping one JVM per
>>>> node...
>>>>
>>>> Ours is a single topology running on the boxes, so I will change it to
>>>> one JVM (worker) per box and rerun.
>>>>
>>>> Thanks
>>>> Kashyap
>>>>
>>>> On Mon, Oct 5, 2015 at 9:18 AM, anshu shukla <anshushukla0@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry for reposting! Any suggestions, please.
>>>>>
>>>>> Just one query: how can we map
>>>>> 1. the number of workers to the number of cores, and
>>>>> 2. the number of slots on one machine to the number of cores on that
>>>>> machine? (A rough sketch of the settings I mean follows below.)
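>>>>>
>>>>> To be concrete, the knobs in question are roughly these (a sketch only;
>>>>> the worker count and ports are placeholders):
>>>>>
>>>>>     import backtype.storm.Config;
>>>>>
>>>>>     Config conf = new Config();
>>>>>     conf.setNumWorkers(8);  // workers requested by the topology
>>>>>     // Slots per machine come from supervisor.slots.ports in storm.yaml on
>>>>>     // each supervisor (one worker port per slot, e.g. 6700-6707); Storm
>>>>>     // runs at most one worker JVM per slot.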
>>>>>
>>>>> On Mon, Oct 5, 2015 at 7:32 PM, John Yost <soozandjohnyost@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Javier,
>>>>>>
>>>>>> Gotcha, I am seeing the same thing, and I see a ton of worker
>>>>>> restarts as well.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> --John
>>>>>>
>>>>>> On Mon, Oct 5, 2015 at 9:01 AM, Javier Gonzalez <jagonzal@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I don't have numbers, but I did see a very noticeable degradation of
>>>>>>> throughput and latency when using multiple workers per node with the
>>>>>>> same topology.
>>>>>>> On Oct 5, 2015 7:25 AM, "John Yost" <soozandjohnyost@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Everyone,
>>>>>>>>
>>>>>>>> I am curious--are there any benchmark numbers that demonstrate how
>>>>>>>> much better one worker per node is? The reason I ask is that I may
>>>>>>>> need to double up the workers on my cluster, and I was wondering how
>>>>>>>> much of a throughput hit I may take from having two workers per node.
>>>>>>>>
>>>>>>>> Any info would be very much appreciated--thanks! :)
>>>>>>>>
>>>>>>>> --John
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <jagonzal@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I would suggest sticking with a single worker per machine. It makes
>>>>>>>>> memory allocation easier and it makes inter-component communication
>>>>>>>>> much more efficient. Configure the executors with your parallelism
>>>>>>>>> hints to take advantage of all your available CPU cores.
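>>>>>>>>>
>>>>>>>>> A minimal sketch of that setup, assuming 6 supervisor nodes and
>>>>>>>>> placeholder spout/bolt classes (not a drop-in implementation):
>>>>>>>>>
>>>>>>>>>     import backtype.storm.Config;
>>>>>>>>>     import backtype.storm.StormSubmitter;
>>>>>>>>>     import backtype.storm.topology.TopologyBuilder;
>>>>>>>>>
>>>>>>>>>     public class OneWorkerPerNodeTopology {
>>>>>>>>>         public static void main(String[] args) throws Exception {
>>>>>>>>>             TopologyBuilder builder = new TopologyBuilder();
>>>>>>>>>             // Parallelism hints sized to use all cores across the cluster
>>>>>>>>>             builder.setSpout("spout", new MySpout(), 6);
>>>>>>>>>             builder.setBolt("bolt", new MyBolt(), 24)
>>>>>>>>>                    .shuffleGrouping("spout");
>>>>>>>>>
>>>>>>>>>             Config conf = new Config();
>>>>>>>>>             conf.setNumWorkers(6);  // one worker per supervisor node
>>>>>>>>>             StormSubmitter.submitTopology("one-worker-per-node", conf,
>>>>>>>>>                                           builder.createTopology());
>>>>>>>>>         }
>>>>>>>>>     }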
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> JG
>>>>>>>>>
>>>>>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <
>>>>>>>>> kashyap.m@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> I was trying to come up with an approach to evaluate the
>>>>>>>>>> parallelism needed for a topology.
>>>>>>>>>>
>>>>>>>>>> Assume I have 5 machines with 8 cores and 32 GB each, and my
>>>>>>>>>> topology has one spout and 5 bolts.
>>>>>>>>>>
>>>>>>>>>> 1. Define one worker port per CPU core to start off (= 8 workers
>>>>>>>>>> per machine, i.e. 40 workers overall).
>>>>>>>>>> 2. Each worker spawns one executor per component, which translates
>>>>>>>>>> to 6 executors per worker, i.e. 40 x 6 = 240 executors.
>>>>>>>>>> 3. If the bolt logic is CPU intensive, leave the parallelism hint
>>>>>>>>>> at 40 (the total number of workers); otherwise increase it beyond
>>>>>>>>>> 40 until you hit a number beyond which there is no further visible
>>>>>>>>>> performance gain. (A sketch of this follows below.)
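>>>>>>>>>>
>>>>>>>>>> A rough sketch of steps 1-3 in topology code (the spout/bolt classes
>>>>>>>>>> are placeholders):
>>>>>>>>>>
>>>>>>>>>>     import backtype.storm.Config;
>>>>>>>>>>     import backtype.storm.topology.TopologyBuilder;
>>>>>>>>>>
>>>>>>>>>>     Config conf = new Config();
>>>>>>>>>>     conf.setNumWorkers(40);                        // step 1: 8 slots x 5 machines
>>>>>>>>>>     TopologyBuilder builder = new TopologyBuilder();
>>>>>>>>>>     builder.setSpout("spout", new MySpout(), 40);  // hint 40 => 1 executor per worker
>>>>>>>>>>     builder.setBolt("bolt1", new Bolt1(), 40)      // same for each of the 5 bolts:
>>>>>>>>>>            .shuffleGrouping("spout");              // step 2: 6 x 40 = 240 executors
>>>>>>>>>>     // Step 3: if bolts are CPU bound keep hints at 40; otherwise raise
>>>>>>>>>>     // them and re-measure until throughput stops improving.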
>>>>>>>>>>
>>>>>>>>>> Does this look right?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Kashyap
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Javier González Nicolini
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Anshu Shukla
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nikolaos Romanos Katsipoulakis,
>>> University of Pittsburgh, PhD student
>>>
>>
>>
>
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD student
>
