storm-user mailing list archives

From Abe Oppenheim <abe.oppenh...@gmail.com>
Subject Re: Approach to parallelism
Date Mon, 05 Oct 2015 13:07:49 GMT
Hi All,

Any tips for determining the heap size for each node's single JVM?

> On Oct 5, 2015, at 5:25 AM, anshu shukla <anshushukla0@gmail.com> wrote:
> 
> I was also facing the same issue of balancing latency and tradeoffs. Got a nice discussion here.
> 
> Just one query: how can we map
> 1. the number of workers to the number of cores
> 2. the number of slots on one machine to the number of cores on that machine
> 
>> On Sun, Oct 4, 2015 at 2:00 AM, Kashyap Mhaisekar <kashyap.m@gmail.com> wrote:
>> Thanks guys.
>> So when you say one JVM per node, it means one port, say 6700, on each machine, and we assign a large amount of heap to it?
>> So in this case, it translates into 5 workers (one on each of the 5 machines) with at least 4 GB of heap each, and all bolts spread across these 5 workers?
>> 
>> Is there a guideline on how I should arrive at the parallelism hints of the bolts themselves? I mean, when complete latency at the spout is higher but execute latencies at the bolts are very, very small...
>> 
>> Will jump into the links right away.
>> 
>> Thanks
>> Kashyap
>> 
>>> On Oct 3, 2015 12:00 PM, "Michael Vogiatzis" <michaelvogiatzis@gmail.com> wrote:
>>> I agree with Javier: one JVM per node should reduce the number of messages that need to be serialized.
>>> 
>>> For tuning Storm topologies you may find the following links useful:
>>> 
>>> https://gist.github.com/mrflip/5958028
>>> https://wassermelonemann.wordpress.com/2014/01/22/tuning-storm-topologies/
>>> Talk:
>>> http://demo.ooyala.com/player.html?width=640&height=360&embedCode=Q1eXg5NzpKqUUzBm5WTIb6bXuiWHrRMi&videoPcode=9waHc6zKpbJKt9byfS7l4O4sn7Qn
>>> 
>>> Cheers,
>>> Michael
>>> @mvogiatzis
>>> 
>>> 
>>>> On Sat, 3 Oct 2015 at 14:04 Javier Gonzalez <jagonzal@gmail.com> wrote:
>>>> I would suggest sticking with a single worker per machine. It makes memory allocation easier and it makes inter-component communication much more efficient. Configure the executors with your parallelism hints to take advantage of all your available CPU cores.
>>>> 
>>>> Regards,
>>>> JG
>>>> 
>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <kashyap.m@gmail.com> wrote:
>>>>> Hi,
>>>>> I was trying to come up with an approach to evaluate the parallelism needed for a topology.
>>>>> 
>>>>> Assume I have 5 machines, each with 8 cores and 32 GB of RAM, and my topology has one spout and 5 bolts.
>>>>> 
>>>>> 1. Define one worker port per CPU core to start off (= 8 workers per machine, i.e. 40 workers overall).
>>>>> 2. If each worker runs one executor per component, that translates to 6 executors per worker, which is 40 x 6 = 240 executors.
>>>>> 3. Of these, if the bolt logic is CPU intensive, leave the parallelism hint at 40 (total workers); otherwise increase the parallelism hint beyond 40 until you hit a number beyond which there is no more visible performance improvement.
>>>>> 
>>>>> Does this look right?
>>>>> 
>>>>> Thanks
>>>>> Kashyap
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Javier González Nicolini
>>> 
>>> -- 
>>> Michael Vogiatzis
>>> Twitter: @mvogiatzis
>>> http://micvog.com/
> 
> 
> 
> -- 
> Thanks & Regards,
> Anshu Shukla
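
For reference, the arithmetic in the original question and the single-worker alternative suggested in the replies can be sketched as below. This is only an illustration using the numbers from the thread (5 machines, 8 cores each, one spout plus 5 bolts). The function and variable names are made up for this sketch and are not Storm APIs, and it assumes one executor per component per worker, as in the original question.

```python
# Sketch of the worker/executor arithmetic discussed in this thread.
# All names here are illustrative; none of them are Storm APIs.

MACHINES = 5
CORES_PER_MACHINE = 8
COMPONENTS = 6  # one spout + 5 bolts


def layout(workers_per_machine, executors_per_component_per_worker=1):
    """Return total workers and executors for a given layout."""
    workers = MACHINES * workers_per_machine
    executors = workers * COMPONENTS * executors_per_component_per_worker
    return {"workers": workers, "executors": executors}


# Layout from the original question: one worker per core.
per_core = layout(workers_per_machine=CORES_PER_MACHINE)

# Layout suggested in the replies: one worker (one JVM) per machine.
per_machine = layout(workers_per_machine=1)

# Total cores available to spread executors over.
total_cores = MACHINES * CORES_PER_MACHINE

print(per_core)     # 40 workers, 240 executors
print(per_machine)  # 5 workers, 30 executors
print(total_cores)  # 40
```

With one worker per machine, it is the parallelism hints rather than the worker count that spread executors over the 40 cores, so the hints would likely need to be raised above one per component per worker. The right values still have to be found by measurement.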
