giraph-dev mailing list archives

From: Sonja Koenig <>
Subject: Re: Giraph Performance Tuning
Date: Wed, 26 Aug 2015 12:28:56 GMT
Thank you Eric!

Your information gave me a good starting point for further configuration :)
It helped immensely!

Also, I agree with you: a small number of powerful workers is almost 
surely better than a large number of weak ones. Unfortunately I'm just doing 
my bachelor's, so I'm not (yet) in a position to scream for "more power 
pls!!!" ;D
But it's working out nicely; I've already got some good data.


On 24.08.2015 at 18:50, Eric Kimbrel wrote:
> Hello,
> I am not at all an expert, but here's what I can tell you:
> I haven't used Giraph in PURE_YARN mode, as I've had problems getting it to
> compile correctly, but when using hadoop_2 mapreduce you set memory and cores
> per worker with the Hadoop MapReduce options. Those might be different from
> mine depending on your Hadoop version, but I use hadoop 2.6.0.
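> For example, on Hadoop 2 the per-worker memory and core knobs look roughly
> like this (the jar name, computation class, paths, and values here are
> placeholders, not a recommendation, and exact property names depend on your
> Hadoop release):
>
>   # each Giraph worker runs as one map task, so the map-task settings size the workers;
>   # keep the JVM heap (-Xmx) a bit below the container size (memory.mb)
>   hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
>       -Dmapreduce.map.memory.mb=4096 \
>       -Dmapreduce.map.java.opts=-Xmx3584m \
>       -Dmapreduce.map.cpu.vcores=4 \
>       org.apache.giraph.examples.SimpleShortestPathsComputation \
>       -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>       -vip /input/graph -op /output/sssp -w 10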
> As a general rule of thumb I have found that big workers with lots of
> memory and cores will outperform lots of small workers by having reduced
> network IO, but that isn't always exactly true. Finding the optimal
> configuration for a given algorithm / hardware can be somewhat of an art
> form.   I would generally expect 10 workers with 10 cores and ample memory
> to outperform 100 workers with a single core each, but again that's just my
> rule of thumb and not always the case.   You are probably going to want to
> run 9 workers (saving one machine for the master node) with one core and
> 3GB each.
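> On your Hadoop 1.2.1 setup that would look something like the sketch below
> (jar name, computation class, and paths are again placeholders; on Hadoop 1.x
> the per-task JVM heap is set with mapred.child.java.opts):
>
>   # 9 workers, each a map task with a 3GB JVM heap
>   hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
>       -Dmapred.child.java.opts=-Xmx3072m \
>       org.apache.giraph.examples.SimpleShortestPathsComputation \
>       -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>       -vip /input/graph -op /output/sssp -w 9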
> One of the reasons this is such a hard task is because you have four big
> variables to consider (algorithms, datasets, hardware,
> platforms/frameworks).  Giraph and GraphX have very different execution
> models, and because of that the implementation of the same algorithm on
> each may be very different, and perform very differently.  Implementation
> details can make all the difference.
> Combiners/Aggregators: Combiners and aggregators are completely
> application specific. (To use a combiner you do specify it on the
> command line, but some applications will work with combiners while others
> won't.)
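> As an illustration, adding a combiner to an otherwise unchanged GiraphRunner
> invocation looks roughly like this (the combiner class is a placeholder for
> whatever your application provides, and the exact flag name can differ
> between Giraph releases, so check GiraphRunner's help output):
>
>   # the combiner merges messages headed to the same vertex before they hit the network
>   hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
>       org.apache.giraph.examples.SimpleShortestPathsComputation \
>       -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>       -vip /input/graph -op /output/sssp -w 9 \
>       -c com.example.MyMinimumDistanceCombiner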
> Good luck on your thesis work!
> -Eric
> On 8/24/15, 2:18 AM, "Sonja Koenig" <> wrote:
>> Hey there everyone!
>> On the user list, there was no one to help me, so I thought I'd just
>> start bugging devs..
>> I am currently writing my bachelor thesis about Giraph and GraphX, where
>> I am trying to compare their scalability and features and bring them
>> into a context with different graph types.
>> In order to compare the two on a fair basis, I want to tune the
>> frameworks to get the most out of them :-)
>> I was hoping to get some tips and tricks from you all, where I can make
>> some configurations to impact my computations..
>> My set up:
>> 10 machines, each with 1 CPU with a single 3.3 GHz core, 4 GB RAM, 100 GB
>> HDD -> one is the designated master
>> Giraph 1.1.0
>> Hadoop 1.2.1
>> So far I haven't done any special configurations for hadoop or giraph
>> besides the basic ones during setup.
>> Performance-critical might be these:
>> In *mapred-site.xml*:
>> = 4
>> In *hdfs-site.xml*:
>>      dfs.replication=3
>> If I am correctly informed, the default amount of heap is 1000MB, which
>> I haven't changed. I am also not sure where I can actually increase
>> memory usage. Any advice?
>> Also, I read somewhere that it is smarter to increase the number of
>> threads per worker rather than the number of workers per machine? But I am
>> somewhat handicapped anyway with only one core per machine..
>> Lastly, has anyone noticed any performance changes when using
>> checkpointing, combiners, aggregators and so on?
>> Is the use of combiners and aggregators a choice of the application code
>> or my execution command?
>> I would appreciate any advice and comments greatly! :-)
>> Greetings from Ulm,
>> Sonja
