whirr-user mailing list archives

From Hans Drexler <Hans.Drex...@HumanInference.com>
Subject RE: Setting Hadoop heap size
Date Fri, 16 Dec 2011 15:03:11 GMT
Thanks for your reply. We installed munin on the nodes and noticed that not all memory was
being used by Hadoop, so we think we can make it faster by allocating more memory. But maybe
we have been changing the wrong parameters.

By the way, we did technically succeed in increasing the HADOOP_HEAPSIZE setting by editing
the file inside whirr-cdh-0.6.0-incubating.jar, but I still have the feeling we are doing it
the "wrong" way.
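(A possibly cleaner route we have not verified yet: some Whirr versions apparently let you
override Hadoop site properties directly from hadoop.properties, something like:

```properties
# Unverified: prefix and support in 0.6.0-incubating are assumptions
# to check against the Whirr documentation for your version.
hadoop-mapreduce.mapred.child.java.opts=-Xmx512m
```

If that mechanism exists in 0.6.0-incubating it would avoid repackaging the jar entirely.)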

PPS. We use this home-grown shell script to deploy munin on all Hadoop nodes. It reads a file
cluster-nodes.txt that must contain the IP address (or full host name) and password of each
node. Munin helps us keep tabs on what goes on at the cluster nodes. Maybe somebody can use it.
Any remarks welcome (yes, I know sshpass is dirty!)

while read line; do
  # Split the line into positional parameters: host, then password
  set -- $line
  host=$1
  pass=$2
  # Munin section names must not contain dots
  hostname=$(echo "$host" | tr '.' '-')
  echo "Installing munin node on $host"
  sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no /usr/bin/aptitude -y install munin-node < /dev/null
  sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no '/bin/echo "allow ^50\.57\.191\.88$" >> /etc/munin/munin-node.conf' < /dev/null
  sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no '/usr/sbin/service munin-node restart' < /dev/null
  echo "Adding $hostname to local config"
  rm -f "/etc/munin/munin-conf.d/conf-$hostname.conf"
  echo "[$hostname.localdomain]" > "/etc/munin/munin-conf.d/conf-$hostname.conf"
  # Plain echo does not expand \t portably; use printf for the indented lines
  printf '\taddress %s\n' "$host" >> "/etc/munin/munin-conf.d/conf-$hostname.conf"
  printf '\tuse_node_name yes\n' >> "/etc/munin/munin-conf.d/conf-$hostname.conf"
  echo "Done installing on $host"
done < cluster-nodes.txt
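For reference, each line of cluster-nodes.txt holds two whitespace-separated fields, host and
password. A sketch of how the loop parses one such line (the host and password below are
made-up values):

```shell
# Hypothetical cluster-nodes.txt line: "<host-or-ip> <root-password>"
line="node1.example.com s3cret"
set -- $line        # word-split into positional parameters
host=$1             # first field: host name or IP address
pass=$2             # second field: root password for sshpass
hostname=$(echo "$host" | tr '.' '-')   # dots replaced, usable as a munin section name
echo "$host -> $hostname"
# prints: node1.example.com -> node1-example-com
```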

Kind regards,

Hans Drexler


-----Original Message-----
From: Marco Didonna [mailto:m.didonna86@gmail.com] 
Sent: Friday, 16 December 2011 15:52
To: user@whirr.apache.org
Subject: Re: Setting Hadoop heap size

On 16 December 2011 12:49, Hans Drexler <Hans.Drexler@humaninference.com> wrote:
> We are using Whirr to setup a rackspace cluster to run Hadoop jobs. We use
> the Cloudera Hadoop. Below is our hadoop.properties
> whirr.cluster-name=our_cluster
> whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,6
> hadoop-datanode+hadoop-tasktracker
> whirr.provider=cloudservers-us
> whirr.identity=${env:RACKSPACE_USERNAME}
> whirr.credential=${env:RACKSPACE_API_KEY}
> whirr.hardware-id=6
> whirr.image=49
> whirr.login-user=user
> whirr.private-key-file=/home/user/.ssh/id_rsa_whirr
> whirr.public-key-file=/home/user/.ssh/id_rsa_whirr.pub
> whirr.hadoop-install-function=install_cdh_hadoop
> whirr.hadoop-configure-function=configure_cdh_hadoop
> All is working fine. But now I want to change the hadoop configuration file
> on the nodes. Actually, we want to increase the amount of heap space
> available to Hadoop (HADOOP_HEAPSIZE). So we want to change the
> hadoop-env.sh file on each node.
> My Question is: How can I do that? Do I need to open the
> lib/whirr-cdh-0.6.0-incubating.jar and tweak the contents of that jar, then
> repackage it?
> I hope somebody can share some knowledge on this. Thanks!

The HADOOP_HEAPSIZE environment variable in hadoop-env.sh controls how
much heap space each daemon (datanode, tasktracker, etc.) is assigned.
In addition, the tasktracker launches separate child JVMs to run map
and reduce tasks in. Each of these child JVMs is given a maximum of
200 MB of heap by default. You can control this parameter by


You could also use mapred.child.java.opts but it didn't work for me.
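For reference, that property normally lives in mapred-site.xml and takes JVM flags; the
512 MB value below is only an illustration, not a recommendation:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```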

I hope this helps.
