whirr-user mailing list archives

From John Conwell <j...@iamjohn.me>
Subject Re: AMIs to use when creating hadoop cluster with whirr
Date Wed, 05 Oct 2011 17:41:10 GMT
Ok, I just realized something that could be the issue.  I looked at the
different hadoop site xml files (core, hdfs, mapred) and compared their
values to the job configuration reported by the hadoop UI for a job that
ran.  And the values in the site xml files do not correspond with the job
configuration values.  For example, the /usr/lib/hadoop/conf/mapred-site.xml
has mapred.reduce.tasks set to 12, but if I look at the job configuration
for any of my completed jobs, this value is set to 1.

It looks like hadoop is reading default configuration values from somewhere
and using them, and not reading from the /usr/lib/hadoop/conf/*-site.xml
files.
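
One quick way to check (a minimal diagnostic sketch, assuming the CDH jars are
on the classpath; the class name is made up) is to print what a client-side
JobConf actually loads on one of the nodes:

  // ConfCheck.java -- hypothetical helper class, not something from this thread
  import org.apache.hadoop.mapred.JobConf;

  public class ConfCheck {
      public static void main(String[] args) {
          // JobConf loads the built-in *-default.xml resources plus any
          // core-site.xml / mapred-site.xml it finds on the classpath
          JobConf conf = new JobConf();
          // toString() lists the resources that were actually read
          System.out.println(conf);
          System.out.println("mapred.reduce.tasks = " + conf.get("mapred.reduce.tasks"));
      }
  }

Compile it against the Hadoop jars and run it on a node with something like
"hadoop ConfCheck" (after adding the class to HADOOP_CLASSPATH); if
mapred-site.xml never shows up in the resource list, or the printed value is 1,
then /usr/lib/hadoop/conf isn't on the classpath the jobs use and the built-in
defaults win.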

Any ideas?


On Wed, Oct 5, 2011 at 10:25 AM, Andrei Savu <savu.andrei@gmail.com> wrote:

> From here:
> http://developer.yahoo.com/hadoop/tutorial/module7.html
>
> "With multiple racks of servers, RPC timeouts may become more frequent.
> The NameNode takes a continual census of DataNodes and their health via
> heartbeat messages sent every few seconds. A similar timeout mechanism
> exists on the MapReduce side with the JobTracker. With many racks of
> machines, they may force one another to timeout because the master node is
> not handling them fast enough. The following options increase the number of
> threads on the master machine dedicated to handling RPC's from slave nodes:"
> (I think this is also true for AWS)
>
> Proposed solution is:
>
>   <property>
>     <name>dfs.namenode.handler.count</name>
>     <value>40</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker.handler.count</name>
>     <value>40</value>
>   </property>
>
>
> You can do this in Whirr by specifying:
>
> hadoop-dfs.dfs.namenode.handler.count=40
> hadoop-mapreduce.mapred.job.tracker.handler.count=40
>
> in the .properties file.
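>
> For context, a minimal sketch of a full recipe with these overrides in place
> (the cluster name, provider and instance templates below are illustrative,
> not taken from this thread):
>
>   whirr.cluster-name=hadoopcluster
>   whirr.provider=aws-ec2
>   whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker
>   whirr.hadoop-install-function=install_cdh_hadoop
>   whirr.hadoop-configure-function=configure_cdh_hadoop
>   hadoop-dfs.dfs.namenode.handler.count=40
>   hadoop-mapreduce.mapred.job.tracker.handler.count=40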
>
> Let me know if this works for you. We should probably use something like
> this by default.
>
> -- Andrei Savu
>
>
> On Wed, Oct 5, 2011 at 8:15 PM, Andrei Savu <savu.andrei@gmail.com> wrote:
>
>> Looks like a network congestion issue to me. I don't know how to do this,
>> but I would try increasing the heartbeat timeout.
>>
>> Tom, any ideas? Have you seen this before on AWS?
>>
>> I don't think there is anything wrong with the AMI; I suspect there is
>> something wrong with the Hadoop configuration.
>>
>>
>> On Wednesday, October 5, 2011, John Conwell wrote:
>>
>>> It starts with hadoop reporting blocks of data being 'lost', then
>>> individual data nodes stop responding, the individual data nodes get taken
>>> offline, then jobs get killed, then data nodes come back online and the
>>> data blocks get replicated back out to the correct replication factor.
>>>
>>> The end result is that about 80% of the time my hadoop jobs get killed
>>> because some task fails 3 times in a row, but about an hour after a job
>>> gets killed, all data nodes are back online and all data is fully
>>> replicated.
>>>
>>> Before I go rat holing down "why are my data nodes going down", I want to
>>> cover the easy scenarios like "oh yea...you're totally misconfigured.  You
>>> should use ABC ami with the cloudera install and config scripts".  Basically,
>>> I want to validate whether there are any best practices for setting up a
>>> cloudera distribution of hadoop on EC2.
>>>
>>> I know cloudera has created their own AMIs.  Should I be using them?
>>>  Does it matter?
>>>
>>>
>>>
>>> On Wed, Oct 5, 2011 at 9:43 AM, Andrei Savu <savu.andrei@gmail.com> wrote:
>>>
>>>> What do you mean by failing? Is the Hadoop daemon shutting down or the
>>>> machine as a whole?
>>>>
>>>> On Wednesday, October 5, 2011, John Conwell wrote:
>>>>
>>>>> I'm having stability issues (data nodes constantly failing under very
>>>>> little load) on the hadoop clusters I'm creating, and I'm trying to figure
>>>>> out the best practice for creating the most stable hadoop environment on
>>>>> EC2.
>>>>>
>>>>> In order to run the cdh install and config scripts, I'm
>>>>> setting whirr.hadoop-install-function to install_cdh_hadoop, and
>>>>> whirr.hadoop-configure-function to configure_cdh_hadoop.  But I'm using a
>>>>> plain-jane ubuntu amd64 ami (ami-da0cf8b3).  Should I also be using the
>>>>> cloudera AMIs as well as the cloudera install and config scripts?
>>>>>
>>>>> Are there any best practices for how to set up a cloudera distribution of
>>>>> hadoop on EC2?
>>>>>
>>>>> --
>>>>>
>>>>> Thanks,
>>>>> John C
>>>>>
>>>>>
>>>>
>>>> --
>>>> -- Andrei Savu / andreisavu.ro
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Thanks,
>>> John C
>>>
>>>
>>
>> --
>> -- Andrei Savu / andreisavu.ro
>>
>>
>


-- 

Thanks,
John C
