whirr-user mailing list archives

From John Conwell <j...@iamjohn.me>
Subject Re: AMIs to use when creating hadoop cluster with whirr
Date Wed, 05 Oct 2011 20:25:23 GMT
I thought about that, but the hadoop-site.xml created by whirr only has some of
the info needed; it's not the full set of xml elements that get written to the
*-site.xml files on the hadoop cluster.  For example, whirr sets
*mapred.reduce.tasks* based on the number of task trackers, which is vital for
the job configuration to have, but the hadoop-site.xml doesn't have this value.
It only has the core properties needed to let you use the ssh proxy to
interact with the name node and job tracker.
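
For what it's worth, a little check along these lines shows the gap (the
"mycluster" name and path are just placeholders for whatever your cluster
is called):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class CheckWhirrSiteXml {
      public static void main(String[] args) {
        // false = skip Hadoop's built-in defaults, so we only see what the
        // whirr-generated file itself provides
        Configuration conf = new Configuration(false);
        conf.addResource(new Path(System.getProperty("user.home")
            + "/.whirr/mycluster/hadoop-site.xml"));

        // the core/proxy properties should be there...
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        // ...but job-level settings like this one come back null
        System.out.println("mapred.reduce.tasks = " + conf.get("mapred.reduce.tasks"));
      }
    }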


On Wed, Oct 5, 2011 at 1:11 PM, Andrei Savu <savu.andrei@gmail.com> wrote:

> The files are also created on the local machine in ~/.whirr/cluster-name/
> so it shouldn't be that hard. The only tricky part, from my point of view,
> is matching the Hadoop version.
>
> On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <john@iamjohn.me> wrote:
>
>> This whole scenario does bring up the question about how people handle
>> this kind of scenario.  To me the beauty of whirr is that it means I can
>> spin up and down hadoop clusters on the fly when my workflow demands it.  If
>> a task gets q'd up that needs mapreduce, I spin up a cluster, solve my
>> problem, gather my data, kill the cluster, workflow goes on.
>>
>> But if my workflow requires the contents of three little files located on
>> a different machine, in a different cluster, and possibly with a different
>> cloud vendor, that really puts a damper on the whimsical on-the-flyness of
>> creating hadoop resources only when needed.  I'm curious how other people
>> are handling this scenario.
>>
>>
>> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <savu.andrei@gmail.com> wrote:
>>
>>> Awesome! I'm glad we figured this out; I was getting worried that we had
>>> a critical bug.
>>>
>>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <john@iamjohn.me> wrote:
>>>
>>>> Ok...I think I figured it out.  This email thread made me take a look at
>>>> how I'm kicking off my hadoop job.  My hadoop driver, the class that links
>>>> a bunch of jobs together in a workflow, is on a different machine than the
>>>> cluster that hadoop is running on.  This means when I create a new
>>>> Configuration() object, it tries to load the default hadoop values from
>>>> the class path, but since the driver isn't running on the hadoop cluster
>>>> and doesn't have access to the hadoop cluster's configuration files, it
>>>> just uses the default values...a config that sucks.
>>>>
>>>> So I copied the *-site.xml files from my namenode over to the machine my
>>>> hadoop job driver was running from, put them on the class path, and
>>>> shazam...it picked up the hadoop config that whirr created for me.  Yay!
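>>>>
>>>> Roughly speaking, the driver now builds its config like the sketch below
>>>> (the class name and paths are placeholders -- you can also addResource()
>>>> the copied files explicitly if you'd rather not touch the class path):
>>>>
>>>>     import org.apache.hadoop.conf.Configuration;
>>>>     import org.apache.hadoop.fs.Path;
>>>>
>>>>     public class DriverConfig {
>>>>       static Configuration clusterConf() {
>>>>         // new Configuration() picks up *-site.xml files found on the class path
>>>>         Configuration conf = new Configuration();
>>>>         // or point at the copied files directly, wherever they were dropped
>>>>         conf.addResource(new Path("/path/to/copied/core-site.xml"));
>>>>         conf.addResource(new Path("/path/to/copied/hdfs-site.xml"));
>>>>         conf.addResource(new Path("/path/to/copied/mapred-site.xml"));
>>>>         return conf;  // jobs built from this conf see the cluster's real settings
>>>>       }
>>>>     }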
>>>>
>>>>
>>>>
>>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <savu.andrei@gmail.com> wrote:
>>>>
>>>>>
>>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <john@iamjohn.me> wrote:
>>>>>
>>>>>> It looks like hadoop is reading default configuration values from
>>>>>> somewhere and using them, and not reading from
>>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>>
>>>>>
>>>>> If you are running CDH the config files are in:
>>>>>
>>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>>
>>>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh


-- 

Thanks,
John C
