whirr-user mailing list archives

From Andrei Savu <savu.and...@gmail.com>
Subject Re: AMIs to use when creating hadoop cluster with whirr
Date Wed, 05 Oct 2011 20:11:33 GMT
The files are also created on the local machine in ~/.whirr/cluster-name/, so
it shouldn't be that hard. From my point of view, the only tricky part is
matching the Hadoop version.
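
A minimal sketch of how a driver machine could pick up those local files,
assuming Whirr wrote a hadoop-site.xml into ~/.whirr/cluster-name/ and that
the job is launched through the hadoop script, which honors HADOOP_CONF_DIR.
The function name use_whirr_conf and its optional base-directory argument
are hypothetical, made up for illustration only:

```shell
# Hypothetical helper: point Hadoop clients at the config Whirr wrote locally.
# $1 = cluster name (the whirr.cluster-name value), $2 = base dir (default ~/.whirr)
use_whirr_conf() {
  cluster_name="$1"
  base_dir="${2:-$HOME/.whirr}"
  conf_dir="$base_dir/$cluster_name"

  # Assumption: the generated client config is named hadoop-site.xml.
  if [ ! -f "$conf_dir/hadoop-site.xml" ]; then
    echo "no hadoop-site.xml found in $conf_dir" >&2
    return 1
  fi

  # The hadoop launcher script adds HADOOP_CONF_DIR to the classpath.
  HADOOP_CONF_DIR="$conf_dir"
  export HADOOP_CONF_DIR
}
```

Calling use_whirr_conf with the cluster name before launching the job would
do it. A driver started as a plain Java process would instead need that
directory on its own classpath.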

On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <john@iamjohn.me> wrote:

> This whole scenario does bring up the question about how people handle this
> kind of scenario.  To me the beauty of whirr is that it means I can spin up
> and down hadoop clusters on the fly when my workflow demands it.  If a task
> gets q'd up that needs mapreduce, I spin up a cluster, solve my problem,
> gather my data, kill the cluster, workflow goes on.
>
> But if my workflow requires the contents of three little files located on a
> different machine, in a different cluster, and possibly a different cloud
> vendor, that really puts a damper on the whimsical on-the-flyness of
> creating hadoop resources only when needed.  I'm curious how other people
> are handling this scenario.
>
>
> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <savu.andrei@gmail.com> wrote:
>
>> Awesome! I'm glad we figured this out. I was getting worried that we had
>> a critical bug.
>>
>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <john@iamjohn.me> wrote:
>>
>>> Ok...I think I figured it out.  This email thread made me take a look at
>>> how I'm kicking off my hadoop job.  My hadoop driver, the class that links a
>>> bunch of jobs together in a workflow, is on a different machine than the
>>> cluster that hadoop is running on.  This means when I create a new
>>> Configuration() object, it tries to load the default hadoop values from
>>> the class path, but since the driver isn't running on the hadoop cluster and
>>> doesn't have access to the hadoop cluster's configuration files, it just uses
>>> the default values...config for suck.
>>>
>>> So I copied the *-site.xml files from my namenode over to the machine my
>>> hadoop job driver was running from and put them in the class path, and
>>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>>
>>>
>>>
>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <savu.andrei@gmail.com> wrote:
>>>
>>>>
>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <john@iamjohn.me> wrote:
>>>>
>>>>> It looks like hadoop is reading default configuration values from
>>>>> somewhere and using them, and not reading from
>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>
>>>>
>>>> If you are running CDH the config files are in:
>>>>
>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>
>>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>>
>>>
>>>
>>> --
>>>
>>> Thanks,
>>> John C
>>>
>>>
>>
>
>
> --
>
> Thanks,
> John C
>
>
