ambari-dev mailing list archives

From Eric Yang <>
Subject Re: stack definition and component plugin
Date Tue, 17 Jan 2012 08:23:22 GMT
On Mon, Jan 16, 2012 at 10:51 PM, Vitthal "Suhas" Gogate
<> wrote:
> About Stack definition:
> -- Look at the example stack at
> "./controller/src/main/resources/org/apache/ambari/stacks/puppet1-0.json".
> It has global variables defined that can be substituted throughout the
> stack configuration. Ambari follows ruby template language.

Based on puppet1-0.json, the actual configuration looks like this:

	            "@value":"hdfs://<%= ambari_namenode_host %>:<%=
ambari_namenode_port %>"
                     "@value":"hdfs://<%= ambari_namenode_host %>:<%=
ambari_namenode_port %>/hbase"

It looks like Ambari is asking admins to write code in configuration.
I think this is more complicated than necessary for admins.  This type
of embedded string should be part of a template rather than
configuration exposed to administrators.  Branch-0.0 has this
separation, keeping programmer config independent of admin
configuration.
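To illustrate the separation I have in mind, here is a minimal sketch
(hypothetical host names and values, not Ambari's actual code): the admin
supplies only plain key/value settings, and the ERB-style template stays
internal to the deployment tool:

```python
import re

# Admin-facing configuration: no template syntax, just values.
# (Example values are hypothetical.)
admin_config = {
    "ambari_namenode_host": "nn1.example.com",
    "ambari_namenode_port": "8020",
}

# Internal template owned by the stack author, never edited by the admin.
template = "hdfs://<%= ambari_namenode_host %>:<%= ambari_namenode_port %>"

def render(template, values):
    """Substitute <%= name %> placeholders with admin-supplied values."""
    return re.sub(r"<%=\s*(\w+)\s*%>", lambda m: values[m.group(1)], template)

print(render(template, admin_config))
# hdfs://nn1.example.com:8020
```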

> -- There are a few global variables that are hidden or defined by default
> by Ambari, such as
>    -- names of master hosts  e.g. ambari_namenode_host,
> ambari_jobtracker_host (format is  ambari_<rolename>_host).
>    -- ambari_cluster_name
>     These default global variables are derived by Ambari from the cluster
> definition and would be documented for Ambari users.

It looks like the hidden variables are the important fields to ask the
admin for, rather than the template configurations.
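As a sketch of that idea (hypothetical cluster definition and field names),
the hidden variables can be derived mechanically from what the admin
already provides in the cluster definition:

```python
# Hypothetical cluster definition supplied by the admin: only a name
# and a host-to-role mapping, never template code.
cluster = {
    "name": "dev-cluster",
    "roles": {"namenode": "nn1.example.com", "jobtracker": "jt1.example.com"},
}

def derived_variables(cluster):
    """Derive the default ambari_<rolename>_host variables and
    ambari_cluster_name from the cluster definition."""
    variables = {"ambari_cluster_name": cluster["name"]}
    for role, host in cluster["roles"].items():
        variables["ambari_%s_host" % role] = host
    return variables

print(derived_variables(cluster))
```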

> About component plugins
>   -- They are not user level plugins, they are supposed to be added by
> service providers or component developers.  When you talk about
> scalability, do you mean execution performance, or adding new
> components to Ambari?  How many new Hadoop components do you think
> would be added over the next decade, excluding the ones already in the
> stack?  1, 10, 100, 1000?

In the past year, Hadoop-related projects have grown from single
digits to a few dozen.  The number of permutations each plugin must
support becomes O(n!), where n is the number of components.  Plugins
do not seem like a scalable way to manage components, because a
deployment permutation should be resolved on demand in O(1).
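A quick back-of-the-envelope illustration of that growth rate (plain
Python, no Ambari specifics):

```python
import math

# If each plugin must account for every arrangement of the other
# components, the cases to reason about grow factorially with the
# component count, while an on-demand deployment resolves exactly
# the one combination the admin asked for.
for n in (3, 5, 10, 20):
    print("n=%d  n!=%d" % (n, math.factorial(n)))
```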

> About synchronization of services,
>  -- As we talked once, I would plus one for avoiding any synchronization
> while deploying and configuring. We should mandate that services tolerate
> dependent services not running. In the existing Hadoop stack, do you
> know what services need synchronization other than NN/DN? I heard from a
> few HDFS folks that it is a bug if the DN goes down when the NN is not up,
> and it can be fixed very easily (if not already).

The JobTracker also depends on the NameNode being up and out of safe
mode.  HBase depends on both ZooKeeper and HDFS.  It is possible to go
through all projects and implement retries to reconnect.  However,
this approach creates a lot of network chatter when a base-level
service is unavailable and all dependent services are retrying,
causing nodes to use more TCP connections and accumulate more
timeouts.  Fundamentally, this approach doesn't solve the
orchestration problem; it escalates network connection saturation
problems more quickly.
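A rough, hypothetical estimate of that retry chatter (made-up numbers,
assuming a fixed retry interval for every dependent service):

```python
def retry_attempts(num_dependents, outage_seconds, retry_interval):
    """Total failed connection attempts made during an outage window,
    assuming every dependent retries on a fixed interval."""
    return num_dependents * (outage_seconds // retry_interval)

# 50 dependent daemons retrying every 10s through a 5-minute
# NameNode outage: the chatter scales linearly with both the number
# of dependents and the outage duration.
print(retry_attempts(50, 300, 10))  # 1500 attempts
```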

> I would suggest we should have Ambari Meet-up  to talk about some of the
> issues..

I like the idea of having a meet-up to resolve some technical
barriers.  Activity on the mailing list is also good for newcomers
to participate in our discussions.


> --Suhas
> On Mon, Jan 16, 2012 at 10:08 PM, Eric Yang <> wrote:
>> Hi all,
>> The current hierarchical stack definition is confusing to me.
>> Supposedly, the definition can be expanded into flattened
>> configuration key/value pairs, and an example of defining the
>> namenode URL and having it inherited in the hbase configuration
>> would look like this:
>> {
>>  ...
>>  "components": {
>>    "hdfs": {
>>      "roles": {
>>        "namenode": { /* override one value on the namenode */
>>          "hadoop/hdfs-site": {
>>            "dfs.https.enable": "true",
>>            "": "hdfs://${namenode}:${port}/"
>>          }
>>        }
>>      }
>>    },
>>    "hbase": {
>>      "roles": {
>>        "region-server": {
>>          "hbase/hbase-site": {
>>            "hbase.rootdir": "${components.hdfs.namenode.hadoop/}/hbase"
>>          }
>>        }
>>      }
>>    }
>>  }
>> }
>> hbase.rootdir is a key in hbase-site.xml, and it should contain the
>> key of "" plus an additional path for hbase to store data.  In my
>> interpretation, the macro would look like
>> ${components.hdfs.namenode.hadoop/}/hbase.
>> This seems like an utterly awkward method of describing inheritance.
>> Why don't we use a flat namespace to remove the additional logistics
>> imposed by Ambari?  I agree that the syntax is fully accurate, but it
>> is a larger headache to maintain this hierarchical structure.
>> The second problem is that the component plugin architecture sounds
>> good in theory, but I see some scalability issues with this approach.
>> Each component describes the components that it depends on.  This
>> could interfere with introducing new components.  i.e. Mapreduce
>> depends on HDFS.  A new component is introduced, named HBase.  Now,
>> the Mapreduce component needs to update its dependencies to cover
>> HDFS, HBase, and ZooKeeper.  Introducing a new component requires a
>> lot of plugin updates to make the new version work.  The plugin
>> writer also needs to make the theoretical assumption that if
>> component X is installed, do Y, otherwise do Z.  Conditional
>> assumptions in plugins introduce uncertainty and corner cases into
>> the deployment system.  The number of permutations can greatly
>> exceed the logic that the plugin is able to handle.
>> Instead of using a plugin architecture to manage deployment, it
>> would be safer to use a scripting approach that enables power
>> administrators to deploy a stack of software by writing
>> shell-script-like recipes to accomplish the deployment tasks.  The
>> recipe scripts can be shared by the community to automate software
>> stack deployment.  This will ensure the scope of Ambari deployment
>> stays focused on cross-node orchestration without having to build
>> bells and whistles that do not scale well in the long term.
>> What do you guys think?
>> regards,
>> Eric
