hbase-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Combining hadoop dataNode process with hbase into single JVM process
Date Thu, 10 Dec 2009 23:37:44 GMT
2009/12/10 Michał Podsiadłowski <podsiadlowski@gmail.com>:
> Hi all!
>
> Sorry for duplicating the message from the hadoop list, but I think not all
> of you read that one, and I really need to know your opinion.
>
> I have recently been experimenting with Hadoop and HBase for my company.
> After some tests we decided to set up a 4-node cluster for our application,
> with HBase as the persistence layer for our web cache, as a proof of concept,
> plus some more elaborate usages planned for the future if this works out.
>  Unfortunately our IT department would like the whole hadoop + hbase +
> zookeeper + .. stack as one JVM process, to be able to monitor it easily
> and, more importantly, to limit max memory more accurately. They are afraid
> that with multiple processes each JVM can go slightly beyond its limit, as
> JVMs tend to do, and the node will become less stable (swapping etc.) than
> it would be with one process limited to the sum of all the bounds. We are
> unfortunately limited by resources, and the datanodes will also be running
> some other app, so Hadoop can't use all the resources, which are otherwise
> quite generous - 8GB.
> Is it sensible to combine these usually separate processes, and what can
> potentially go wrong when all these processes are started programmatically
> in one JVM? Are there any obvious contraindications to going this way?
>
> We are planning to set up namenode failover as described by Cloudera, using
> the Linux HA features we currently use for MySQL replication and failover.
> Unfortunately, again, the namenode won't be alone on its machine - it will
> share it with MySQL - but those two nodes are better equipped, with 10GB.
> Here comes another question: can someone advise me how much memory the
> NameNode requires, bearing in mind that our data certainly won't exceed
> ~300GB (again, this is only a proof of concept for now)?
>
> Thanks,
> Michael.
>
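The quoted concern - that several JVMs each drifting past their limit is worse than one JVM under a single hard bound - can be sketched as simple arithmetic. The per-daemon heap caps and the 15% overshoot figure below are hypothetical illustrations, not numbers from this thread:

```python
# Hypothetical per-daemon heap caps (-Xmx) on an 8 GB node. The worry
# is that each JVM's real footprint (heap plus native overhead) can
# drift somewhat past its configured cap.
daemon_caps_gb = {"datanode": 1.0, "tasktracker": 1.0,
                  "regionserver": 2.0, "zookeeper": 0.5}
overshoot = 1.15  # assume each separate process runs ~15% over its cap

# Worst case with separate JVMs: every process overshoots at once.
separate = sum(cap * overshoot for cap in daemon_caps_gb.values())
# Single JVM: one hard -Xmx equal to the sum of the individual caps.
combined = sum(daemon_caps_gb.values())

print(f"separate JVMs worst case: {separate:.2f} GB")
print(f"single JVM hard limit:    {combined:.2f} GB")
```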
As for lumping all the daemons together: it is technically possible,
but they were designed to run separately so that they are easier to
manage. It would probably make troubleshooting harder - which
sub-process is causing the high CPU? And just because the TaskTracker
fails does not mean the DataNode should go down. It is a cool idea, though.
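On the NameNode sizing question above: NameNode heap is driven by the number of files, directories, and blocks it tracks, not by raw data size, so for ~300GB the metadata footprint is tiny unless you have huge numbers of small files. A back-of-envelope sketch - the per-object byte cost and block size below are rough assumptions, not measurements:

```python
# Back-of-envelope NameNode heap estimate (assumed figures).
data_bytes = 300 * 1024**3        # ~300 GB of raw data
block_size = 64 * 1024**2         # default HDFS block size, 64 MB
bytes_per_object = 150            # rough heap cost per file/dir/block object

blocks = data_bytes // block_size # 4800 blocks if files fill whole blocks
files = blocks                    # pessimistic: one file per block
heap_mb = (blocks + files) * bytes_per_object / 1024**2

print(f"~{blocks} blocks -> roughly {heap_mb:.1f} MB of metadata heap")
```

The point of the sketch is that at this scale the metadata fits in a few megabytes, so the default 1GB NameNode heap leaves plenty of headroom; file count, not data volume, is what you would watch.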

If you want some monitoring hints I'll plug my presentation
http://www.slideshare.net/cloudera/hw09-monitoring-best-practices
http://www.cloudera.com/blog/2009/11/09/hadoop-world-monitoring-best-practices-from-ed-capriolo/
