hbase-user mailing list archives

From "tim robertson" <timrobertson...@gmail.com>
Subject Re: hbase on EC2 - any guidelines for instance size selection?
Date Fri, 19 Dec 2008 14:01:10 GMT
Thank you for your advice.

So I really need to look at the memory footprint of the custom MR
jobs I run to determine the "jobs per node", right?
For example, with 7G per node minus the 2G reserved, if my jobs need
-Xmx1G I can run at most 5, but 4 to be safe. Does that sound reasonable?
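The slot arithmetic above can be checked with a small Python sketch. It assumes the 2G reservation is Andrew's 1G regionserver heap plus 1G datanode heap, and models any extra safety margin (for the OS, the TaskTracker daemon itself, and JVM memory beyond -Xmx) as a hypothetical `headroom_mb` parameter. The function and constant names are illustrative only.

```python
# Daemon heap reservations per node, in MB, following the advice in this thread:
# 1G for the HBase regionserver and 1G for the HDFS datanode.
DAEMON_HEAP_MB = {"regionserver": 1024, "datanode": 1024}


def max_task_slots(node_ram_mb: int, task_heap_mb: int, headroom_mb: int = 0) -> int:
    """Return how many task JVMs started with -Xmx<task_heap_mb>m fit on one node.

    headroom_mb is an assumed safety margin for memory the daemon reservation
    does not cover (OS, TaskTracker daemon, JVM overhead beyond -Xmx).
    """
    if task_heap_mb <= 0:
        raise ValueError("task_heap_mb must be positive")
    reserved_mb = sum(DAEMON_HEAP_MB.values())
    available_mb = node_ram_mb - reserved_mb - headroom_mb
    # Floor division: a JVM that only partially fits does not count as a slot.
    return max(0, available_mb // task_heap_mb)


if __name__ == "__main__":
    node_mb = 7 * 1024  # "7G per node"
    print(max_task_slots(node_mb, 1024))                    # hard maximum: 5
    print(max_task_slots(node_mb, 1024, headroom_mb=1024))  # with 1G kept spare: 4
```

With 7G nodes and 1G tasks this gives 5 slots, or 4 with 1G held back. In Hadoop of this era the total would be split between `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, since map and reduce slots draw on the same memory, with the per-task heap set through `mapred.child.java.opts`.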


Reasoning...
I do a fair bit of geospatial cross-referencing, so I am building
in-memory indexes for the Maps to use (HBase provides point-style data,
but I often need to cross-reference it with a set of polygons to
preprocess data for mapping, etc.). So I always have to watch the
in-memory index size to avoid blowing the heap. Additionally, I am
reusing JVMs (possible since Hadoop 0.19.0), because generating the
in-memory index is time-consuming.
(http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html)
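To illustrate why JVM reuse pays off here, the following is a minimal Python sketch (not Java, and not the code from the linked post) of a polygon index that is built once per process and then reused for every point record. The names `PolygonIndex`, `get_index`, `map_point`, and the polygon loader are hypothetical. In a real Hadoop job the index would presumably live in a static field of the Mapper class, so it survives across tasks when `mapred.job.reuse.jvm.num.tasks` permits reuse. The lookup uses a bounding-box prefilter plus even-odd ray casting, a simple stand-in for whatever spatial index the actual job uses.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(x: float, y: float, poly: Polygon) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class PolygonIndex:
    """In-memory index: cheap bounding-box prefilter, then exact point-in-polygon test."""

    def __init__(self, polygons: Dict[str, Polygon]) -> None:
        self._entries = []
        for pid, poly in polygons.items():
            xs = [px for px, _ in poly]
            ys = [py for _, py in poly]
            self._entries.append((pid, (min(xs), min(ys), max(xs), max(ys)), poly))

    def lookup(self, x: float, y: float) -> List[str]:
        return [
            pid
            for pid, (minx, miny, maxx, maxy), poly in self._entries
            if minx <= x <= maxx and miny <= y <= maxy and point_in_polygon(x, y, poly)
        ]


# Process-wide cache: the expensive build runs once per process, analogous to a
# static field that survives across tasks in a reused JVM.
_INDEX: Optional[PolygonIndex] = None


def get_index(load_polygons: Callable[[], Dict[str, Polygon]]) -> PolygonIndex:
    global _INDEX
    if _INDEX is None:
        _INDEX = PolygonIndex(load_polygons())
    return _INDEX


def map_point(
    record_id: str, x: float, y: float, load_polygons: Callable[[], Dict[str, Polygon]]
) -> List[Tuple[str, str]]:
    """Hypothetical map step: emit (polygon_id, record_id) for each containing polygon."""
    index = get_index(load_polygons)
    return [(pid, record_id) for pid in index.lookup(x, y)]
```

Calling `map_point` for many records with the same loader builds the index only on the first call. The memory cost is the whole index held in heap for the life of the process, which is why the index size has to be watched against -Xmx.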

If not EC2, do people typically rent rack space by the month? Can
anyone suggest a good provider for, say, 20 nodes?

Cheers,

Tim

On Fri, Dec 19, 2008 at 2:38 PM, Andrew Purtell <apurtell@yahoo.com> wrote:
> Hi Tim,
>
> I think extra large instances are a basic requirement,
> assuming you will be running HBase regionservers alongside
> your tasktrackers (and therefore mapred tasks), and also
> alongside DFS datanodes. I believe this is the most
> common configuration because of the benefit of local I/O
> and the best use of allocated nodes.
>
> HBase regionservers are heap-intensive applications and
> should have 1G reserved for them alone. Datanodes should
> also have a 1G heap. Then you need to consider the RAM load
> of the remaining tasks.
>
>   - Andy
>
>> From: tim robertson
>> Subject: hbase on EC2 - any guidelines for instance size
>> selection?
>> To: hbase-user@hadoop.apache.org
>> Date: Friday, December 19, 2008, 5:22 AM
>> Hi,
>>
>> I have been using EC2 for various MR jobs, and when I am
>> doing this I can pretty much determine which EC2 instance
>> size will best meet my needs (e.g. large in-memory lookup
>> indexes in the Map require a large instance, while
>> low-memory, processing-intensive work is happy with many
>> small ones) and how many jobs to run per node, but for
>> HBase I am not sure what MR it is really going to run
>> underneath. Are there any rules of thumb for picking EC2
>> instance types for HBase usage?
> [...]
