hbase-user mailing list archives

From "Wim Van Leuven" <wim.vanleu...@highestpoint.biz>
Subject RE: Impromptu HBase survey
Date Thu, 05 Nov 2009 14:43:03 GMT
Equally interesting would be to know what type of data you are storing. I
mean, if I store web-crawled data in my HBase, it doesn't matter if I miss or
lose a few pages, does it? I'll crawl them next time.
Is the data or every element of it business critical? Is it derived data
from some other source? Aggregated data? Or do we store traditional online

-----Original Message-----
From: Tim Robertson [mailto:timrobertson100@gmail.com] 
Sent: Thursday, 5 November 2009 11:14
To: hbase-user@hadoop.apache.org
Subject: Re: Impromptu HBase survey

We don't run HBase in operational mode yet, but researching it with a
goal of moving towards there...

> To give you an idea of questions that I wonder about:
> *        Are you using a natural or synthetic key?

- synthetic.  UUID but considering an encoded uuid to shorten it.
Would like to see some KeyUtils classes in the HBase library, or some
recommendations.  I'd like an 8 char synthetic key ideally, but
haven't found a good way to do this yet (lack of time).
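
As a rough illustration of the "encoded UUID" idea above, here is a minimal sketch that packs a UUID into its 16 raw bytes and renders it as URL-safe Base64 without padding. The class name ShortUuidKey and its methods are hypothetical (nothing like this is claimed to exist in the HBase library), and it assumes Java 8 or later for java.util.Base64. Note that this gives a 22-character key, not 8: a full 128-bit UUID cannot fit in 8 printable characters, so an 8-character key would need a different, smaller ID scheme.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper, not part of HBase: shortens a UUID for use as a row key.
public final class ShortUuidKey {
    private ShortUuidKey() {}

    // Pack the two 64-bit halves of the UUID into 16 raw bytes (big-endian).
    public static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // URL-safe Base64 without '=' padding: 16 bytes become 22 characters,
    // compared with 36 for the canonical hex-and-dash form.
    public static String toShortString(UUID id) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(toBytes(id));
    }

    // Reverse of toShortString: decode the 22 characters back into a UUID.
    public static UUID fromShortString(String key) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(key));
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

If the key is only ever handled as bytes, the 16-byte array from toBytes is shorter still and could be used as the row key directly; the string form is mainly useful where the key has to be printed or passed around as text.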

> *        Are you using HBase index tables or maintaining your own?

- lucene, but will use hbase index tables

> *        Do you have multiple data tables in your HBase server?

- yes, but only for convenience of keeping the 2 small tables with the big

> *        How many rows of data are in each HBase table?

- 200 million.  When operational, we expect rows to grow at 5-10%/month and
columns to grow at 10% or so per month as well

> *        What type of data are you storing in each record?

- 30-60 fields of INT and String
- might be putting in PNGs in a new table to represent google map tiles

> *        Are you using column families to localize data or store name/value pairs?

- no

> *        Are there columns like name, address, etc., that are present in each row?

- no (http://rs.tdwg.org/dwc/terms/index.htm is our term vocabulary)

> *        Are you running HBase on your own servers or on Amazon EC2?

- in-house 10 node cluster with 3 masters

> *        Are you using Hadoop to run map/reduce functions against HBase?

- progressing towards this.  Still using text file exports from a
mysql DB in Hadoop, as HBase is not in production mode yet

> *        How does your client interact with HBase?  Java API, REST, Stargate, Thrift, other (please specify), etc.

- Java API
