hbase-user mailing list archives

From Gary Helmling <ghelml...@gmail.com>
Subject Re: hbase evaluation questions
Date Thu, 15 Jul 2010 21:47:42 GMT
On Wed, Jul 14, 2010 at 1:25 AM, Wayne <wav100@gmail.com> wrote:

> 1) How can hbase be configured for a multi-tenancy model? What are the
> options to create a solid separation of data? In a relational database
> schemas would provide this and in cassandra the keyspace can provide the
> same. Of course we can add the tenancy key to the row key and create tenant
> specific tables/column families but that does not provide the same level of
> confidence of separation. We could also create separate clusters for each
> client, but then that defeats part of the point of going to a distributed
> database cluster to improve overall throughput+utilization across all
> clients. We currently run single MySQL databases for each of our clients
> (1-3 TBs each).
We are just getting underway on building a secure version of HBase with
role-based access control (see the umbrella issue:
https://issues.apache.org/jira/browse/HBASE-1697).  This is being built on
top of secure Hadoop to enforce some access control down to the HDFS layer
and to take advantage of the work done there on secure RPC.  Authentication
is performed via Kerberos.

The idea is to allow configuration of user access rights to specific tables
and possibly column families.  However, this is a big project and, as I
mentioned, just getting underway, so a full implementation is a ways off and
some details may be subject to change.
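To illustrate the granularity described above (rights on a whole table, or only on particular column families), here is a purely hypothetical in-memory model. It is not HBase's API and not the HBASE-1697 design. The names `AccessModel`, `Action`, `grantTable`, `grantFamily` and `isAllowed` are invented for this sketch, and the rule that a table-wide grant covers every family is an assumption. It requires Java 16+ for the record.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model of table- and column-family-level grants.
 * NOT HBase's API or the HBASE-1697 design; it only illustrates granularity.
 */
public class AccessModel {
    enum Action { READ, WRITE }

    // family == null means the grant covers the whole table (assumed rule).
    private record Scope(String user, String table, String family) {}

    private final Map<Scope, Set<Action>> grants = new HashMap<>();

    void grantTable(String user, String table, Action action) {
        grants.computeIfAbsent(new Scope(user, table, null), k -> EnumSet.noneOf(Action.class))
              .add(action);
    }

    void grantFamily(String user, String table, String family, Action action) {
        grants.computeIfAbsent(new Scope(user, table, family), k -> EnumSet.noneOf(Action.class))
              .add(action);
    }

    /** A table-wide grant covers every family; otherwise a grant on that family is needed. */
    boolean isAllowed(String user, String table, String family, Action action) {
        Set<Action> tableWide = grants.getOrDefault(new Scope(user, table, null), Set.of());
        Set<Action> familyOnly = grants.getOrDefault(new Scope(user, table, family), Set.of());
        return tableWide.contains(action) || familyOnly.contains(action);
    }

    public static void main(String[] args) {
        AccessModel acl = new AccessModel();
        acl.grantTable("alice", "acme_orders", Action.READ);
        acl.grantFamily("bob", "acme_orders", "summary", Action.READ);
        System.out.println(acl.isAllowed("alice", "acme_orders", "details", Action.READ)); // true
        System.out.println(acl.isAllowed("bob", "acme_orders", "details", Action.READ));   // false
        System.out.println(acl.isAllowed("bob", "acme_orders", "summary", Action.READ));   // true
    }
}
```

A real implementation would also have to authenticate the user (Kerberos, per the above) and enforce the check on the server side. This sketch only shows the lookup.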

If you're not allowing execution of arbitrary user code on your cluster (say,
via client MapReduce jobs), then prefixing either table names or row keys
with a client identifier seems like the most straightforward approach.  How
many clients do you plan to serve?  Hundreds?  Thousands?  Millions?  That
number should also guide your implementation strategy.  In practical terms,
HBase scales very well with huge tables (shared tables for all clients), but
I haven't heard of clusters running with more than hundreds of tables
(discrete per-client sets of tables), so the per-client-table approach may
have scalability limitations that would at least merit some testing.
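For the shared-table option, the key point is that HBase stores rows sorted by row key, so a tenant prefix turns each client's data into one contiguous key range. The following is a minimal sketch under stated assumptions. The `<tenantId>|<localKey>` layout and the `|` separator are hypothetical choices, and a `TreeMap` ordered by unsigned bytes stands in for the table. The computed start and stop rows correspond roughly to what you would give an HBase `Scan` to read a single tenant.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical helper for a shared-table layout: each row key starts with
 * "<tenantId>|", so one tenant's rows form a single contiguous key range.
 */
public class TenantKeys {
    // Assumed separator; tenant IDs must never contain it.
    private static final String SEP = "|";

    /** Builds the row key "<tenantId>|<localKey>" as UTF-8 bytes. */
    static byte[] rowKey(String tenantId, String localKey) {
        if (tenantId.contains(SEP)) {
            throw new IllegalArgumentException("tenant id must not contain " + SEP);
        }
        return (tenantId + SEP + localKey).getBytes(StandardCharsets.UTF_8);
    }

    /** Inclusive start of a tenant's range: the prefix itself. */
    static byte[] startRow(String tenantId) {
        return (tenantId + SEP).getBytes(StandardCharsets.UTF_8);
    }

    /** Exclusive end of the range: the prefix with its last byte ('|') bumped to '}'. */
    static byte[] stopRow(String tenantId) {
        byte[] stop = startRow(tenantId);
        stop[stop.length - 1]++;
        return stop;
    }

    /** In-memory stand-in for a table, ordered like HBase row keys (unsigned bytes). */
    static NavigableMap<byte[], String> newTable() {
        return new TreeMap<byte[], String>((a, b) -> Arrays.compareUnsigned(a, b));
    }

    /** Simulates a range scan over [startRow, stopRow), returning values in key order. */
    static List<String> scanTenant(NavigableMap<byte[], String> table, String tenantId) {
        return new ArrayList<>(
            table.subMap(startRow(tenantId), true, stopRow(tenantId), false).values());
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> table = newTable();
        table.put(rowKey("acme", "order-002"), "acme #2");
        table.put(rowKey("acme", "order-001"), "acme #1");
        table.put(rowKey("acme2", "order-001"), "acme2 #1");
        table.put(rowKey("globex", "order-001"), "globex #1");
        System.out.println(scanTenant(table, "acme")); // [acme #1, acme #2]
    }
}
```

The separator matters: without it, a scan for tenant "acme" would also pick up rows belonging to "acme2". Note that this only separates data by convention in the client code, which is the weaker guarantee the question was concerned about. The per-client-table alternative would instead prefix table names (for example a hypothetical "acme_orders").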

