hbase-user mailing list archives

From Wayne <wav...@gmail.com>
Subject Re: hbase evaluation questions
Date Fri, 16 Jul 2010 05:29:12 GMT
We have decided to move forward with client-specific tables. We will
probably always have fewer than 100 clients (10 today), and we have 2 core
tables, so 2 x NumClients seems reasonable given your description below. We
have so much data per client that even if we ever grow into the hundreds of
tables, we might very well set up entirely separate clusters to handle
different groups of customers.


On Thu, Jul 15, 2010 at 11:47 PM, Gary Helmling <ghelmling@gmail.com> wrote:

> On Wed, Jul 14, 2010 at 1:25 AM, Wayne <wav100@gmail.com> wrote:
> >
> > 1) How can HBase be configured for a multi-tenancy model? What are the
> > options to create a solid separation of data? In a relational database,
> > schemas would provide this, and in Cassandra the keyspace can provide
> > the same. Of course we can add the tenancy key to the row key and
> > create tenant-specific tables/column families, but that does not
> > provide the same level of confidence of separation. We could also
> > create separate clusters for each client, but that defeats part of the
> > point of going to a distributed database cluster to improve overall
> > throughput and utilization across all clients. We currently run single
> > MySQL databases for each of our clients (1-3 TB each).
> >
> >
> >
> >
> We are just getting underway on building a secure version of HBase with
> role-based access control (see the umbrella issue:
> https://issues.apache.org/jira/browse/HBASE-1697).  This is being built
> on top of secure Hadoop to enforce some access control down to the HDFS
> layer and to take advantage of the work done there on secure RPC.
> Authentication is performed via Kerberos.
> The idea is to allow configuration of user access rights to specific
> tables and possibly column families.  However, this is a big project
> and, like I mentioned, just getting underway, so a full implementation
> is a ways off and some details may be subject to change.
> If you're not allowing execution of arbitrary user code on your cluster
> (say via client map-reduce jobs), then using client prefixes for either
> tables or row keys seems like the most straightforward approach.  How
> many clients do you plan to serve?  Hundreds?  Thousands?  Millions?
> This should guide your implementation strategy as well.  In practical
> terms, HBase scales very well with huge tables (shared tables for all
> clients), but I haven't heard of clusters running with more than
> hundreds of tables (discrete per-client sets of tables), so there may
> be some scalability limitations to the second approach that would at
> least merit some testing.
> --gh
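[Editor's note: the client-prefix approach discussed above can be sketched
as follows. The helper names and the "_" / ":" separators are illustrative
assumptions only, not part of any HBase API; in a real deployment the
prefixed table name would be passed to the HBase client when opening the
table, and the prefixed byte array would be used as the row key.]

```java
import java.nio.charset.StandardCharsets;

// Sketch of the two prefixing strategies from the thread (assumed
// helper names; not an HBase API).
public class ClientPrefix {

    // Option 1: per-client tables, e.g. "acme_events" for client "acme".
    // With 2 core tables this yields 2 x NumClients tables in total.
    static String buildTableName(String clientId, String baseTable) {
        return clientId + "_" + baseTable;
    }

    // Option 2: shared tables, with the tenant id leading the row key so
    // each client's rows sort together lexicographically and scans can be
    // bounded to one client's key range.
    static byte[] buildRowKey(String clientId, String entityKey) {
        return (clientId + ":" + entityKey)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildTableName("acme", "events")); // acme_events
        System.out.println(new String(
                buildRowKey("acme", "user42"),
                StandardCharsets.UTF_8));                     // acme:user42
    }
}
```

[Note that option 2 concentrates a single client's writes in one key
range, which is Gary's shared-table approach; option 1 matches the
per-client-tables decision in Wayne's reply.]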
