hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Is there a problem with having 4000 tables in a cluster?
Date Tue, 24 Sep 2013 21:50:01 GMT
Hi Jeremy,

I don't see any issue with HBase handling 4000 tables. However, I don't
think it's the best solution for your use case.


2013/9/24 jeremy p <athomewithagroovebox@gmail.com>

> Short description : I'd like to have 4000 tables in my HBase cluster.  Will
> this be a problem?  In general, what problems do you run into when you try
> to host thousands of tables in a cluster?
> Long description : I'd like the performance advantage of pre-split tables,
> and I'd also like to do filtered range scans.  Imagine a keyspace where the
> key consists of : [POSITION]_[WORD] , where POSITION is a number from 1 to
> 4000, and WORD is a string consisting of 96 characters.  The value in the
> cell would be a single integer.  My app will examine a 'document', where
> each 'line' consists of 4000 WORDs.  For each WORD, it'll do a filtered
> regex lookup.  Only problem?  Say I have 200 mappers and they all start at
> POSITION 1, my region servers would get hotspotted like crazy. So my idea
> is to break it into 4000 tables (one for each POSITION), and then pre-split
> the tables such that each region gets an equal amount of the traffic.  In
> this scenario, the key would just be WORD.  Dunno if this is a bad idea;
> would be open to suggestions.
> Thanks!
> --J
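The hotspotting concern in the quoted question comes down to how HBase sorts row keys byte-lexicographically. A minimal sketch of the two layouts being compared (composite `[POSITION]_[WORD]` keys in one table vs. `WORD`-only keys in per-position tables), using made-up sample words and zero-padded positions as an assumption:

```python
# Hypothetical sample data to illustrate key ordering; positions are
# zero-padded so string sort order matches numeric order.
words = ["apple", "banana", "cherry"]
positions = [1, 2, 4000]

# Layout 1: single table, key = [POSITION]_[WORD].
# Rows sort by POSITION first, so every mapper scanning POSITION 1
# lands on the same "0001_" key range -- the hotspot described above.
composite = sorted(f"{p:04d}_{w}" for p in positions for w in words)
print(composite[:3])  # all POSITION-1 keys cluster together

# Layout 2: one table per POSITION, key = WORD.
# Each table can be pre-split across the WORD keyspace, so concurrent
# lookups at the same POSITION spread over that table's regions.
per_position = {p: sorted(words) for p in positions}
print(per_position[1])
```

This only illustrates the sort-order behavior; it says nothing about the per-table overhead (region count, META size) that makes thousands of tables costly in practice.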
