hbase-user mailing list archives

From jeremy p <athomewithagroove...@gmail.com>
Subject Is there a problem with having 4000 tables in a cluster?
Date Tue, 24 Sep 2013 21:34:24 GMT
Short description: I'd like to have 4000 tables in my HBase cluster. Will
this be a problem? In general, what problems do you run into when you host
thousands of tables in a cluster?

Long description: I'd like the performance advantage of pre-split tables,
and I'd also like to do filtered range scans. Imagine a keyspace where the
key has the form [POSITION]_[WORD], where POSITION is a number from 1 to
4000 and WORD is a 96-character string. The value in the cell would be a
single integer. My app will examine a 'document', where each 'line'
consists of 4000 WORDs. For each WORD, it'll do a filtered regex lookup.
The only problem: if I have 200 mappers and they all start at POSITION 1,
my region servers would get hotspotted like crazy. So my idea is to break
the data into 4000 tables (one for each POSITION), and then pre-split each
table so that every region gets an equal share of the traffic. In this
scenario, the key would just be WORD. I don't know if this is a bad idea,
and I'm open to suggestions.
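
To make the pre-splitting idea concrete, here is a rough sketch of how split keys for one per-POSITION table keyed by WORD could be computed. It assumes WORD uses only lowercase a-z and that prefixes are uniformly distributed; neither is stated above, and a skewed distribution would need split points derived from sampled data instead. The helper names split_points and region_for are hypothetical, and this only computes the keys; it does not call HBase.

```python
import bisect
import string

# Assumption: WORD is drawn uniformly from lowercase a-z.
ALPHABET = string.ascii_lowercase


def split_points(num_regions, prefix_len=2, alphabet=ALPHABET):
    """Return num_regions - 1 sorted prefixes that divide the key space evenly."""
    base = len(alphabet)
    total = base ** prefix_len  # number of distinct prefixes of this length
    if not 1 <= num_regions <= total:
        raise ValueError("num_regions must be between 1 and %d" % total)
    points = []
    for i in range(1, num_regions):
        n = i * total // num_regions  # evenly spaced index into the prefix space
        chars = []
        for _ in range(prefix_len):  # write n as prefix_len base-`base` digits
            n, digit = divmod(n, base)
            chars.append(alphabet[digit])
        points.append("".join(reversed(chars)))
    return points


def region_for(word, points):
    """Index of the region whose [start, end) key range contains word."""
    return bisect.bisect_right(points, word)


if __name__ == "__main__":
    points = split_points(4, prefix_len=1)
    print(points)
    print(region_for("nabc", points))
```

With one-character prefixes and four regions, a key equal to a split point falls into the region that starts at that point, which matches a start-inclusive, end-exclusive region range.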


