hbase-user mailing list archives

From Peter Wolf <opus...@gmail.com>
Subject Speeding up Scans
Date Wed, 25 Jan 2012 12:56:02 GMT
Hello all,

I am looking for advice on speeding up my scans.

I want to iterate over all rows where a particular column (language) 
equals a particular value ("JA").

My row keys already start with a hash of that column in the first bytes,
and I do my scans using partial row-key matching, like this:

     // Start row: [hash(language)][accountID = 0][timestamp = 0]
     public static byte[] calculateStartRowKey(String language) {
         int languageHash = language.length() > 0 ? language.hashCode() : 0;
         byte[] language2 = Bytes.toBytes(languageHash);
         byte[] accountID2 = Bytes.toBytes(0);
         byte[] timestamp2 = Bytes.toBytes(0);
         return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
     }

     // Stop row (exclusive): the first key of the next hash value
     public static byte[] calculateEndRowKey(String language) {
         int languageHash = language.length() > 0 ? language.hashCode() : 0;
         byte[] language2 = Bytes.toBytes(languageHash + 1);
         byte[] accountID2 = Bytes.toBytes(0);
         byte[] timestamp2 = Bytes.toBytes(0);
         return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
     }

     Scan scan = new Scan(calculateStartRowKey(language),
                          calculateEndRowKey(language));
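
To make the key range concrete, here is a minimal stand-alone sketch of the same layout. It does not use HBase: the class RowKeyRangeSketch and its helper names are made up for illustration, and it assumes the row key is exactly three big-endian ints (the encoding Bytes.toBytes(int) produces) compared as unsigned bytes. It also shows one edge case of the "hash + 1" stop row: a hash of -1 wraps to 0, so the stop row sorts before the start row.

```java
import java.nio.ByteBuffer;

// Illustrative only: mirrors the key layout above without HBase on the classpath.
final class RowKeyRangeSketch {

    // Builds [languageHash][accountID = 0][timestamp = 0] as 12 big-endian bytes.
    static byte[] key(int languageHash) {
        return ByteBuffer.allocate(12)
                .putInt(languageHash)
                .putInt(0)
                .putInt(0)
                .array();
    }

    // Unsigned lexicographic comparison, the order in which row keys are sorted.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True when the start row sorts strictly before the "hash + 1" stop row.
    static boolean rangeIsOrdered(int languageHash) {
        return compareUnsigned(key(languageHash), key(languageHash + 1)) < 0;
    }

    public static void main(String[] args) {
        System.out.println("JA range ordered: " + rangeIsOrdered("JA".hashCode()));
        System.out.println("hash -1 range ordered: " + rangeIsOrdered(-1));
    }
}
```

For "JA" the range is well formed, so the scan only touches rows whose first four bytes match that hash.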


Since the row key holds only a hash of the string, I need to re-check the
column to make sure that rows for some other string with the same hash
value are excluded:

     Filter filter = new SingleColumnValueFilter(resultFamily, languageCol,
             CompareFilter.CompareOp.EQUAL, Bytes.toBytes(language));
     scan.setFilter(filter);
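
As a small illustration of why the re-check is needed, two different strings can share a String.hashCode() value. The sketch below is plain Java with a made-up class name (HashCollisionSketch); "Aa" and "BB" are a well-known colliding pair, not values from my table.

```java
// Illustrative only: shows that distinct strings can map to the same hash prefix.
final class HashCollisionSketch {

    // True when a and b are different strings that produce the same hashCode().
    static boolean sameHashDifferentValue(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Both strings hash to the same int, so a key prefix built from the
        // hash alone cannot tell them apart.
        System.out.println(sameHashDifferentValue("Aa", "BB"));
    }
}
```

Rows for any such colliding value would fall inside the same start/stop range, which is why the value filter is applied on top of the row-key range.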

I am using the Cloudera 0.09.4 release, and a cluster of 3 machines on EC2.

I think that this should be really fast, but it is not.  Any advice on 
how to debug/speed it up?

Thanks
Peter




