hbase-user mailing list archives

From Peter Wolf <opus...@gmail.com>
Subject Re: Speeding up Scans
Date Wed, 25 Jan 2012 15:06:33 GMT
Ah ha!  I appear to be insane ;-)

Adding the following sped things up quite a bit:

         scan.setCacheBlocks(true);
         scan.setCaching(1000);

Thank you, it was a duh!

P
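
A rough way to see why the caching setting matters: with scan.setCaching(n), each trip to the region server can bring back up to n rows, so a scan over R rows needs roughly ceil(R / n) fetches instead of R. The sketch below is plain Java arithmetic, not HBase code. The class name, method name, and the one-million-row figure are made up for illustration, and the model ignores scanner open/close calls, filters, and region boundaries.

```java
// Back-of-the-envelope model only: assumes every fetch returns up to
// `caching` rows and that nothing else (filters, region splits) intervenes.
final class ScanRpcEstimate {

    /** Approximate number of row fetches needed to scan rowCount rows. */
    static long estimateFetches(long rowCount, int caching) {
        if (rowCount < 0 || caching <= 0) {
            throw new IllegalArgumentException("rowCount must be >= 0 and caching > 0");
        }
        // Integer ceiling division: ceil(rowCount / caching).
        return (rowCount + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical table size, not from the thread
        System.out.println("caching=1    -> " + estimateFetches(rows, 1) + " fetches");
        System.out.println("caching=1000 -> " + estimateFetches(rows, 1000) + " fetches");
    }
}
```

Larger caching values trade client and region server memory for fewer round trips, so the best value depends on row size.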



On 1/25/12 8:13 AM, Doug Meil wrote:
> Hi there-
>
> Quick sanity check:  what caching level are you using?  (default is 1)  I
> know this is basic, but it's always good to double-check.
>
> If "language" is already in the lead position of the rowkey, why use the
> filter?
>
> As for EC2, that's a wildcard.
>
>
>
>
>
> On 1/25/12 7:56 AM, "Peter Wolf"<opus111@gmail.com>  wrote:
>
>> Hello all,
>>
>> I am looking for advice on speeding up my Scanning.
>>
>> I want to iterate over all rows where a particular column (language)
>> equals a particular value ("JA").
>>
>> I am already creating my row keys using that column in the first bytes.
>> And I do my scans using partial row matching, like this...
>>
>>      public static byte[] calculateStartRowKey(String language) {
>>          int languageHash = language.length() > 0 ? language.hashCode() : 0;
>>          byte[] language2 = Bytes.toBytes(languageHash);
>>          byte[] accountID2 = Bytes.toBytes(0);
>>          byte[] timestamp2 = Bytes.toBytes(0);
>>          return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>>      }
>>
>>      public static byte[] calculateEndRowKey(String language) {
>>          int languageHash = language.length() > 0 ? language.hashCode() : 0;
>>          byte[] language2 = Bytes.toBytes(languageHash + 1);
>>          byte[] accountID2 = Bytes.toBytes(0);
>>          byte[] timestamp2 = Bytes.toBytes(0);
>>          return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>>      }
>>
>>      Scan scan = new Scan(calculateStartRowKey(language), calculateEndRowKey(language));
>>
>>
>> Since I am using a hash value for the string, I need to re-check the
>> column to make sure that some other string does not get the same hash
>> value.
>>
>>      Filter filter = new SingleColumnValueFilter(resultFamily, languageCol,
>>              CompareFilter.CompareOp.EQUAL, Bytes.toBytes(language));
>>      scan.setFilter(filter);
>>
>> I am using the Cloudera 0.90.4 release, and a cluster of 3 machines on
>> EC2.
>>
>> I think that this should be really fast, but it is not.  Any advice on
>> how to debug/speed it up?
>>
>> Thanks
>> Peter
>>
>>
>>
>>
>>
>
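
The start and stop keys in the quoted message depend on how the 4-byte hash prefix sorts. HBase orders row keys byte by byte as unsigned values, and Bytes.toBytes(int) writes the int big-endian, so "hash + 1" behaves like an unsigned increment of the prefix. The sketch below reproduces that 12-byte layout with java.nio.ByteBuffer instead of the HBase Bytes class so it runs without HBase on the classpath. The class and helper names are hypothetical, and Arrays.compareUnsigned needs Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Stand-in for the quoted calculateStartRowKey/calculateEndRowKey helpers.
final class LanguageKeyRange {

    // Same rule as the quoted code: empty strings map to 0.
    static int hashOf(String language) {
        return language.length() > 0 ? language.hashCode() : 0;
    }

    // 4-byte hash, then 4-byte account ID (0), then 4-byte timestamp (0).
    // ByteBuffer is big-endian by default, matching Bytes.toBytes(int).
    static byte[] keyFor(int languageHash) {
        return ByteBuffer.allocate(12).putInt(languageHash).putInt(0).putInt(0).array();
    }

    static byte[] startKey(String language) {
        return keyFor(hashOf(language));
    }

    static byte[] stopKey(String language) {
        return keyFor(hashOf(language) + 1);
    }

    // A usable range needs start < stop under unsigned byte-wise ordering.
    static boolean isValidRange(byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(start, stop) < 0;
    }

    public static void main(String[] args) {
        System.out.println("JA range valid: "
                + isValidRange(startKey("JA"), stopKey("JA")));
        // Edge case: a hash of -1 is 0xFFFFFFFF, and adding 1 wraps to
        // 0x00000000, so the stop key would sort before the start key.
        System.out.println("hash -1 range valid: "
                + isValidRange(keyFor(-1), keyFor(0)));
    }
}
```

Under these assumptions the range is well formed for every hash except -1, where the stop key wraps around; a scan for such a language would need an open-ended stop row or a different prefix scheme.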

