hbase-user mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: HBase : get(...) vs scan and in-memory table
Date Wed, 11 Sep 2013 18:07:02 GMT
There is no guarantee that your tables are in memory, and you cannot verify this directly.
HBase will do its best to keep them in memory, but it is not 100% guaranteed.
The default block cache is divided into 3 zones (single-access, multi-access and in-memory), and
HBase allocates 25% of the cache to IN_MEMORY column families. If your data does not fit into this 25%,
try increasing the block cache size (hfile.block.cache.size).
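A rough sketch of the sizing arithmetic above, assuming the in-memory zone is about 25% of a block cache that is itself a fraction (hfile.block.cache.size) of the region server heap. The heap size, cache fraction and average row size are hypothetical placeholders, not measured values. The estimate is per region server and is an upper bound: the cache starts evicting somewhat before it is completely full, and a table's regions may be spread over several servers.

```java
/**
 * Back-of-the-envelope estimate of the in-memory block cache zone on one
 * region server. All inputs are hypothetical placeholders, not HBase APIs.
 */
final class InMemoryZoneEstimate {

    // Share of the block cache given to IN_MEMORY families in the default cache.
    static final double IN_MEMORY_SHARE = 0.25;

    /** Bytes available to IN_MEMORY families: heap * cache fraction * 25%. */
    static long inMemoryZoneBytes(long heapBytes, double blockCacheFraction) {
        return (long) (heapBytes * blockCacheFraction * IN_MEMORY_SHARE);
    }

    /** Rough data size: rows * assumed average stored bytes per row (keys included). */
    static long estimatedDataBytes(long rows, long avgRowBytes) {
        return rows * avgRowBytes;
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;   // assumed 8 GiB region server heap
        double cacheFraction = 0.25;           // assumed hfile.block.cache.size
        long zone = inMemoryZoneBytes(heap, cacheFraction);

        // T1 (40k rows) + T2 (90k rows), assuming ~1 KiB per row
        long data = estimatedDataBytes(40_000 + 90_000, 1024);

        // prints: in-memory zone: 512 MiB, data: 126 MiB, fits: true
        System.out.printf("in-memory zone: %d MiB, data: %d MiB, fits: %b%n",
                zone >> 20, data >> 20, data <= zone);
    }
}
```

Replace the placeholders with your actual heap size, cache fraction and a measured row size (for example, the on-disk size of the table's store files divided by its row count) before drawing conclusions.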

>>Is the from-memory or from-disk read transparent to the client?

Yes, it is completely transparent; you do not need to change your client code.
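To illustrate why the client code stays the same, here is a toy model (not HBase's implementation, which caches HFile blocks inside the region server): the caller always uses one lookup interface, and whether a value comes from a cache or from the slower backing store is decided behind it. The names RowStore, DiskStore and CachingStore are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: caller code is identical whether a read hits the cache or not. */
interface RowStore {
    String get(String rowKey);
}

/** Stands in for reading from disk; counts how often it is actually hit. */
final class DiskStore implements RowStore {
    private final Map<String, String> rows;
    int reads = 0;

    DiskStore(Map<String, String> rows) { this.rows = rows; }

    @Override public String get(String rowKey) {
        reads++;
        return rows.get(rowKey);
    }
}

/** Read-through cache placed in front of the store, hidden from the caller. */
final class CachingStore implements RowStore {
    private final RowStore backing;
    private final Map<String, String> cache = new HashMap<>();

    CachingStore(RowStore backing) { this.backing = backing; }

    @Override public String get(String rowKey) {
        // computeIfAbsent does not store null results, so missing rows are re-read
        return cache.computeIfAbsent(rowKey, backing::get);
    }
}

final class TransparencyDemo {
    public static void main(String[] args) {
        DiskStore disk = new DiskStore(Map.of("row1", "v1"));
        RowStore store = new CachingStore(disk);
        // Same call twice; only the first one reaches the backing store.
        // prints: v1 v1 diskReads=1
        System.out.println(store.get("row1") + " " + store.get("row1") + " diskReads=" + disk.reads);
    }
}
```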

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

From: Omkar Joshi [Omkar.Joshi@lntinfotech.com]
Sent: Wednesday, September 11, 2013 4:41 AM
To: user@hbase.apache.org
Subject: RE: HBase : get(...) vs scan and in-memory table

Hi JM,

Yes, I have been considering DistributedCache too, but I'm not sure those tables will remain read-only
in the future. Besides, I want to check whether, at their current size, they can be kept in memory
in HBase.

Omkar Joshi

-----Original Message-----
From: Jean-Marc Spaggiari [mailto:jean-marc@spaggiari.org]
Sent: Wednesday, September 11, 2013 5:06 PM
To: user
Subject: Re: HBase : get(...) vs scan and in-memory table

Hi Omkar,

Your tables T1 and T2 are not that big. Are you 100% sure they can fit in memory?
If so, why not distribute them to all the nodes of your MR
setup, for example in a map (key/value) format, using the distributed cache? Then, in your map
code, you will be 100% sure that both tables are local and in memory...
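A minimal sketch of the in-task side of that approach, assuming each table has been exported to a tab-separated text file (row key, tab, value) and shipped to the task nodes through the distributed cache. The file format and the LookupTable class are hypothetical; loading uses plain java.nio so the sketch runs on its own. The task loads the file once (for example in setup()) and then answers lookups from a HashMap with no HBase round trip.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory lookup table loaded from a "rowKey<TAB>value" text file. */
final class LookupTable {
    private final Map<String, String> rows = new HashMap<>();

    /** Load once per task, e.g. from the locally cached copy of the file. */
    static LookupTable load(Path file) throws IOException {
        LookupTable table = new LookupTable();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip malformed lines
            }
            table.rows.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return table;
    }

    /** Local stand-in for t1.get(new Get(rowKey)); returns null if the key is absent. */
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("t1", ".tsv");
        Files.write(tmp, List.of("apple\tred", "banana\tyellow"), StandardCharsets.UTF_8);
        LookupTable t1 = LookupTable.load(tmp);
        System.out.println(t1.size() + " " + t1.get("banana")); // prints: 2 yellow
        Files.delete(tmp);
    }
}
```

The trade-off is the one raised in the reply: the task works on a snapshot, so if T1 or T2 stop being read-only, the cached copy goes stale until the job is re-run with a fresh export. The map also occupies task heap, which should be checked against the task JVM settings.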


2013/9/11 Omkar Joshi <Omkar.Joshi@lntinfotech.com>

> I'm running a MapReduce job over HBase.
> The business logic in the reducer heavily accesses two tables, say T1 (40k
> rows) and T2 (90k rows). Currently, I'm executing the following steps:
> 1. In the constructor of the reducer class, doing something like this:
> HBaseCRUD hbaseCRUD = new HBaseCRUD();
> HTableInterface t1= hbaseCRUD.getTable("T1",
>                             "CF1", null, "C1", "C2");
> HTableInterface t2= hbaseCRUD.getTable("T2",
>                             "CF1", null, "C1", "C2");
> 2. In reduce(...):
>  String lowercase = ....;
> /* Start : HBase code */
> /*
> * TRY using get(...) on the table rather than a
> * Scan!
> */
> Scan scan = new Scan();
> scan.setStartRow(Bytes.toBytes(lowercase));
> scan.setStopRow(Bytes.toBytes(lowercase));
> /*scan will return a single row*/
> ResultScanner resultScanner = t1.getScanner(scan);
> try {
>     for (Result result : resultScanner) {
>         /*business logic*/
>     }
> } finally {
>     resultScanner.close(); // release the server-side scanner
> }
> Though I'm not sure the above code is sensible in the first place, I have a
> question: would a get(...) provide any performance benefit over the scan?
> Get get = new Get(Bytes.toBytes(lowercase));
> Result getResult = t1.get(get);
> Since T1 and T2 will be (mostly) read-only, I think keeping them in memory
> will improve performance. As per the HBase docs, I will have to re-create the
> tables T1 and T2. Please verify my understanding:
> public void createTables(String tableName, boolean readOnly,
>             boolean blockCacheEnabled, boolean inMemory,
>             String... columnFamilyNames) throws IOException {
>         HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>         /* not sure !!! */
>         tableDesc.setReadOnly(readOnly);
>         if (columnFamilyNames != null) {
>             for (String columnFamilyName : columnFamilyNames) {
>                 HColumnDescriptor columnFamily = new HColumnDescriptor(columnFamilyName);
>                 /*
>                  * Start : Do these steps ensure that the column
>                  * family (actually, the column data) is in-memory???
>                  */
>                 columnFamily.setBlockCacheEnabled(blockCacheEnabled);
>                 columnFamily.setInMemory(inMemory);
>                 /*
>                  * End : Do these steps ensure that the column
>                  * family (actually, the column data) is in-memory???
>                  */
>                 tableDesc.addFamily(columnFamily);
>             }
>         }
>         hbaseAdmin.createTable(tableDesc);
>         hbaseAdmin.close();
>     }
> Once done:
>  1.  How do I verify that the columns are in memory and read from there
> rather than from disk?
>  2.  Is the from-memory or from-disk read transparent to the client? In
> simple words, do I need to change the HTable access code in my reducer
> class? If yes, what are the changes?
> Regards,
> Omkar Joshi
