hbase-user mailing list archives

From Omkar Joshi <Omkar.Jo...@lntinfotech.com>
Subject Speeding up the row count
Date Wed, 17 Apr 2013 09:47:28 GMT
Hi,

I have two tables: CUSTOMERS (60,000+ rows) and PRODUCTS (1,000,851 rows).

The table structures are:

CUSTOMERS
rowkey        : CUSTOMER_ID
column family : CUSTOMER_INFO
columns       : NAME, EMAIL, ADDRESS, MOBILE

PRODUCTS
rowkey        : PRODUCT_ID
column family : PRODUCT_INFO
columns       : NAME, CATEGORY, GROUP, COMPANY, COST, COLOR

I'm trying to get the row count for each table using the following snippet:
.
.
.
hbaseCRUD.getTableCount(args[1], "CUSTOMER_INFO","NAME");
.
.
hbaseCRUD.getTableCount(args[1], "PRODUCT_INFO","NAME");

public long getTableCount(String tableName, String columnFamilyName,
        String columnName) {
    AggregationClient aggregationClient = new AggregationClient(config);

    // Scan the given family; narrow to a single column when one is supplied.
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes(columnFamilyName));
    if (columnName != null && !columnName.isEmpty()) {
        scan.addColumn(Bytes.toBytes(columnFamilyName),
                Bytes.toBytes(columnName));
    }

    long rowCount = 0;
    try {
        // No ColumnInterpreter is passed (null); only the row count is needed.
        rowCount = aggregationClient.rowCount(Bytes.toBytes(tableName),
                null, scan);
    } catch (Throwable e) {
        // On failure the stack trace is printed and 0 is returned.
        e.printStackTrace();
    }
    System.out.println("row count is " + rowCount);

    return rowCount;
}

For CUSTOMERS the response time is acceptable, but for PRODUCTS the call times out (even a
count on the shell is slow: 1000851 row(s) in 258.9220 seconds).
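
To put the shell figure quoted above in perspective, here is a minimal sketch that converts it
into a throughput; it works out to a little under 4,000 rows per second. The class and method
names (ScanThroughput, rowsPerSecond) are made up for illustration and are not part of HBase.

```java
// Back-of-the-envelope throughput for the shell count quoted above.
// ScanThroughput and rowsPerSecond are hypothetical names, not HBase APIs.
final class ScanThroughput {

    // Rows counted per second for a count that took `seconds` in total.
    static double rowsPerSecond(long rows, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return rows / seconds;
    }

    public static void main(String[] args) {
        long productRows = 1000851L;     // row count reported by the shell
        double shellSeconds = 258.9220;  // elapsed time reported by the shell
        System.out.printf("%.0f rows/s%n",
                rowsPerSecond(productRows, shellSeconds));
    }
}
```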

What can be done to get a quicker response? Is there an approach other than AggregationClient,
or a way to tweak the Scan in the snippet above?

Regards,
Omkar Joshi

