hbase-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Sat, 07 Mar 2015 00:46:59 GMT
Hi all,

I am trying to get a rough idea of how the C++ and Java clients compare
when scanning an HBase table, and was surprised to find that the Thrift
(C++) client is about 4X slower.

The performance results are:
C++:  real 16m11.313s; user 5m3.642s;  sys 2m21.388s
Java: real 4m6.012s;   user 0m31.228s; sys 0m8.018s
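
As a rough sanity check on the figures above, the following is a minimal
sketch of the arithmetic behind the "4X" claim. The helper names
(to_seconds, slowdown, rows_per_second, print_timing_summary) are made up
for illustration, and the 6M row count is the approximate figure for
lineitem quoted in this message, not a measured value.

```cpp
#include <cassert>
#include <cstdio>

// Convert a `real XmY.Zs` style reading into seconds.
constexpr double to_seconds(int minutes, double seconds) {
    return minutes * 60.0 + seconds;
}

// How many times longer the slow run took than the fast run.
inline double slowdown(double slow_s, double fast_s) {
    assert(fast_s > 0.0);
    return slow_s / fast_s;
}

// Rows scanned per second of wall-clock time.
inline double rows_per_second(double rows, double wall_s) {
    assert(wall_s > 0.0);
    return rows / wall_s;
}

// Hypothetical helper: prints the derived figures for the two runs.
inline void print_timing_summary() {
    const double cpp_real  = to_seconds(16, 11.313);  // 971.313 s
    const double java_real = to_seconds(4, 6.012);    // 246.012 s
    const double rows      = 6.0e6;                   // approximate row count

    // CPU time is user + sys for each run.
    const double cpp_cpu  = to_seconds(5, 3.642) + to_seconds(2, 21.388);
    const double java_cpu = to_seconds(0, 31.228) + to_seconds(0, 8.018);

    std::printf("wall-clock slowdown: %.2fx\n", slowdown(cpp_real, java_real));
    std::printf("CPU-time ratio:      %.2fx\n", slowdown(cpp_cpu, java_cpu));
    std::printf("C++ rows/s:  %.0f\n", rows_per_second(rows, cpp_real));
    std::printf("Java rows/s: %.0f\n", rows_per_second(rows, java_real));
}
```

Calling print_timing_summary() from main() prints the ratios. The
wall-clock ratio comes out just under 4, which matches the "4X" in the
subject line. The client-side CPU-time ratio (user + sys) is noticeably
larger than that.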


I have a single-node HBase (98.6) cluster with 1X TPC-H data loaded, and
I scan the largest table, lineitem, which has about 6M rows and roughly
600MB of data.

For the C++ client, I used the Thrift example provided in hbase-examples.
The C++ code looks like this:

>  std::string t("lineitem");
>  int scanner =  client.scannerOpenWithScan(t, tscan, dummyAttributes);
>  int count = 0;
> ..
>  while (true) {
>    std::vector<TRowResult> value;
>    client.scannerGet(value, scanner);
>    if (value.size() == 0) break;
>    count ++;
>  }
>
>  std::cout << count << " rows scanned"<< std::endl;
>
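
One thing that may be worth ruling out: in the Thrift API, scannerGet
returns the scanner's current row and advances by one, so the loop above
issues one call per row, while the Thrift API also offers
scannerGetList(id, nbRows) for fetching several rows per call. The sketch
below is not HBase code. FakeScanner and scan_all are hypothetical names,
and the model only counts calls for a given rows-per-call setting. Whether
per-row calls actually explain the gap here is an assumption, not
something measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a remote scanner. It holds no data and only
// tracks how many rows are left and how many calls were made.
class FakeScanner {
public:
    explicit FakeScanner(std::uint64_t total_rows) : remaining_(total_rows) {}

    // Models one client/server call returning at most max_rows rows.
    std::uint64_t fetch(std::uint64_t max_rows) {
        assert(max_rows > 0);
        ++round_trips_;
        const std::uint64_t n = std::min(max_rows, remaining_);
        remaining_ -= n;
        return n;
    }

    std::uint64_t round_trips() const { return round_trips_; }

private:
    std::uint64_t remaining_;
    std::uint64_t round_trips_ = 0;
};

struct ScanStats {
    std::uint64_t rows;         // rows seen by the client
    std::uint64_t round_trips;  // calls made, including the final empty one
};

// Drains the scanner the way the loop above does: keep fetching until an
// empty result comes back.
inline ScanStats scan_all(std::uint64_t total_rows, std::uint64_t rows_per_call) {
    FakeScanner scanner(total_rows);
    std::uint64_t rows = 0;
    while (true) {
        const std::uint64_t got = scanner.fetch(rows_per_call);
        if (got == 0) break;
        rows += got;
    }
    return {rows, scanner.round_trips()};
}
```

With 6,000,000 rows, scan_all(6000000, 1) makes 6,000,001 calls, while
scan_all(6000000, 100) makes 60,001. The value 100 is only an example
batch size, not a claim about any client's default.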

The Java client is the simplest possible one:

>     HTable table = new HTable(conf,"lineitem");
>
>     Scan scan = new Scan();
>     ResultScanner resScanner;
>     resScanner = table.getScanner(scan);
>     int count = 0;
>     for (Result res: resScanner) {
>       count ++;
>     }
>



Since most of the time should be spent on I/O, I did not expect any
significant difference between Thrift (C++) and Java. Any ideas? Many
thanks.

Demai
