hbase-user mailing list archives

From Mike Axiak <m...@axiak.net>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Sun, 08 Mar 2015 18:50:26 GMT
If you're going the JNI route, the best bet is to embed a VM in your C
project. You use "javap -s -p" to get the type signatures of the Java
methods you want to call, then compile against the JNI headers and link
against the Java VM library. This article talks about how to call Java
from C:
http://www.codeproject.com/Articles/22881/How-to-Call-Java-Functions-from-C-Using-JNI

Best,
Mike

On Sun, Mar 8, 2015 at 10:29 AM, Michael Segel
<michael_segel@hotmail.com> wrote:
> JNI example?
>
> I don’t have one… my clients own the code, so I can’t take it with me and share it.
> (The joy of being a consultant is that you can’t take it with you, and you need to make
> sure you don’t transfer IP accidentally.)
>
>
> Maybe in one of the HBase books? Or just google for a JNI example on the web, since it’s
> straightforward Java code to connect to HBase and then straight JNI to talk to C/C++.
>
>
>> On Mar 7, 2015, at 5:56 PM, Demai Ni <nidmgg@gmail.com> wrote:
>>
>> Nick, thanks. I will give REST a try. However, if it uses the same design,
>> the result will probably be the same.
>>
>> Michael, I was thinking about the same thing through JNI. Is there an
>> example I can follow?
>>
>> Mike (Axiak), I run the C++ client on the same Linux machine as HBase
>> and Thrift. HBase uses IP 127.0.0.1 and Thrift uses 0.0.0.0. That doesn't
>> make a difference, does it?
>>
>> Anyway, since Thrift first gets the scan result from HBase and then my
>> C++ client gets the same data from Thrift, it probably costs double the
>> time/CPU. So JNI may be the right way to go. Is there an example I can
>> use? Thanks.
>>
>> Demai
>>
>> On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <mike@axiak.net> wrote:
>>
>>> What if you install the thrift server locally on every C++ client
>>> machine? I'd imagine performance should be similar to native java
>>> performance at that point.
>>>
>>> -Mike
>>>
>>> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel <michael_segel@hotmail.com>
>>> wrote:
>>>> Or you could try a Java connection wrapped by JNI so you can call it
>>>> from your C++ app.
>>>>
>>>>> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>>>>>
>>>>> You can try the REST gateway, though it has the same basic architecture
>>>>> as the Thrift gateway. Maybe the details work out in your favor with REST.
>>>>>
>>>>> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nidmgg@gmail.com> wrote:
>>>>>
>>>>>> Stack,
>>>>>>
>>>>>> Thanks for the quick response. Well, the extra layer really kills the
>>>>>> performance. The 'hop' is so expensive.
>>>>>>
>>>>>> Is there another C/C++ API to try out? I saw there is a JIRA, HBASE-1015,
>>>>>> but it has been inactive for a while.
>>>>>>
>>>>>> Demai
>>>>>>
>>>>>> Stack <stack@duboce.net> wrote:
>>>>>>
>>>>>>> Is it because of the 'hop'? Java goes directly against the RS. The Thrift
>>>>>>> C++ client goes to a Thrift server, which hosts a Java client, and that
>>>>>>> then goes to the RS?
>>>>>>> St.Ack
>>>>>>>
>>>>>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> I am trying to get a rough idea of how the C++ and Java clients compare
>>>>>>>> in performance when accessing an HBase table, and was surprised to find
>>>>>>>> that Thrift (C++) is 4X slower.
>>>>>>>>
>>>>>>>> The performance results are:
>>>>>>>> C++:  real *16m11.313s*; user 5m3.642s;  sys 2m21.388s
>>>>>>>> Java: real *4m6.012s*;   user 0m31.228s; sys 0m8.018s
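To check the 4X figure, the `real` times can be converted to seconds. The following is a minimal C++ sketch, assuming the bash `time` field format shown above ("<minutes>m<seconds>s"); the function names are hypothetical. It gives 971.313 s / 246.012 s, about 3.95X. Applied to user + sys, the same conversion gives about 445 s of CPU for the C++ client versus about 39 s for the Java client (roughly 11X). Note that `time` only covers the process it wraps, so the C++ figures exclude any work done inside the Thrift server or the RegionServer.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert one bash `time` field, e.g. "16m11.313s",
// into seconds. Assumes the "<minutes>m<seconds>s" layout used above.
double timeFieldToSeconds(const std::string& field) {
    const std::string::size_type m = field.find('m');
    if (field.empty() || field.back() != 's' || m == std::string::npos) {
        throw std::invalid_argument("expected <min>m<sec>s, got: " + field);
    }
    const double minutes = std::stod(field.substr(0, m));
    // The seconds sit between the 'm' and the trailing 's'.
    const double seconds = std::stod(field.substr(m + 1, field.size() - m - 2));
    return minutes * 60.0 + seconds;
}

// Hypothetical helper: how many times longer `slower` is than `faster`.
double slowdown(const std::string& slower, const std::string& faster) {
    return timeFieldToSeconds(slower) / timeFieldToSeconds(faster);
}
```

For example, `slowdown("16m11.313s", "4m6.012s")` returns roughly 3.95.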
>>>>>>>>
>>>>>>>>
>>>>>>>> I have a single-node HBase (0.98.6) cluster with 1X TPC-H loaded, and I
>>>>>>>> use the largest table, lineitem, which has 6M rows, roughly 600MB of data.
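A per-row view can make the gap easier to reason about. As a sketch using the approximate figures in this message (6M rows, roughly 600MB) and the `real` times in seconds (971.313 and 246.012), the helpers below (hypothetical names) work out to about 162 microseconds per row and 0.62 MB/s for the C++ client, versus about 41 microseconds per row and 2.4 MB/s for Java. Because the row count and size are rounded, treat these as rough magnitudes.

```cpp
#include <cassert>

// Hypothetical helper: average wall-clock microseconds spent per scanned row.
double microsPerRow(double wallSeconds, double rows) {
    return wallSeconds * 1e6 / rows;
}

// Hypothetical helper: effective scan throughput in MB per second.
double mbPerSecond(double megabytes, double wallSeconds) {
    return megabytes / wallSeconds;
}
```

If the extra hop were the whole story, it would have to account for the difference of roughly 121 microseconds per row.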
>>>>>>>>
>>>>>>>> For the C++ client, I used the Thrift example provided by hbase-examples;
>>>>>>>> the C++ code looks like this:
>>>>>>>>
>>>>>>>>> std::string t("lineitem");
>>>>>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
>>>>>>>>> int count = 0;
>>>>>>>>> ..
>>>>>>>>> // scannerGet returns at most one row per call; an empty result ends the scan
>>>>>>>>> while (true) {
>>>>>>>>>   std::vector<TRowResult> value;
>>>>>>>>>   client.scannerGet(value, scanner);
>>>>>>>>>   if (value.empty()) break;
>>>>>>>>>   ++count;
>>>>>>>>> }
>>>>>>>>> client.scannerClose(scanner);  // release the scanner held by the Thrift server
>>>>>>>>>
>>>>>>>>> std::cout << count << " rows scanned" << std::endl;
>>>>>>>>>
>>>>>>>>
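Stack's 'hop' explanation can be made concrete by counting calls. Thrift1's scannerGet returns at most one row per call, so the loop above crosses the C++-client-to-Thrift-server hop once per row, plus once more for the empty result that ends the scan: about 6 million round trips for lineitem. The sketch below only counts calls under that assumption; `gatewayCalls` and `rowsPerCall` are hypothetical names, and the batched case corresponds to the Thrift1 IDL's scannerGetList(id, nbRows), which returns up to nbRows rows per request. It says nothing about how long each call takes.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: count the client-to-Thrift-server calls made by a loop
// that keeps calling until it receives an empty batch, when each call returns
// up to rowsPerCall rows. With scannerGet, rowsPerCall is 1.
std::int64_t gatewayCalls(std::int64_t rows, std::int64_t rowsPerCall) {
    if (rows < 0 || rowsPerCall < 1) {
        throw std::invalid_argument("rows must be >= 0 and rowsPerCall >= 1");
    }
    // Calls that return at least one row (the last batch may be partial).
    const std::int64_t nonEmptyCalls = (rows + rowsPerCall - 1) / rowsPerCall;
    return nonEmptyCalls + 1;  // one more call returns the empty batch that ends the loop
}
```

For example, with 1,000 rows per call the same 6M-row scan would need 6,001 calls instead of 6,000,001; whether that closes the measured gap is something only a benchmark can show.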
>>>>>>>> The Java client is the simplest one:
>>>>>>>>
>>>>>>>>>   HTable table = new HTable(conf,"lineitem");
>>>>>>>>>
>>>>>>>>>   Scan scan = new Scan();
>>>>>>>>>   ResultScanner resScanner;
>>>>>>>>>   resScanner = table.getScanner(scan);
>>>>>>>>>   int count = 0;
>>>>>>>>>   for (Result res: resScanner) {
>>>>>>>>>     count ++;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Since most of the time should be spent on I/O, I didn't expect any
>>>>>>>> significant difference between Thrift (C++) and Java. Any ideas? Many
>>>>>>>> thanks.
>>>>>>>>
>>>>>>>> Demai
>>>>>>>>
>>>>>>
>>>>
>>>> The opinions expressed here are mine, while they may reflect a cognitive
>>>> thought, that is purely accidental.
>>>> Use at your own risk.
>>>> Michael Segel
>>>> michael_segel (AT) hotmail.com
>>>
>
