hbase-user mailing list archives

From Andrey Stepachev <oct...@gmail.com>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Sun, 08 Mar 2015 20:40:03 GMT
Hi Demai.

That seems odd to me; in my tests I got very similar performance.
I'd suggest checking that both scans use identical parameters
(cache size in particular). That can make a big difference in performance
in your case.
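
For a rough sense of why the batch size matters: the C++ loop quoted below
calls scannerGet, which as far as I know returns at most one row per call in
the Thrift API, while the Java client prefetches a batch of rows per RPC
(hbase.client.scanner.caching; I believe the default is 100 in 0.98, but
please check your own configuration). Here is a minimal sketch of the
arithmetic, assuming the 6M rows from the original post. Note that
scan_round_trips is a made-up helper for illustration, not part of any HBase
or Thrift API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round trips needed to drain a scan of total_rows rows
// when each call returns at most rows_per_call rows (ceiling division).
// The final empty "end of scan" call is not counted.
inline std::uint64_t scan_round_trips(std::uint64_t total_rows,
                                      std::uint64_t rows_per_call) {
    assert(rows_per_call > 0 && "batch size must be positive");
    return (total_rows + rows_per_call - 1) / rows_per_call;
}
```

With 6,000,000 rows, scan_round_trips(6000000, 1) gives 6,000,000 calls,
versus 60,000 for scan_round_trips(6000000, 100). If that is the cause,
fetching in batches with scannerGetList(value, scanner, nbRows) and setting
the caching field on the TScan may close much of the gap. I have not measured
this on your setup.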

Thanks.

On Sun, Mar 8, 2015 at 6:50 PM, Mike Axiak <mike@axiak.net> wrote:

> If you're going the JNI route, the best bet is to embed a VM in your C
> project. You use "javap -s -p" to print the method signatures you need and
> compile linking against the JVM library.  This article talks about
> how to talk from C to Java:
>
> http://www.codeproject.com/Articles/22881/How-to-Call-Java-Functions-from-C-Using-JNI
>
> Best,
> Mike
>
> On Sun, Mar 8, 2015 at 10:29 AM, Michael Segel
> <michael_segel@hotmail.com> wrote:
> > JNI example?
> >
> > I don’t have one… my clients own the code, so I can’t take it with me
> > and share.
> > (The joys of being a consultant: you can’t take it with you, and you
> > need to make sure you don’t transfer IP accidentally.)
> >
> >
> > Maybe in one of the HBase books? Or just google for a JNI example on the
> > web, since it's straightforward Java code to connect to HBase and then
> > straight JNI to talk to C/C++.
> >
> >
> >> On Mar 7, 2015, at 5:56 PM, Demai Ni <nidmgg@gmail.com> wrote:
> >>
> >> Nick, thanks. I will give REST a try. However, if it uses the same
> >> design, the result will probably be the same.
> >>
> >> Michael, I was thinking about the same thing through JNI. Is there an
> >> example I can follow?
> >>
> >> Mike (Axiak), I run the C++ client on the same Linux machine as HBase
> >> and Thrift. HBase uses IP 127.0.0.1 and Thrift uses 0.0.0.0. It
> >> doesn't make a difference, does it?
> >>
> >> Anyway, Thrift gets the scan result from HBase first, and then my C++
> >> client gets the same data from Thrift. That probably costs double the
> >> time/CPU, so JNI may be the right way to go. Is there an example I can
> >> use? Thanks.
> >>
> >> Demai
> >>
> >> On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <mike@axiak.net> wrote:
> >>
> >>> What if you install the thrift server locally on every C++ client
> >>> machine? I'd imagine performance should be similar to native java
> >>> performance at that point.
> >>>
> >>> -Mike
> >>>
> >>> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel
> >>> <michael_segel@hotmail.com> wrote:
> >>>> Or you could try a Java connection wrapped by JNI so you can call it
> >>>> from your C++ app.
> >>>>
> >>>>> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >>>>>
> >>>>> You can try the REST gateway, though it has the same basic
> >>>>> architecture as the Thrift gateway. Maybe the details work out in
> >>>>> your favor over REST.
> >>>>>
> >>>>> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nidmgg@gmail.com> wrote:
> >>>>>
> >>>>>> Stack,
> >>>>>>
> >>>>>> Thanks for the quick response. Well, the extra layer really kills
> >>>>>> performance. The 'hop' is so expensive.
> >>>>>>
> >>>>>> Is there another C/C++ API to try out? I saw there is a JIRA,
> >>>>>> HBASE-1015, but it has been inactive for a while.
> >>>>>>
> >>>>>> Demai
> >>>>>>
> >>>>>> Stack <stack@duboce.net> wrote:
> >>>>>>
> >>>>>>> Is it because of the 'hop'? Java goes against the RS. The Thrift
> >>>>>>> C++ client goes to a thriftserver, which hosts a Java client, and
> >>>>>>> then that goes to the RS?
> >>>>>>> St.Ack
> >>>>>>>
> >>>>>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> hi, guys,
> >>>>>>>>
> >>>>>>>> I am trying to get a rough idea of the performance difference
> >>>>>>>> between the C++ and Java clients when accessing an HBase table, and
> >>>>>>>> was surprised to find out that Thrift (C++) is 4X slower.
> >>>>>>>>
> >>>>>>>> The performance result is:
> >>>>>>>> C++:  real    *16m11.313s*; user    5m3.642s; sys    2m21.388s
> >>>>>>>> Java: real    *4m6.012s*; user    0m31.228s; sys    0m8.018s
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I have a single-node HBase (0.98.6) cluster with 1X TPC-H loaded, and
> >>>>>>>> I use the largest table, lineitem, which has 6M rows and roughly
> >>>>>>>> 600MB of data.
> >>>>>>>>
> >>>>>>>> For the C++ client, I used the Thrift example provided by
> >>>>>>>> hbase-examples. The C++ code looks like:
> >>>>>>>>
> >>>>>>>>> std::string t("lineitem");
> >>>>>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> >>>>>>>>> int count = 0;
> >>>>>>>>> ..
> >>>>>>>>> while (true) {
> >>>>>>>>>  std::vector<TRowResult> value;
> >>>>>>>>>  client.scannerGet(value, scanner);
> >>>>>>>>>  if (value.size() == 0) break;
> >>>>>>>>>  count ++;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> std::cout << count << " rows scanned" << std::endl;
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> The Java client is the simplest possible one:
> >>>>>>>>
> >>>>>>>>>   HTable table = new HTable(conf,"lineitem");
> >>>>>>>>>
> >>>>>>>>>   Scan scan = new Scan();
> >>>>>>>>>   ResultScanner resScanner;
> >>>>>>>>>   resScanner = table.getScanner(scan);
> >>>>>>>>>   int count = 0;
> >>>>>>>>>   for (Result res: resScanner) {
> >>>>>>>>>     count ++;
> >>>>>>>>>   }
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Since most of the time should be spent on I/O, I don't expect any
> >>>>>>>> significant difference between Thrift (C++) and Java. Any ideas?
> >>>>>>>> Many thanks.
> >>>>>>>>
> >>>>>>>> Demai
> >>>>>>>>
> >>>>>>
> >>>>
> >>>> The opinions expressed here are mine, while they may reflect a
> >>>> cognitive thought, that is purely accidental.
> >>>> Use at your own risk.
> >>>> Michael Segel
> >>>> michael_segel (AT) hotmail.com
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >
> > The opinions expressed here are mine, while they may reflect a
> > cognitive thought, that is purely accidental.
> > Use at your own risk.
> > Michael Segel
> > michael_segel (AT) hotmail.com
> >
> >
> >
> >
> >
>



-- 
Andrey.
