hbase-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Mon, 09 Mar 2015 21:26:10 GMT
Andrey and all,

Thanks for the input. Andrey, if possible, would you mind sharing your code
segment so I can follow the settings on your side?

I had exactly the same thought when I first saw the result. I was
expecting a small performance penalty (10~20%) when using Thrift (C++),
not this much.

Now I am looking into the C++ API calls. Originally, I used
"client.scannerGet(value, scanner)", which does a lot of preparation
work (like a flush) for each call. I just changed the code to use
"client.scannerGetList(value, scanner, 10000);". Sure enough, the
performance improved. To keep the comparison similar, I also set the Java
client to 10000 batch/cache. Here is the new code:

> *C++*
>     TScan tscan;
>     int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
>     int count = 0;
>     try {
>       while (true) {
>         std::vector<TRowResult> value;
>
>         // Fetch up to 10000 rows per Thrift call instead of one row.
>         client.scannerGetList(value, scanner, 10000);
>         if (value.size() == 0) {
>           break;  // an empty batch means the scan is exhausted
>         } else {
>           count += value.size();
>         }
>       }
>

*Java*
    int total = 0;

    scan = new Scan();
    scan.setCaching(10000);  // rows fetched per RPC to the region server
    scan.setBatch(10000);    // max cells returned per Result
    resScanner = table.getScanner(scan);
    int count = 0;
    for (Result res : resScanner) {
        count++;
    }

So both clients improved as expected, and Thrift C++ still takes about 3X
the time of Java (roughly 2.8X by wall-clock time):
C++ : real    6m46.845s, user    1m59.636s, sys    0m11.984s
Java: real    2m27.245s, user    0m17.624s, sys    0m4.779s

To be fair, I am able to setCaching on the Java client but didn't find a way
to do the same through the C++ API, which also makes some difference.
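
A rough sketch of why the rows-per-call setting matters, not a measurement. The scan loops above end only when a call returns an empty batch, so the number of Thrift round trips is about the row count divided by the rows returned per call, plus one final empty call. The helper name `countScannerCalls` is hypothetical (it is not part of the HBase Thrift API), and the model assumes every call before the last one returns as many rows as were requested.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only: counts the scanner calls a
// "loop until an empty batch comes back" client issues to drain totalRows
// rows when each call returns at most rowsPerCall rows.
std::size_t countScannerCalls(std::size_t totalRows, std::size_t rowsPerCall) {
  assert(rowsPerCall > 0);  // a zero-row batch would never make progress

  std::size_t calls = 0;
  std::size_t remaining = totalRows;
  while (true) {
    ++calls;                 // one round trip to the Thrift server
    if (remaining == 0) {
      break;                 // empty batch: the client loop stops here
    }
    // Assumption: the server hands back a full batch whenever it can.
    const std::size_t got = remaining < rowsPerCall ? remaining : rowsPerCall;
    remaining -= got;
  }
  return calls;
}
```

For the 6M-row lineitem table mentioned in this thread, `countScannerCalls(6000000, 1)` gives 6000001 calls (the scannerGet loop) and `countScannerCalls(6000000, 10000)` gives 601 calls (the scannerGetList loop). This counts round trips only and says nothing about how long each one takes.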

Demai


On Sun, Mar 8, 2015 at 1:40 PM, Andrey Stepachev <octo47@gmail.com> wrote:

> Hi Demai.
>
> That seems odd to me; in my tests I got very similar performance.
> I'd suggest checking that the scans have identical parameters
> (cache size in particular). That can lead to very different performance
> in your case.
>
> Thanks.
>
> On Sun, Mar 8, 2015 at 6:50 PM, Mike Axiak <mike@axiak.net> wrote:
>
> > If you're going the JNI route, the best bet is to embed a VM in your C
> > project. You use "javap -s -p" to print the method signatures that the
> > JNI calls need, and compile linking against the Java library. This
> > article talks about how to call Java from C:
> >
> >
> > http://www.codeproject.com/Articles/22881/How-to-Call-Java-Functions-from-C-Using-JNI
> >
> > Best,
> > Mike
> >
> > On Sun, Mar 8, 2015 at 10:29 AM, Michael Segel
> > <michael_segel@hotmail.com> wrote:
> > > JNI example?
> > >
> > > I don’t have one… my clients own the code, so I can’t take it with me
> > > and share.
> > > (The joys of being a consultant mean you can’t take it with you, and
> > > you need to make sure you don’t transfer IP accidentally.)
> > >
> > >
> > > Maybe in one of the HBase books? Or just google for a JNI example on
> > > the web, since it’s straightforward Java code to connect to HBase and
> > > then straight JNI to talk to C/C++.
> > >
> > >
> > >> On Mar 7, 2015, at 5:56 PM, Demai Ni <nidmgg@gmail.com> wrote:
> > >>
> > >> Nick, thanks. I will give REST a try. However, if it uses the same
> > >> design, the result will probably be the same.
> > >>
> > >> Michael, I was thinking about the same thing through JNI. Is there an
> > >> example I can follow?
> > >>
> > >> Mike (Axiak), I run the C++ client on the same Linux machine as
> > >> HBase and Thrift. HBase uses IP 127.0.0.1 and Thrift uses 0.0.0.0. It
> > >> doesn't make a difference, does it?
> > >>
> > >> Anyway, considering that Thrift gets the scan result from HBase first,
> > >> and then my C++ client gets the same data from Thrift, it definitely
> > >> (probably) costs double the time/CPU. So JNI may be the right way to
> > >> go. Is there an example I can use? Thanks.
> > >>
> > >> Demai
> > >>
> > >> On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <mike@axiak.net> wrote:
> > >>
> > >>> What if you install the thrift server locally on every C++ client
> > >>> machine? I'd imagine performance should be similar to native java
> > >>> performance at that point.
> > >>>
> > >>> -Mike
> > >>>
> > >>> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel <michael_segel@hotmail.com>
> > >>> wrote:
> > >>>> Or you could try a java connection wrapped by JNI so you can call it
> > >>>> from your C++ app.
> > >>>>
> > >>>>> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > >>>>>
> > >>>>> You can try the REST gateway, though it has the same basic
> > >>>>> architecture as the Thrift gateway. Maybe the details work out in
> > >>>>> your favor over REST.
> > >>>>>
> > >>>>> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nidmgg@gmail.com> wrote:
> > >>>>>
> > >>>>>> Stack,
> > >>>>>>
> > >>>>>> Thanks for the quick response. Well, the extra layer really kills
> > >>>>>> the performance. The 'hop' is so expensive.
> > >>>>>>
> > >>>>>> Is there another C/C++ API to try out? I saw there is a JIRA,
> > >>>>>> HBASE-1015, but it has been inactive for a while.
> > >>>>>>
> > >>>>>> Demai
> > >>>>>>
> > >>>>>> Stack <stack@duboce.net> wrote:
> > >>>>>>
> > >>>>>>> Is it because of the 'hop'?  Java goes against RS. The thrift C++
> > >>>>>>> goes to a thriftserver which hosts a java client and then it goes
> > >>>>>>> to the RS?
> > >>>>>>> St.Ack
> > >>>>>>>
> > >>>>>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> hi, guys,
> > >>>>>>>>
> > >>>>>>>> I am trying to get a rough idea of the performance difference
> > >>>>>>>> between the C++ and Java clients when accessing an HBase table,
> > >>>>>>>> and am surprised to find that Thrift (C++) is 4X slower.
> > >>>>>>>>
> > >>>>>>>> The performance result is:
> > >>>>>>>> C++:  real    *16m11.313s*; user    5m3.642s; sys    2m21.388s
> > >>>>>>>> Java: real    *4m6.012s*; user    0m31.228s; sys    0m8.018s
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> I have a single-node HBase (98.6) cluster with 1X TPCH loaded,
> > >>>>>>>> and use the largest table, lineitem, which has 6M rows, roughly
> > >>>>>>>> 600MB of data.
> > >>>>>>>>
> > >>>>>>>> For the C++ client, I used the Thrift example provided by
> > >>>>>>>> hbase-examples. The C++ code looks like:
> > >>>>>>>>
> > >>>>>>>>> std::string t("lineitem");
> > >>>>>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> > >>>>>>>>> int count = 0;
> > >>>>>>>>> ..
> > >>>>>>>>> while (true) {
> > >>>>>>>>>  std::vector<TRowResult> value;
> > >>>>>>>>>  client.scannerGet(value, scanner);
> > >>>>>>>>>  if (value.size() == 0) break;
> > >>>>>>>>>  count ++;
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> std::cout << count << " rows scanned" << std::endl;
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> The Java client is the simplest one:
> > >>>>>>>>
> > >>>>>>>>>   HTable table = new HTable(conf,"lineitem");
> > >>>>>>>>>
> > >>>>>>>>>   Scan scan = new Scan();
> > >>>>>>>>>   ResultScanner resScanner;
> > >>>>>>>>>   resScanner = table.getScanner(scan);
> > >>>>>>>>>   int count = 0;
> > >>>>>>>>>   for (Result res: resScanner) {
> > >>>>>>>>>     count ++;
> > >>>>>>>>>   }
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Since most of the time should be spent on I/O, I don't expect any
> > >>>>>>>> significant difference between Thrift (C++) and Java. Any ideas?
> > >>>>>>>> Many thanks.
> > >>>>>>>>
> > >>>>>>>> Demai
> > >>>>>>>>
> > >>>>>>
> > >>>>
> > >>>> The opinions expressed here are mine, while they may reflect a
> > >>>> cognitive thought, that is purely accidental.
> > >>>> Use at your own risk.
> > >>>> Michael Segel
> > >>>> michael_segel (AT) hotmail.com
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >
> >
>
>
>
> --
> Andrey.
>
