hbase-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Tue, 10 Mar 2015 18:51:44 GMT
Andrey,

thanks. You are right that I am using Thrift v1. I was following the example
under hbase-examples/src/main/cpp/DemoClient.cpp. It looks pretty
old, and in fact its scan example:

> scanner = client.scannerOpenWithStop(t, "00020", "00040", columnNames,
> dummyAttributes);
>
doesn't work.

I googled a bit, and it looks like HBase recommends Thrift2 now?
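
On the scannerGet vs. scannerGetList point from earlier in the thread (quoted below): as a rough illustration only, the sketch here counts how many client/server calls a full scan needs for a given batch size. It is a mock, not the real Thrift client. MockScanner and scanAll are made-up names, and it assumes each call returns at most the requested number of rows and that the loop stops at the first empty result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a remote scanner (not the HBase Thrift API).
// Each call to next() models one client/server round trip.
class MockScanner {
 public:
  explicit MockScanner(std::size_t totalRows) : remaining_(totalRows) {}

  // Returns at most maxRows placeholder rows; empty once the scan is done.
  std::vector<int> next(std::size_t maxRows) {
    assert(maxRows > 0);
    ++roundTrips_;
    const std::size_t n = std::min(maxRows, remaining_);
    remaining_ -= n;
    return std::vector<int>(n, 0);
  }

  std::size_t roundTrips() const { return roundTrips_; }

 private:
  std::size_t remaining_;
  std::size_t roundTrips_ = 0;
};

struct ScanStats {
  std::size_t rows;        // rows seen by the client
  std::size_t roundTrips;  // calls made, including the final empty one
};

// Same loop shape as the client code in this thread: fetch until empty.
ScanStats scanAll(std::size_t totalRows, std::size_t batchSize) {
  MockScanner scanner(totalRows);
  std::size_t rows = 0;
  while (true) {
    const std::vector<int> batch = scanner.next(batchSize);
    if (batch.empty()) break;
    rows += batch.size();
  }
  return {rows, scanner.roundTrips()};
}
```

Under these assumptions, scanAll(6000000, 1) makes 6,000,001 calls while scanAll(6000000, 10000) makes 601, which is why the per-call overhead matters so much in the one-row-per-call loop.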

Demai


On Mon, Mar 9, 2015 at 3:41 PM, Andrey Stepachev <octo47@gmail.com> wrote:

> Sorry Demai, I have no access to that code currently.
>
> But from what you described, it seems that you are using
> thrift v1. I'd recommend using thrift2.
>
> Also it is a good idea to check thrift server configuration:
> 1. blocking/nonblocking/hsha, and framed or not
> 2. size of thread pool
>
>
>
> On Mon, Mar 9, 2015 at 9:26 PM, Demai Ni <nidmgg@gmail.com> wrote:
>
> > Andrey and all,
> >
> > thanks for the input. Andrey, if possible, would you mind sharing your code
> > segment so I can follow the settings on your side?
> >
> > I had exactly the same thought when I first saw the result. I was
> > expecting a small performance penalty (10~20%) when using Thrift(C++),
> > not this much.
> >
> > Now I am looking into the C++ api call. Originally, I used
> > "client.scannerGet(value, scanner)", which does a lot of preparation
> > work (like flush) for each call. I just changed the code to use
> > "client.scannerGetList(value, scanner, 10000);". Sure enough, the
> > performance improved. For a similar comparison, I also set the java
> > client to 10000 batch/cache. Here is the new code:
> >
> > > *C++*
> > >     TScan tscan;
> > >     int scanner =  client.scannerOpenWithScan(t, tscan, dummyAttributes);
> > >     int count = 0;
> > >     try {
> > >       while (true) {
> > >         std::vector<TRowResult> value;
> > >
> > >         client.scannerGetList(value, scanner, 10000);
> > >         if (value.size() == 0) {
> > >           break;
> > >         } else count += value.size();
> > >       }
> > >
> >
> > *Java*
> >     int total = 0;
> >
> >     scan = new Scan();
> >     scan.setCaching(10000);
> >     scan.setBatch(10000);
> >     resScanner = table.getScanner(scan);
> >     int count = 0;
> >     for (Result res : resScanner) {
> >         count++;
> >     }
> >
> > so both clients improved as expected, but Thrift C++ still takes about 3X
> > the time compared to Java:
> > C++ : real    6m46.845s, user    1m59.636s, sys    0m11.984s
> > Java: real    2m27.245s, user    0m17.624s, sys    0m4.779s
> >
> > To be fair, I am able to setCaching on the Java client, but didn't find a way
> > to do the same through the C++ API, which also makes some difference.
> >
> > Demai
> >
> >
> > On Sun, Mar 8, 2015 at 1:40 PM, Andrey Stepachev <octo47@gmail.com> wrote:
> >
> > > Hi Demai.
> > >
> > > That seems odd to me; in my tests I got very similar performance.
> > > I'd suggest checking that the scans have identical parameters
> > > (cache size in particular). That can make a big difference in performance
> > > in your case.
> > >
> > > Thanks.
> > >
> > > On Sun, Mar 8, 2015 at 6:50 PM, Mike Axiak <mike@axiak.net> wrote:
> > >
> > > > If you're going the JNI route, the best bet is to embed a VM in your C
> > > > project. You use "javap -s -p" to look up the method signatures JNI
> > > > needs, and compile linking against the java library.  This article
> > > > talks about how to call Java from C:
> > > >
> > > > http://www.codeproject.com/Articles/22881/How-to-Call-Java-Functions-from-C-Using-JNI
> > > >
> > > > Best,
> > > > Mike
> > > >
> > > > On Sun, Mar 8, 2015 at 10:29 AM, Michael Segel
> > > > <michael_segel@hotmail.com> wrote:
> > > > > JNI example?
> > > > >
> > > > > I don't have one... my clients own the code so I can't take it with me
> > > > > and share.
> > > > > (The joys of being a consultant mean you can't take it with you and you
> > > > > need to make sure you don't xfer IP accidentally.)
> > > > >
> > > > >
> > > > > Maybe in one of the HBase books? Or just google for a JNI example on
> > > > > the web, since it's straightforward Java code to connect to HBase and
> > > > > then straight JNI to talk to C/C++.
> > > > >
> > > > >
> > > > > >> On Mar 7, 2015, at 5:56 PM, Demai Ni <nidmgg@gmail.com> wrote:
> > > > > >>
> > > > > >> Nick, thanks. I will give REST a try. However, if it uses the same
> > > > > >> design, the result will probably be the same.
> > > > > >>
> > > > > >> Michael, I was thinking about the same thing through JNI. Is there an
> > > > > >> example I can follow?
> > > > > >>
> > > > > >> Mike (Axiak), I run the C++ client on the same linux machine as the
> > > > > >> hbase and thrift. The HBase uses ip 127.0.0.1 and thrift uses 0.0.0.0.
> > > > > >> It doesn't make a difference, does it?
> > > > > >>
> > > > > >> Anyway, considering Thrift will get the scan result from HBase first,
> > > > > >> then my c++ client gets the same data from Thrift, it probably costs
> > > > > >> double the time/cpu. So JNI may be the right way to go. Is there an
> > > > > >> example I can use? thanks
> > > > >>
> > > > >> Demai
> > > > >>
> > > > > >> On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <mike@axiak.net> wrote:
> > > > >>
> > > > > >>> What if you install the thrift server locally on every C++ client
> > > > > >>> machine? I'd imagine performance should be similar to native java
> > > > > >>> performance at that point.
> > > > >>>
> > > > >>> -Mike
> > > > >>>
> > > > > >>> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel <michael_segel@hotmail.com>
> > > > > >>> wrote:
> > > > > >>>> Or you could try a java connection wrapped by JNI so you can call it
> > > > > >>>> from your C++ app.
> > > > >>>>
> > > > > >>>>> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > > > > >>>>>
> > > > > >>>>> You can try the REST gateway, though it has the same basic architecture
> > > > > >>>>> as the thrift gateway. Maybe the details work out in your favor over
> > > > > >>>>> REST.
> > > > >>>>>
> > > > > >>>>> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nidmgg@gmail.com> wrote:
> > > > > >>>>>
> > > > > >>>>>> Stack,
> > > > > >>>>>>
> > > > > >>>>>> Thanks for the quick response. Well, the extra layer really kills the
> > > > > >>>>>> performance. The 'hop' is so expensive.
> > > > > >>>>>>
> > > > > >>>>>> Is there another C/C++ api to try out?  I saw there is a jira,
> > > > > >>>>>> HBASE-1015, but it has been inactive for a while.
> > > > >>>>>>
> > > > >>>>>> Demai
> > > > >>>>>>
> > > > >>>>>> Stack <stack@duboce.net> wrote:
> > > > >>>>>>
> > > > > >>>>>>> Is it because of the 'hop'?  Java goes against the RS. The thrift C++
> > > > > >>>>>>> goes to a thriftserver which hosts a java client, and then it goes to
> > > > > >>>>>>> the RS?
> > > > >>>>>>> St.Ack
> > > > >>>>>>>
> > > > > >>>>>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> hi, guys,
> > > > > >>>>>>>>
> > > > > >>>>>>>> I am trying to get a rough idea of the performance comparison between
> > > > > >>>>>>>> the c++ and java clients when accessing an HBase table, and am
> > > > > >>>>>>>> surprised to find out that Thrift (c++) is 4X slower.
> > > > > >>>>>>>>
> > > > >>>>>>>> The performance result is:
> > > > > >>>>>>>> C++:  real    16m11.313s; user    5m3.642s; sys    2m21.388s
> > > > > >>>>>>>> Java: real    4m6.012s; user    0m31.228s; sys    0m8.018s
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > > >>>>>>>> I have a single node HBase (98.6) cluster, with 1X TPCH loaded, and
> > > > > >>>>>>>> use the largest table, lineitem, which has 6M rows, roughly 600MB of
> > > > > >>>>>>>> data.
> > > > > >>>>>>>>
> > > > > >>>>>>>> For the c++ client, I used the thrift example provided by
> > > > > >>>>>>>> hbase-examples; the C++ code looks like:
> > > > >>>>>>>>
> > > > >>>>>>>>> std::string t("lineitem");
> > > > > >>>>>>>>> int scanner =  client.scannerOpenWithScan(t, tscan, dummyAttributes);
> > > > >>>>>>>>> int count = 0;
> > > > >>>>>>>>> ..
> > > > >>>>>>>>> while (true) {
> > > > >>>>>>>>>  std::vector<TRowResult> value;
> > > > >>>>>>>>>  client.scannerGet(value, scanner);
> > > > >>>>>>>>>  if (value.size() == 0) break;
> > > > >>>>>>>>>  count ++;
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > > >>>>>>>>> std::cout << count << " rows scanned" << std::endl;
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > > >>>>>>>> The java client is the simplest one:
> > > > >>>>>>>>
> > > > >>>>>>>>>   HTable table = new HTable(conf,"lineitem");
> > > > >>>>>>>>>
> > > > >>>>>>>>>   Scan scan = new Scan();
> > > > >>>>>>>>>   ResultScanner resScanner;
> > > > >>>>>>>>>   resScanner = table.getScanner(scan);
> > > > >>>>>>>>>   int count = 0;
> > > > >>>>>>>>>   for (Result res: resScanner) {
> > > > >>>>>>>>>     count ++;
> > > > >>>>>>>>>   }
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > > >>>>>>>> Since most of the time should be spent on I/O, I don't expect any
> > > > > >>>>>>>> significant difference between Thrift(C++) and Java. Any ideas? Many
> > > > > >>>>>>>> thanks
> > > > >>>>>>>>
> > > > >>>>>>>> Demai
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > > >>>> The opinions expressed here are mine, while they may reflect a cognitive
> > > > > >>>> thought, that is purely accidental.
> > > > >>>> Use at your own risk.
> > > > >>>> Michael Segel
> > > > >>>> michael_segel (AT) hotmail.com
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >
> > > > > > The opinions expressed here are mine, while they may reflect a cognitive
> > > > > > thought, that is purely accidental.
> > > > > Use at your own risk.
> > > > > Michael Segel
> > > > > michael_segel (AT) hotmail.com
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Andrey.
> > >
> >
>
>
>
> --
> Andrey.
>
