kudu-user mailing list archives

From Pavel Martynov <mr.xk...@gmail.com>
Subject Re: Bad insert performance of java kudu-client
Date Tue, 25 Apr 2017 10:09:37 GMT
I figured out that the problem was that I was running this program on my
development Windows machine. There seems to be a performance issue with
java.net.NetworkInterface.getByInetAddress on Windows (the only confirmation
I have found so far is
http://stackoverflow.com/questions/35541870/java-networkinterface-getbyinetaddress-takes-way-too-long).
See the profiler screenshot http://pasteboard.co/8uHil3I5H.png (kudu-client
v1.3.1): every call takes 53 ms (!) on average.
Also, could you recheck the logic: why is this function called 88 times in 12
seconds for such a small program?
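
A minimal sketch for checking the per-call cost of NetworkInterface.getByInetAddress on a given machine, independent of kudu-client. The class name, the helper averageMillis, the loopback address, and the call count of 88 (borrowed from the profiler run above) are illustrative assumptions; this is not how kudu-client itself invokes the method.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

// Hypothetical helper class, not part of kudu-client.
class GetByInetAddressTimer {

    // Calls NetworkInterface.getByInetAddress `calls` times and returns
    // the average wall-clock time per call in milliseconds.
    static double averageMillis(InetAddress addr, int calls) {
        if (calls <= 0) {
            throw new IllegalArgumentException("calls must be positive");
        }
        long start = System.nanoTime();
        try {
            for (int i = 0; i < calls; i++) {
                // The result is ignored; only the lookup cost matters here.
                NetworkInterface.getByInetAddress(addr);
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / calls;
    }

    public static void main(String[] args) {
        // Assumption: the loopback address is enough to exercise the lookup;
        // a real client would pass the address it actually connects from.
        InetAddress addr = InetAddress.getLoopbackAddress();
        int calls = 88;
        double avg = averageMillis(addr, calls);
        System.out.printf("%d calls, %.3f ms per call on average%n", calls, avg);
    }
}
```

Running the same class on a Windows and a Linux machine should show whether the slowdown comes from the platform rather than from the client version.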

2017-04-24 22:29 GMT+03:00 Todd Lipcon <todd@cloudera.com>:

> I tried to reproduce this locally using your code and couldn't. I get
> around 100K inserts/second for 1.0, 1.1, 1.2, and 1.3 clients (against a
> 1.4-SNAPSHOT cluster)
>
> Is it always reproducible for you? E.g., if you switch back to the earlier
> client and try another set of runs, do you get the same results?
>
> -Todd
>
> On Mon, Apr 24, 2017 at 10:56 AM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> I vaguely recall some bug in earlier versions of the Java client where
>> 'shutdown' wouldn't properly block on the data being flushed. So it's
>> possible in 1.0.x and below, you're not actually measuring the full amount
>> of time to write all the data, whereas when the bug is fixed, you are.
>>
>> I'll see if I can repro this locally as well using your code.
>>
>> -Todd
>>
>> On Mon, Apr 24, 2017 at 10:49 AM, David Alves <davidralves@gmail.com>
>> wrote:
>>
>>> Hi Pavel
>>>
>>>   Interesting, thanks for sharing those numbers.
>>>   I assume you weren't using AUTO_FLUSH_BACKGROUND for the first versions
>>> you tested (I don't think it was available then, iirc).
>>>   Could you try without it in the latest version and see how the numbers
>>> compare?
>>>   We'd be happy to help track down the reason for this perf regression.
>>>
>>> Best
>>> David
>>>
>>> On Mon, Apr 24, 2017 at 4:58 AM, Pavel Martynov <mr.xkurt@gmail.com>
>>> wrote:
>>>
>>>> Hi, I have run into a problem: I cannot achieve high insertion speed, so
>>>> I started to experiment with
>>>> https://github.com/cloudera/kudu-examples/tree/master/java/insert-loadgen.
>>>> My slightly modified code (table re-creation on startup + duration
>>>> measurement):
>>>> https://gist.github.com/xkrt/9405a2eeb98a56288b7c5a7d817097b4.
>>>> On every run I changed the kudu-client version. Results:
>>>>
>>>> kudu-client-ver  perf
>>>> 0.10             Duration: 626 ms, 79872 inserts/sec
>>>> 1.0.0            Duration: 622 ms, 80385 inserts/sec
>>>> 1.0.1            Duration: 630 ms, 79365 inserts/sec
>>>> 1.1.0            Duration: 11703 ms, 4272 inserts/sec
>>>> 1.3.1            Duration: 12317 ms, 4059 inserts/sec
>>>>
>>>> As you can see, there was a large degradation between 1.0.1 and 1.1.0
>>>> (about 20 times!).
>>>> What could be the problem, and how can I fix it? (I am actually interested
>>>> in kudu-spark, so using kudu-client 1.0.1 is probably not the right
>>>> solution?)
>>>>
>>>> My test cluster: 3 hosts with master and tserver on each (3 masters and
>>>> 3 tservers overall).
>>>> No extra settings, flags used:
>>>> fs_wal_dir
>>>> fs_data_dirs
>>>> master_addresses
>>>> tserver_master_addrs
>>>>
>>>>
>>>> --
>>>> with best regards, Pavel Martynov
>>>>
>>>
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
with best regards, Pavel Martynov
