kudu-user mailing list archives

From Benjamin Kim <bbuil...@gmail.com>
Subject Re: abnormal high disk I/O rate when upsert into kudu table?
Date Wed, 17 Aug 2016 01:09:24 GMT
This could be a problem… If this is a bad byproduct brought over from HBase, then this is
a common issue for all HBase users. It would be too bad if this also exists in Kudu. We HBase
users have been trying to eradicate this for a long time.

It’s only an opinion…

Cheers,
Ben


> On Aug 16, 2016, at 6:05 PM, jacky.he@gmail.com wrote:
> 
> Thanks Todd.
> 
> The Kudu cluster runs on CentOS 7.2, and each tablet node has 40 cores. The test table is about 140GB with 3 replicas and is partitioned by hash bucket; I have tried 24 and 120 hash buckets.
> 
> I ran one test:
> 1. Stop all ingestion to the cluster.
> 2. Randomly upsert 3000 rows once. Each upsert either inserts a new row or updates an existing row (the whole row, not just one or more columns).
> 3. On the CDH monitoring dashboard, the cluster's disk I/O rises from ~300Mb/s to ~1.5Gb/s and returns to ~300Mb/s 30 minutes or more later.
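
As a rough back-of-the-envelope check on the numbers in this test (a sketch, not part of the original message), the snippet below estimates how much extra data the disks moved during the elevated period and compares it with the ~140GB table. The report writes "Mb/s" and "Gb/s" without saying whether bits or bytes are meant, so both readings are computed. The function name is hypothetical, and treating the elevated period as exactly 30 minutes at a constant rate is an assumption.

```python
def extra_io_gigabytes(baseline_m_per_s, peak_m_per_s, minutes, unit_is_bits):
    """Extra volume moved above the baseline rate, in decimal gigabytes.

    Rates are given in mega-units per second (megabits or megabytes,
    depending on unit_is_bits). Assumes a constant peak rate.
    """
    extra_rate = peak_m_per_s - baseline_m_per_s   # mega-units per second
    total_giga = extra_rate * minutes * 60 / 1000  # giga-units in total
    return total_giga / 8 if unit_is_bits else total_giga


TABLE_GB = 140  # table size reported in the message

for unit_is_bits in (True, False):
    extra = extra_io_gigabytes(300, 1500, 30, unit_is_bits)
    label = "bits" if unit_is_bits else "bytes"
    print(f"reading rates as {label}: ~{extra:.0f} GB extra, "
          f"{extra / TABLE_GB:.1f}x the table size")
```

Under either reading, the extra volume is larger than the table itself, which is what motivates the first question below.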
> 
> I checked the INFO logs on some of the tablet nodes; they are constantly running compactions, each compacting anywhere from one to hundreds of thousands of rows.
> 
> My questions:
> 1. Is the maintenance manager rewriting the whole table? Will a single upsert of 3000 rows trigger a rewrite of the whole table?
> 2. Does the background I/O affect scan performance?
> 3. I partitioned the table into 24 or 120 hash buckets. How does the bucket count affect upsert and scan performance, and what is the best practice?
> 4. What is the recommended setting for the tablet server memory hard limit?
> 
> Thanks.
> 
> jacky.he@gmail.com
> 
> From: Todd Lipcon <todd@cloudera.com>
> Date: 2016-08-17 01:58
> To: user <user@kudu.apache.org>
> Subject: Re: abnormal high disk I/O rate when upsert into kudu table?
> Hi Jacky,
> 
> Answers inline below
> 
> On Tue, Aug 16, 2016 at 8:13 AM, jacky.he@gmail.com wrote:
> Dear Kudu Developers, 
> 
> I am a new tester of Kudu. Our Kudu cluster has 3+12 nodes: 3 separate master nodes and 12 tablet nodes.
> Each node has 128GB of memory, 1 SSD for the WAL, and 6 1TB SAS disks for data.
> 
> We are using CDH 5.7.0 with the impala-kudu 2.7.0 and kudu 0.9.1 parcels, and we set a 16GB memory hard limit for each tablet node.
> 
> Sounds like a good cluster setup. Thanks for providing the details. 
> 
>  
> One of our test tables has about 80-100 columns and 1 key column. With the Java client, we can insert/upsert into the Kudu table at about 100,000/s.
> The Kudu table has 300m rows, and about 300,000 rows are updated per day. We also use the Java client upsert API to update the rows.
> 
> We found that the Kudu cluster may encounter an abnormally high disk I/O rate, about 1.5-2.0Gb/s, even when we update just 1,000~10,000 rows/s.
> I would like to know: with our row update frequency, is the cluster's high disk I/O rate normal or not?
> 
> Are your upserts randomly spread across the range of rows in the table? If so, then when the updates flush, they'll trigger compactions of the updates and inserted rows into the existing data. Over time, this will cause a rewrite of the whole table in order to incorporate the updates.
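
To illustrate why a small batch of randomly spread upserts can end up touching much of a table, here is a minimal sketch under a deliberately simplified model that is not Kudu's actual compaction policy: the table is assumed to be stored as equally sized on-disk chunks, and each upserted key is assumed to land in a uniformly random chunk. The function name and the chunk counts are hypothetical.

```python
def expected_fraction_touched(num_chunks, num_upserts):
    """Expected fraction of chunks hit by at least one upsert.

    Each upsert picks a chunk uniformly at random, so a given chunk is
    missed by all upserts with probability (1 - 1/num_chunks) ** num_upserts.
    """
    return 1.0 - (1.0 - 1.0 / num_chunks) ** num_upserts


# 3000 upserts, as in the test described in this thread, against a few
# hypothetical chunk counts.
for chunks in (100, 1_000, 10_000, 100_000):
    frac = expected_fraction_touched(chunks, 3000)
    print(f"{chunks:>7} chunks: {frac:.1%} expected to receive an update")
```

In this model, when the number of chunks is comparable to or smaller than the number of upserts, almost every chunk receives at least one update and becomes a candidate for being rewritten.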
> 
> This background I/O is run by the "maintenance manager". You can visit http://tablet-server:8050/maintenance-manager to see a dashboard of currently running maintenance operations such as compactions.
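
For checking several tablet servers without opening a browser, the dashboard page can also be fetched from a script. The sketch below uses only the Python standard library and the URL pattern given above; the helper names are hypothetical, `tablet-server` is a placeholder hostname, and the page is returned as raw text with no assumptions about its layout.

```python
from urllib.request import urlopen


def maintenance_manager_url(host, port=8050):
    """Build the dashboard URL for one tablet server."""
    return f"http://{host}:{port}/maintenance-manager"


def fetch_dashboard(host, port=8050, timeout=5):
    """Return the dashboard page body as text (raises on network errors)."""
    with urlopen(maintenance_manager_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_dashboard("tablet-server")` against a reachable tablet server returns the page body, which can then be saved or searched for running compactions.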
> 
> The maintenance manager runs a preset number of threads, so the amount of background
I/O you're experiencing won't increase if you increase the number of upserts.
> 
> I'm curious, is the background I/O causing an issue, or just unexpected?
> 
> Thanks
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera

