kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op
Date Wed, 18 May 2016 22:38:44 GMT
Hm, so each of the strings is about 27 bytes, and with ~1,100 such columns
each row is about 27KB. So a batch size of 500 is still >13MB. I'd start
with something very low like 10, and work your way up. That said, this is
definitely not in the "standard" use cases for which Kudu has been designed.
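The sizing above can be checked with quick arithmetic. This is a hedged back-of-the-envelope sketch: the 27-byte and 1101-column figures come from this thread, and the 7 MiB client buffer limit is the hardcoded `max_buffer_size_` discussed later in the thread.

```python
# Back-of-the-envelope sizing for the Kudu client write buffer,
# using the figures quoted in this thread.
BYTES_PER_COLUMN = 27            # ~27 bytes per string value
NUM_COLUMNS = 1101               # 1101 columns in the chr22 table
BUFFER_LIMIT = 7 * 1024 * 1024   # hardcoded 7 MiB client-side buffer

row_size = BYTES_PER_COLUMN * NUM_COLUMNS   # bytes per row
batch_500 = 500 * row_size                  # size of a 500-row batch
max_batch = BUFFER_LIMIT // row_size        # rows that actually fit

print(row_size)                    # 29727 -> roughly the ~27KB/row estimate
print(batch_500 > BUFFER_LIMIT)    # True: a 500-row batch blows the buffer
print(max_batch)                   # 246 rows at most for this row width
```

The theoretical maximum is ~246 rows here, but starting much lower (like 10) and working up, as suggested above, leaves headroom for per-op overhead not counted in this estimate.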

I'd also recommend using compression and/or dictionary encoding for a table
if you have many repeated values. Unfortunately, it's not currently possible
to specify this when creating a table using Impala.

-Todd

On Wed, May 18, 2016 at 10:51 AM, Abhi Basu <9000revs@gmail.com> wrote:

> Query: describe kudu_db.chr22_kudu
> +-------------+--------+---------+
> | name        | type   | comment |
> +-------------+--------+---------+
> | pos         | int    |         |
> | id          | string |         |
> | chrom       | string |         |
> | ref         | string |         |
> | alt         | string |         |
> | qual        | string |         |
> | filter      | string |         |
> | info        | string |         |
> | format_type | string |         |
> | hg00096     | string |         |
> | hg00097     | string |         |
> | hg00099     | string |         |
> | hg00100     | string |         |
> | hg00101     | string |         |
> | hg00102     | string |         |
> | hg00103     | string |         |
> | hg00104     | string |         |
>
> ..........
>
> all the way to column na20828 string.
>
> Each of the hg and na columns has values like:
> | hg00096                    |
> +----------------------------+
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
> | 0|0:0.000:0.00,-5.00,-5.00 |
>
>
>
> On Wed, May 18, 2016 at 10:47 AM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> What are the types of your 1000 columns? Maybe an even smaller batch size
>> is necessary.
>>
>> -Todd
>>
>> On Wed, May 18, 2016 at 10:41 AM, Abhi Basu <9000revs@gmail.com> wrote:
>>
>>> I have tried with batch_size=500 and still get the same error. For your
>>> reference, I have attached info that may help diagnose.
>>>
>>> Error: Error while applying Kudu session.: Incomplete: not enough space
>>> remaining in buffer for op (required 46.7K, 7.00M already used
>>>
>>>
>>> Config settings:
>>>
>>> Kudu Tablet Server Block Cache Capacity   1 GB
>>> Kudu Tablet Server Hard Memory Limit  16 GB
>>>
>>>
>>> On Wed, May 18, 2016 at 8:26 AM, William Berkeley <
>>> wdberkeley@cloudera.com> wrote:
>>>
>>>> Both options are more or less the same idea: the point is you need fewer
>>>> rows going in per batch so you don't go over the batch size limit. Follow
>>>> what Todd said, as he explained it more clearly and suggested a better way.
>>>>
>>>> -Will
>>>>
>>>> On Wed, May 18, 2016 at 10:45 AM, Abhi Basu <9000revs@gmail.com> wrote:
>>>>
>>>>> Thanks for the updates. I will give both options a try and report back.
>>>>>
>>>>> If you are interested in testing with such datasets, I can help.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Abhi
>>>>>
>>>>> On Wed, May 18, 2016 at 6:25 AM, Todd Lipcon <todd@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Abhi,
>>>>>>
>>>>>> Will is right that the error is client-side, and probably happening
>>>>>> because your rows are so wide. Impala typically will batch 1000 rows
>>>>>> at a time when inserting into Kudu, so if each of your rows is 7-8KB,
>>>>>> that will overflow the max buffer size that Will mentioned. This seems
>>>>>> quite probable if your data is 1000 columns of doubles or int64s
>>>>>> (which are 8 bytes each).
>>>>>>
>>>>>> I don't think his suggested workaround will help, but you can try
>>>>>> running 'set batch_size=500' before running the create table or insert
>>>>>> query.
>>>>>>
>>>>>> In terms of max supported columns, most of the workloads we are
>>>>>> focusing on are more like typical data-warehouse tables, on the order
>>>>>> of a couple hundred columns. Crossing into the 1000+ range enters
>>>>>> "uncharted territory" where it's much more likely you'll hit problems
>>>>>> like this and quite possibly others as well. Will be interested to
>>>>>> hear your experiences, though you should probably be prepared for some
>>>>>> rough edges.
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>> On Tue, May 17, 2016 at 8:32 PM, William Berkeley <
>>>>>> wdberkeley@cloudera.com> wrote:
>>>>>>
>>>>>>> Hi Abhi.
>>>>>>>
>>>>>>> I believe that error is actually coming from the client, not the
>>>>>>> server. See e.g.
>>>>>>> https://github.com/apache/incubator-kudu/blob/master/src/kudu/client/batcher.cc#L787
>>>>>>> (NB that link is to master branch not the exact release you are using).
>>>>>>>
>>>>>>> If you look around there, you'll see that the max is set by
>>>>>>> something called max_buffer_size_, which appears to be hardcoded to
>>>>>>> 7 * 1024 * 1024 bytes = 7MiB (and this is consistent with
>>>>>>> 6.96 + 0.0467 > 7).
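[Editor's note: the consistency check in the quoted message above can be reproduced directly; the figures are taken from the error message and the hardcoded limit in batcher.cc.]

```python
# Reproduce the buffer-overflow check from the error message:
# "not enough space remaining in buffer for op (required 46.7K, 6.96M already used"
MIB = 1024 * 1024
max_buffer_size = 7 * MIB     # hardcoded max_buffer_size_ (7 MiB)
already_used = 6.96 * MIB     # "6.96M already used"
required_op = 46.7 * 1024     # "required 46.7K"

# The next op would push the buffer past its limit, so the client rejects it.
print(already_used + required_op > max_buffer_size)  # True
```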
>>>>>>>
>>>>>>> I think the simple workaround would be to do the CTAS as a
>>>>>>> CTAS + insert as select. Pick a condition that bipartitions the
>>>>>>> table, so you don't get errors trying to double insert rows.
>>>>>>>
>>>>>>> -Will
>>>>>>>
>>>>>>> On Tue, May 17, 2016 at 4:45 PM, Abhi Basu <9000revs@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> What is the limit of columns in Kudu?
>>>>>>>>
>>>>>>>> I am using the 1000 Genomes dataset, specifically the chr22 table
>>>>>>>> which has 500,000 rows x 1101 columns. This table has been built in
>>>>>>>> Impala/HDFS. I am trying to create a new Kudu table as select from
>>>>>>>> that table. I get the following error:
>>>>>>>>
>>>>>>>> Error while applying Kudu session.: Incomplete: not enough space
>>>>>>>> remaining in buffer for op (required 46.7K, 6.96M already used
>>>>>>>>
>>>>>>>> When looking at http://pcsd-cdh2.local.com:8051/mem-trackers, I
>>>>>>>> see the following. What configuration needs to be tweaked?
>>>>>>>>
>>>>>>>>
>>>>>>>> Memory usage by subsystem
>>>>>>>>
>>>>>>>> Id                                        Parent                                    Limit   Current Consumption  Peak Consumption
>>>>>>>> root                                      none                                      50.12G  4.97M                6.08M
>>>>>>>> block_cache-sharded_lru_cache             root                                      none    937.9K               937.9K
>>>>>>>> code_cache-sharded_lru_cache              root                                      none    1B                   1B
>>>>>>>> server                                    root                                      none    2.3K                 201.4K
>>>>>>>> tablet-00000000000000000000000000000000   server                                    none    530B                 200.1K
>>>>>>>> MemRowSet-6                               tablet-00000000000000000000000000000000   none    265B                 265B
>>>>>>>> txn_tracker                               tablet-00000000000000000000000000000000   64.00M  0B                   28.5K
>>>>>>>> DeltaMemStores                            tablet-00000000000000000000000000000000   none    265B                 87.8K
>>>>>>>> log_block_manager                         server                                    none    1.8K                 2.7K
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> --
>>>>>>>> Abhi Basu
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Abhi Basu
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Abhi Basu
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Abhi Basu
>



-- 
Todd Lipcon
Software Engineer, Cloudera
