kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Data inconsistency after restart
Date Thu, 04 Jan 2018 23:35:32 GMT
Hey Petter,

Did you ever get to the bottom of this? We definitely don't expect Kudu to
lose data on a restart (and we have hundreds of tests running continuously
which try to ensure this).

-Todd

On Fri, Dec 8, 2017 at 10:13 PM, David Alves <davidralves@gmail.com> wrote:

> Hi Petter
>
>  Don't have answers yet, but I do have some more questions (inline).
>
> Petter von Dolwitz (Hem) writes:
>
>> Hi David,
>>
>> To summarize briefly:
>>
>> 1. I ingest data. Kudu's maintenance threads stop working (soft memory
>> limit) and incoming data is throttled. There are no errors reported on the
>> client side.
>>
>  What is the "client side"? Impala? Spark? The Java/C++ client?
>
>> 2. I stop ingestion and wait until I *think* Kudu is finished.
>>
>  The question above is pertinent. Impala will not return successfully
>  until a query has fully succeeded, though it may return an error and
>  leave a query only half-way executed. If you're using the client APIs
>  directly, though, are you checking for errors when inserting?
>
>> 3. I restart Kudu.
>> 4. I validate the inserted data by doing count(*) on groups of data in
>> Kudu. For several groups, Kudu reports a lot of rows missing.
>>
>  Kudu's default scan mode is READ_LATEST. While this is the most
>  performance-oriented mode, it's also the one with the fewest guarantees,
>  so on startup it's possible that it reads from a stale replica, giving
>  the _appearance_ that rows went missing. Things to try here:
>  - Try the same query a few minutes later. Is the answer different?
>  - If the above is true, consider changing your scan mode to
>  READ_AT_SNAPSHOT. In this mode data is guaranteed not to be stale,
>  though you might have to wait for all replicas to be ready.
>
>> 5. I ingest the same data again. The client reports that all rows are
>> already present.
>>
>  This isn't surprising _if_ the problem is indeed from stale replicas.
>
>> 6. Doing the count(*) exercise again now gives me the correct number of
>> rows.
>>
>> This tells me that the data was ingested into Kudu on the first attempt
>> but a scan did not find the data. Inserting the data again made it
>> visible.
>>
>  Could it simply be that, by the time of the second scan, enough time had
>  elapsed for all replicas to catch up? I'd say this is likely the case.
>
>>
>> Br,
>> Petter
>>
>> 2017-12-07 21:39 GMT+01:00 David Alves <davidralves@gmail.com>:
>>
>>> Hi Petter
>>>
>>>    I'd like to clarify exactly what happened and exactly what you are
>>> referring to as "inconsistency".
>>>    From what I understand of the first error you observed, Kudu was
>>> underprovisioned, memory-wise, and the ingest jobs/queries failed. Is
>>> that right? Since Kudu doesn't have atomic multi-row writes, it's currently
>>> expected in this case that you'll end up with partially written data.
>>>    If you tried the same job again, and it succeeded, then for certain
>>> types of operation (UPSERT, INSERT IGNORE) the remaining rows would be
>>> written and all the data would be there as expected.
>>>    I'd like to distinguish this lack of atomicity on multi-row
>>> transactions from "inconsistency", which is what you might observe if an
>>> operation didn't fail, but you couldn't see all the data. For this latter
>>> case there are options you can choose to avoid any inconsistency.
>>>
>>> Best
>>> David
>>>
>>>
>>>
>>> On Wed, Dec 6, 2017 at 4:26 AM, Petter von Dolwitz (Hem) <
>>> petter.von.dolwitz@gmail.com> wrote:
>>>
>>>> Thanks for your reply Andrew!
>>>>
>>>> >How did you verify that all the data was inserted and how did you find
>>>> >some data missing?
>>>> This was done using Impala. We counted the rows for groups representing
>>>> the chunks we inserted.
>>>>
>>>> >Following up on what I posted, take a look at
>>>> >https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans.
>>>> >It seems definitely possible that not all of the rows had finished
>>>> >inserting when counting, or that the scans were sent to a stale replica.
>>>> Before we shut down we could only see the following in the logs, i.e.
>>>> no sign that ingestion was still ongoing.
>>>>
>>>> kudu-tserver.ip-xx-yyy-z-nnn.root.log.INFO.20171201-065232.90314:I1201
>>>> 07:27:35.010694 90793 maintenance_manager.cc:383] P
>>>> a38902afefca4a85a5469d149df9b4cb: we have exceeded our soft memory
>>>> limit
>>>> (current capacity is 67.52%).  However, there are no ops currently
>>>> runnable
>>>> which would free memory.
>>>>
>>>> Also the (Cloudera) metric
>>>> total_kudu_rows_inserted_rate_across_kudu_replicas showed zero.
>>>>
>>>> Still it seems like some data became inconsistent after restart. But if
>>>> the maintenance_manager performs important jobs that are required to
>>>> ensure that all data is inserted then I can understand why we ended up
>>>> with inconsistent data. But, if I understand you correctly, you are
>>>> saying that these jobs are not critical for ingestion. In the link you
>>>> provided I read "Impala scans are currently performed as READ_LATEST and
>>>> have no consistency guarantees." I would assume this means that it does
>>>> not guarantee consistency if new data is inserted but should give valid
>>>> (and the same) results if no new data is inserted?
>>>>
>>>> I have not tried the ksck tool yet. Thank you for the reminder. I will
>>>> have a look.
>>>>
>>>> Br,
>>>> Petter
>>>>
>>>>
>>>> 2017-12-06 1:31 GMT+01:00 Andrew Wong <awong@cloudera.com>:
>>>>
>>>>>> How did you verify that all the data was inserted and how did you find
>>>>>> some data missing? I'm wondering if it's possible that the initial
>>>>>> "missing" data was data that Kudu was still in the process of inserting
>>>>>> (albeit slowly, due to memory backpressure or somesuch).
>>>>>>
>>>>> Following up on what I posted, take a look at
>>>>> https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans.
>>>>> It seems definitely possible that not all of the rows had finished
>>>>> inserting when counting, or that the scans were sent to a stale replica.
>>>>>
>>>>> On Tue, Dec 5, 2017 at 4:18 PM, Andrew Wong <awong@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Petter,
>>>>>>
>>>>>>> When we verified that all data was inserted we found that some data
>>>>>>> was missing. We added this missing data and on some chunks we got the
>>>>>>> information that all rows were already present, i.e. Impala says
>>>>>>> something like Modified: 0 rows, nnnnnnn errors. Doing the verification
>>>>>>> again now shows that the Kudu table is complete. So, even though we did
>>>>>>> not insert any data on some chunks, a count(*) operation over these
>>>>>>> chunks now returns a different value.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> How did you verify that all the data was inserted and how did you find
>>>>>> some data missing? I'm wondering if it's possible that the initial
>>>>>> "missing" data was data that Kudu was still in the process of inserting
>>>>>> (albeit slowly, due to memory backpressure or somesuch).
>>>>>>
>>>>>>> Now to my question. Will data be inconsistent if we recycle Kudu after
>>>>>>> seeing soft memory limit warnings?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Your data should be consistently written, even with those warnings.
>>>>>> AFAIK they would cause a bit of slowness, not incorrect results.
>>>>>>
>>>>>>> Is there a way to tell when it is safe to restart Kudu to avoid these
>>>>>>> issues? Should we use any special procedure when restarting (e.g. only
>>>>>>> restart the tablet servers, only restart one tablet server at a time
>>>>>>> or something like that)?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> In general, you can use the `ksck` tool to check the health of your
>>>>>> cluster. See
>>>>>> https://kudu.apache.org/docs/command_line_tools_reference.html#cluster-ksck
>>>>>> for more details. For restarting a cluster, I would recommend taking
>>>>>> down all tablet servers at once, otherwise tablet replicas may try to
>>>>>> replicate data from the server that was taken down.
>>>>>>
>>>>>> Hope this helped,
>>>>>> Andrew
>>>>>>
>>>>>> On Tue, Dec 5, 2017 at 10:42 AM, Petter von Dolwitz (Hem) <
>>>>>> petter.von.dolwitz@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Kudu users,
>>>>>>>
>>>>>>> We just started to use Kudu (1.4.0+cdh5.12.1). To make a baseline for
>>>>>>> evaluation we ingested 3 months' worth of data. During ingestion we
>>>>>>> were seeing messages from the maintenance threads that a soft memory
>>>>>>> limit was reached. It seems like the background maintenance threads
>>>>>>> stopped performing their tasks at this point in time. It also seems
>>>>>>> like the memory was never recovered even after stopping ingestion, so I
>>>>>>> guess there was a large backlog being built up. I guess the root cause
>>>>>>> here is that we were a bit too conservative when giving Kudu memory.
>>>>>>> After a restart a lot of maintenance tasks were started (i.e.
>>>>>>> compaction).
>>>>>>>
>>>>>>> When we verified that all data was inserted we found that some data
>>>>>>> was missing. We added this missing data and on some chunks we got the
>>>>>>> information that all rows were already present, i.e. Impala says
>>>>>>> something like Modified: 0 rows, nnnnnnn errors. Doing the verification
>>>>>>> again now shows that the Kudu table is complete. So, even though we did
>>>>>>> not insert any data on some chunks, a count(*) operation over these
>>>>>>> chunks now returns a different value.
>>>>>>>
>>>>>>> Now to my question. Will data be inconsistent if we recycle Kudu
>>>>>>> after seeing soft memory limit warnings?
>>>>>>>
>>>>>>> Is there a way to tell when it is safe to restart Kudu to avoid these
>>>>>>> issues? Should we use any special procedure when restarting (e.g. only
>>>>>>> restart the tablet servers, only restart one tablet server at a time
>>>>>>> or something like that)?
>>>>>>>
>>>>>>> The table design uses 50 tablets per day (times 90 days). It is 8 TB
>>>>>>> of data after 3x replication over 5 tablet servers.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Petter
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Andrew Wong
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Andrew Wong
>>>>>
>>>>>
>>>>
>>>>
>>>
>
> --
> David Alves
>



-- 
Todd Lipcon
Software Engineer, Cloudera
