kudu-user mailing list archives

From David Alves <davidral...@gmail.com>
Subject Re: tserver died during bulk indexing and dies again after restarting
Date Mon, 24 Apr 2017 17:39:39 GMT
Hi Jason

  No problem. Sorry if I misunderstood your previous email.
  If you could share the log files themselves, that would be great; if not,
that's ok too.
  You could use the kudu tool to delete the local replica of that tablet
(with the tserver daemon stopped), but it's likely that the server has been
gone for a while and has been kicked out of most, if not all, consensus
configs. At that point, if all your data is still available, you could just
delete that server's data and re-add the server to the cluster.
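
As a rough illustration of the suggestion above, here is a minimal Python
sketch that only assembles and prints a `kudu local_replica delete` command
line for the tablet, to be run while the tserver is stopped. The helper name
`build_delete_replica_cmd` and the directory paths are hypothetical, and the
tablet ID is the one named elsewhere in this thread. Some Kudu versions may
need additional flags for this subcommand, so check the tool's usage output
before running anything.

```python
import shlex


def build_delete_replica_cmd(tablet_id, fs_wal_dir, fs_data_dirs, extra_flags=()):
    """Build (but do not run) a `kudu local_replica delete` command line.

    fs_wal_dir and fs_data_dirs must match the values the tserver was
    started with. extra_flags is a hypothetical escape hatch for any
    version-specific flags the tool asks for.
    """
    if not tablet_id:
        raise ValueError("tablet_id is required")
    if not fs_data_dirs:
        raise ValueError("at least one data directory is required")
    argv = [
        "kudu", "local_replica", "delete", tablet_id,
        "--fs_wal_dir=" + fs_wal_dir,
        "--fs_data_dirs=" + ",".join(fs_data_dirs),
    ]
    argv.extend(extra_flags)
    return argv


if __name__ == "__main__":
    # Hypothetical paths; substitute the directories your tserver uses.
    cmd = build_delete_replica_cmd(
        "30aaccdf7c8c496a8ad73255856a1724",
        "/data/kudu/wal",
        ["/data/kudu/data"],
    )
    # Print the command for review instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing rather than executing keeps the sketch safe to try: the command
deletes on-disk state, so it is worth reviewing by hand first.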

Best
David


On Mon, Apr 24, 2017 at 4:33 AM, Jason Heo <jason.heo.sde@gmail.com> wrote:

> Hi David.
>
> Thank you for your kind reply.
>
> I understand, but I'm afraid I can't provide my WAL, even via your
> private email, because it contains sensitive data.
>
> Regards,
>
> Jason
>
> 2017-04-24 15:12 GMT+09:00 David Alves <davidralves@gmail.com>:
>
>> Hi Jason
>>
>>   I meant the last WAL segment for the 30aaccdf7c8c496a8ad73255856a1724
>> tablet on the dead server (if you don't have sensitive data in there).
>>   Not sure whether you specified the flag "--fs_wal_dir". If so, the WALs
>> should be in there; if not, they are in the directory set for
>> "--fs_data_dirs".
>>   A WAL file has a name like "wal-000000001".
>>
>> Best
>> David
>>
>>
>> On Sat, Apr 22, 2017 at 7:46 PM, Jason Heo <jason.heo.sde@gmail.com>
>> wrote:
>>
>>> Hi David.
>>>
>>> Sorry for the insufficient information.
>>>
>>> There are 14 nodes in my test Kudu cluster. Only one tserver has died.
>>> It has only the two logs above.
>>>
>>> The other 13 nodes have logged the "Error trying to read ahead of the
>>> log while preparing peer request: Incomplete: Op with" error 7 to 10 times.
>>>
>>> >> *Would it be possible to also get the WAL with the corrupted entry?*
>>>
>>> Would you please explain how to get it in more detail?
>>>
>>> I repeated the same steps several times to reproduce the error, but it
>>> didn't happen again.
>>>
>>> Please feel free to ask me for anything you need to resolve this.
>>>
>>> Regards,
>>>
>>> Jason
>>>
>>> 2017-04-23 1:56 GMT+09:00 <davidralves@gmail.com>:
>>>
>>>> Hi Jason
>>>>
>>>>   Anything else of interest in those logs? Can you share them (with
>>>> just me, if you prefer)? Would it be possible to also get the WAL with
>>>> the corrupted entry?
>>>>   Did this happen on a single server?
>>>>
>>>> Best
>>>> David
>>>>
>>>
>>>
>>
>
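
For reference, locating the last WAL segment described in the quoted message
above could be sketched in Python as follows. It assumes the segments live
under `<WAL root>/wals/<tablet id>/` and are named `wal-` followed by a
zero-padded sequence number. The `wals/<tablet id>` layout is an assumption
not stated in the thread, and `last_wal_segment` and the example path are
hypothetical.

```python
import re
from pathlib import Path

# Matches segment names such as "wal-000000001".
_WAL_NAME = re.compile(r"^wal-(\d+)$")


def last_wal_segment(wal_root, tablet_id):
    """Return the Path of the highest-numbered WAL segment, or None.

    wal_root is the value of --fs_wal_dir (or the directory the WALs
    share with the data when that flag is not set).
    ASSUMPTION: segments are stored in <wal_root>/wals/<tablet_id>/.
    """
    tablet_dir = Path(wal_root) / "wals" / tablet_id
    if not tablet_dir.is_dir():
        return None
    best = None  # (sequence number, path) of the newest segment seen so far
    for entry in tablet_dir.iterdir():
        match = _WAL_NAME.match(entry.name)
        if match and entry.is_file():
            seq = int(match.group(1))
            if best is None or seq > best[0]:
                best = (seq, entry)
    return best[1] if best else None


if __name__ == "__main__":
    # Hypothetical WAL root; the tablet ID is the one from this thread.
    print(last_wal_segment("/data/kudu/wal", "30aaccdf7c8c496a8ad73255856a1724"))
```

Comparing the numeric suffix rather than file modification times keeps the
result stable even if the files were copied or touched after the crash.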
