kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Kafka Streams stopped with errors, failed to reinitialize itself
Date Mon, 08 May 2017 18:46:05 GMT
Hi Sameer,

I looked at the logs, and there is only one suspicious entry:

```
2017-05-03 14:26:54 WARN  StreamThread:1184 - Could not create task 0_21. Will retry.
org.apache.kafka.streams.errors.LockException: task [0_21] Failed to lock the state directory: /data/streampoc/LIC2-4/0_21
```

It repeats three times and then does not show up again, but I cannot tell
for sure since it is near the end of the log file. This WARN is not
expected to be fatal: it should go away after some time and should not
block the application. So my questions are: 1) did you see this WARN
repeating forever, and 2) how long have you observed the app being stuck,
and while it is stuck, does the above entry never go away?
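To answer both questions from the attached logs, a quick scan of the log file could help. The following is a minimal sketch, assuming the WARN line format shown above (timestamp, level, `StreamThread:<line>`, message on one line in the real log file); the helper name `summarize_lock_warnings` and the script usage are hypothetical. It counts the "Could not create task ... Will retry." warnings per task and reports when each was first and last seen, so you can tell whether the WARN keeps repeating and over how long a span.

```python
import re
import sys
from datetime import datetime

# Pattern assumed from the single sample WARN line above, e.g.
# "2017-05-03 14:26:54 WARN  StreamThread:1184 - Could not create task 0_21. Will retry."
WARN_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+WARN\s+StreamThread:\d+\s+-\s+"
    r"Could not create task (\S+?)\.\s+Will retry\."
)


def summarize_lock_warnings(lines):
    """Hypothetical helper: map task id -> (count, first_seen, last_seen)."""
    summary = {}
    for line in lines:
        m = WARN_RE.match(line)
        if not m:
            continue  # not a "Could not create task" WARN line
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        task = m.group(2)
        count, first, last = summary.get(task, (0, ts, ts))
        summary[task] = (count + 1, min(first, ts), max(last, ts))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Illustrative usage: python scan_lock_warnings.py streams.log
    with open(sys.argv[1]) as f:
        for task, (count, first, last) in sorted(summarize_lock_warnings(f).items()):
            print(f"task {task}: {count} warnings, first {first}, "
                  f"last {last}, span {last - first}")
```

If a task's count keeps growing and its last-seen time tracks the end of the log, the WARN is repeating rather than clearing up.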


Guozhang


On Wed, May 3, 2017 at 10:50 PM, Sameer Kumar <sam.kum.work@gmail.com>
wrote:

> My brokers are on version 0.10.1.0 and my clients are on version 0.10.2.0.
> Also, please reply to all; I am currently not subscribed to the mailing list.
>
> -Sameer.
>
> On Wed, May 3, 2017 at 5:27 PM, Sameer Kumar <sam.kum.work@gmail.com>
> wrote:
>
> > Hi,
> >
> > I want to report an issue where adding a server to my Streams compute
> > cluster at runtime caused errors and, subsequently, a complete halt of
> > the cluster. I am not sure this is the actual cause, but it was the one
> > thing I did differently from an 18-hour smooth run of the Streams app.
> >
> > Initially, I had one machine working on my Kafka topic, which contains
> > impressions and clicks. The job ran overnight; in the morning I added
> > another machine to the cluster, and from then on the app got stuck every
> > time after working fine for some time.
> >
> > Please find the kafka_log_snippet and poc_log_snippet attached.
> >
> > After these nodes failed, I tried restarting just one machine in my
> > compute cluster to see whether it could reinitialize itself.
> >
> > Please find the logs for that attached as well. The following were the
> > log lines I saw most often:
> >
> >
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-38 at offset 556717 since the current
> > position is 557065
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-38] to broker 172.29.65.190:9092 (id: 0
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-48 at offset 607657 since the current
> > position is 607880
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-48] to broker 172.29.65.192:9092 (id: 2
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-31 at offset 282265 since the current
> > position is 282327
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-31] to broker 172.29.65.191:9092 (id: 1
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-3 at offset 499952 since the current
> > position is 500324
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-3] to broker 172.29.65.192:9092 (id: 2
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-21 at offset 587018 since the current
> > position is 587227
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-21] to broker 172.29.65.192:9092 (id: 2
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-49 at offset 276209 since the current
> > position is 276271
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-49] to broker 172.29.65.191:9092 (id: 1
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-16 at offset 592727 since the current
> > position is 592896
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-16] to broker 172.29.65.191:9092 (id: 1
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-37 at offset 458224 since the current
> > position is 458343
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-37] to broker 172.29.65.191:9092 (id: 1
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-59 at offset 495722 since the current
> > position is 496113
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-59] to broker 172.29.65.190:9092 (id: 0
> > rack: null)
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > LIC2-4-licountci-4-changelog-35 at offset 230310 since the current
> > position is 231236
> >
> > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > [LIC2-4-licountci-4-changelog-35] to broker 172.29.65.190:9092 (id: 0
> > rack: null)
> >
> >
> >
> > Regards,
> >
> > -Sameer.
> >
>



-- 
-- Guozhang
