kafka-users mailing list archives

From saiprasad mishra <saiprasadmis...@gmail.com>
Subject Re: Kafka Streams fails permanently when used with an unstable network
Date Fri, 04 Nov 2016 07:18:04 GMT
Hi Eno
Thanks for the JIRA info
The change looks worth trying. Will let you know after I try it out.
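For context on the CommitFailedException quoted further down the thread, the two knobs the error message names can be set when building the Streams configuration. A minimal sketch using plain java.util.Properties; the application id and broker address are placeholders, and the values 300000 and 100 are illustrative, not recommendations:

```java
import java.util.Properties;

public class PollTuningSketch {
    static Properties streamsProps() {
        Properties p = new Properties();
        // Placeholder identifiers, not from this thread:
        p.put("application.id", "my-streams-app");
        p.put("bootstrap.servers", "broker:9092");
        // Allow more time between poll() calls before the group evicts the member:
        p.put("max.poll.interval.ms", "300000");
        // Smaller batches so each poll() returns sooner when processing is slow:
        p.put("max.poll.records", "100");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("max.poll.records"));
    }
}
```

These properties can then be passed to the StreamsConfig / KafkaStreams constructor as usual.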

Regards
Sai


On Wed, Nov 2, 2016 at 1:33 PM, Eno Thereska <eno.thereska@gmail.com> wrote:

> Hi Sai,
>
> For your second note on rebalancing taking a long time, we have just
> improved the situation in trunk after fixing this JIRA:
> https://issues.apache.org/jira/browse/KAFKA-3559 <
> https://issues.apache.org/jira/browse/KAFKA-3559>. Feel free to give it a
> go if rebalancing time continues to be a problem.
>
> Thanks
> Eno
>
> > On 31 Oct 2016, at 19:44, saiprasad mishra <saiprasadmishra@gmail.com>
> wrote:
> >
> > Hey Guys
> >
> > I have noticed similar issues when the network goes down while a Kafka
> > Streams app is starting, especially when the store has initialized but
> > task initialization is not yet complete. When the network comes back,
> > the rebalance fails with the above error and I have to restart, since I
> > run many partitions and have many tasks being initialized.
> >
> > Otherwise, if the Kafka Streams app starts successfully, it has always
> > recovered from network issues in my experience, and the stores remain
> > available.
> >
> > This suggests that some of these initialization exceptions could be
> > categorized as recoverable and always retried.
> >
> > I think task 0_0 in your case was not initialized properly in the first
> > place, a rebalance then happened because of the network connectivity,
> > and that resulted in the above exception.
> >
> > On a separate note, rebalancing takes a long time because I have some
> > intermediary topics, and I suspect it would be worse on a slow network.
> > I was wondering whether the store could be made available for querying
> > quickly, without waiting for the full initialization of tasks.
> >
> > Regards
> > Sai
> >
> > On Mon, Oct 31, 2016 at 3:51 AM, Damian Guy <damian.guy@gmail.com>
> wrote:
> >
> >> Hi Frank,
> >>
> >> This usually means that another StreamThread has the lock for the state
> >> directory. So it would seem that one of the StreamThreads hasn't shut
> down
> >> cleanly. If it happens again can you please take a Thread Dump so we can
> >> see what is happening?
> >>
> >> Thanks,
> >> Damian
> >>
> >> On Sun, 30 Oct 2016 at 10:52 Frank Lyaruu <flyaruu@gmail.com> wrote:
> >>
> >>> I have a remote Kafka cluster, to which I connect using a VPN and a
> >>> not-so-great WiFi network.
> >>> That means that the Kafka client sometimes briefly loses connectivity.
> >>> When it regains a connection after a while, I see:
> >>>
> >>> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> >>> completed since the group has already rebalanced and assigned the
> >>> partitions to another member. This means that the time between subsequent
> >>> calls to poll() was longer than the configured max.poll.interval.ms, which
> >>> typically implies that the poll loop is spending too much time message
> >>> processing. You can address this either by increasing the session timeout
> >>> or by reducing the maximum size of batches returned in poll() with
> >>> max.poll.records.
> >>>
> >>> ...
> >>>
> >>> Which makes sense I suppose, but this shouldn't be fatal.
> >>>
> >>> But then I see:
> >>> [StreamThread-1] ERROR
> >>> org.apache.kafka.streams.processor.internals.StreamThread -
> >>> stream-thread [StreamThread-1] Failed to create an active task %s:
> >>> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0]
> >>> Error while creating the state manager
> >>>   at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72)
> >>>   at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:89)
> >>>   at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633)
> >>>   at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660)
> >>>   at org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69)
> >>>   at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124)
> >>>   at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
> >>>   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
> >>>   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
> >>>   at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
> >>>   at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
> >>>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
> >>>   at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407)
> >>>   at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
> >>> Caused by: java.io.IOException: task [0_0] Failed to lock the state
> >>> directory:
> >>> /Users/frank/git/dexels.repository/com.dexels.kafka.streams/kafka-streams/develop3-person/0_0
> >>>   at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:101)
> >>>   at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
> >>>   ... 13 more
> >>>
> >>> And my stream application is dead.
> >>>
> >>> So I'm guessing that either the store wasn't closed properly, or
> >>> things are happening out of order.
> >>>
> >>> Any ideas?
> >>>
> >>> I'm using trunk (Kafka 0.10.2.0-SNAPSHOT), Java 1.8.0_66, on OS X
> >>> 10.11.6.
> >>>
> >>> regards, Frank
> >>>
> >>
>
>
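Damian's point about a StreamThread not shutting down cleanly is the key to the "Failed to lock the state directory" error above: if KafkaStreams.close() never runs, the lock on the task's state directory is still held when the next rebalance tries to recreate the task. Below is a minimal sketch of registering a shutdown hook so close() runs even on Ctrl-C or SIGTERM; the inner class is a stand-in for KafkaStreams, since only the close() semantics matter here:

```java
public class CleanShutdownSketch {
    // Stand-in for org.apache.kafka.streams.KafkaStreams: close() is what
    // releases the per-task state directory locks.
    static class FakeStreams {
        volatile boolean closed = false;
        void start() { /* would start the StreamThreads */ }
        void close() { closed = true; }
    }

    public static void main(String[] args) {
        FakeStreams streams = new FakeStreams();
        streams.start();
        // Ensure close() runs on JVM exit, so no stale lock survives a restart:
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        // The normal exit path should also call close() explicitly:
        streams.close();
        System.out.println(streams.closed);
    }
}
```

With the real client, the same pattern is `Runtime.getRuntime().addShutdownHook(new Thread(streams::close))` right after `streams.start()`.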
