kafka-users mailing list archives

From Eno Thereska <eno.there...@gmail.com>
Subject Re: Kafka Streams Failed to rebalance error
Date Wed, 03 May 2017 15:42:10 GMT
Hi,

Which version of Kafka are you using? This should be fixed in 0.10.2.1, any chance you could
try that release?

Thanks
Eno
> On 3 May 2017, at 14:04, Sameer Kumar <sam.kum.work@gmail.com> wrote:
> 
> Hi,
> 
>  
> I ran two nodes in my Streams compute cluster; they ran fine for a few hours before they started failing with rebalance errors.
> 
> 
> 
> I couldn't understand why this happened, but I saw one strange behaviour:
> 
> At 16:53 on node1, I saw a "Failed to lock the state directory" error; this might have caused the partitions to relocate and hence the rebalance failure.
> 
>  
> I am attaching detailed logs for both nodes; please see if you can help.
> 
>  
> Some of the relevant log entries, for quick reference:
> 
>  
> 2017-05-03 16:53:53 ERROR Kafka10Base:44 - Exception caught in thread StreamThread-2
> 
> org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-2] Failed to rebalance
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:612)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> 
> Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-2] failed to suspend stream tasks
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:488)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.access$1200(StreamThread.java:69)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsRevoked(StreamThread.java:259)
> 
>                 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:396)
> 
>                 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:329)
> 
>                 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
> 
>                 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
> 
>                 at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
> 
>                 at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
> 
>                 ... 1 more
> 
> Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 
>                 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:698)
> 
>                 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:577)
> 
>                 at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1125)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:296)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread$3.apply(StreamThread.java:535)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:503)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.commitOffsets(StreamThread.java:531)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:480)
> 
>                 ... 10 more
> 
>  
> 2017-05-03 16:53:57 WARN  StreamThread:1184 - Could not create task 1_38. Will retry.
> 
> org.apache.kafka.streams.errors.LockException: task [1_38] Failed to lock the state directory: /data/streampoc/LIC2-5/1_38
> 
>                 at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102)
> 
>                 at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
> 
>                 at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
> 
> 
> 
> Regards,
> 
> -Sameer.
> 
> <node2.zip><node1.zip>
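
For reference, the remedy suggested by the CommitFailedException message above (raising max.poll.interval.ms or lowering max.poll.records) can be applied through the Streams configuration. A minimal sketch, assuming a Streams version that supports StreamsConfig.consumerPrefix (0.10.1+); the application id, broker address, and numeric values below are placeholders for illustration, not tuned recommendations:

    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsPollSettings {

        public static Properties pollSettings() {
            Properties props = new Properties();

            // Placeholders -- replace with the real application id and broker list.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streampoc-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

            // Allow more time between poll() calls before the consumer is kicked
            // out of the group (illustrative value: 10 minutes).
            props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG),
                      600000);

            // Or return fewer records per poll() so each processing loop is shorter
            // (illustrative value).
            props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 100);

            return props;
        }
    }

Upgrading to 0.10.2.1, as suggested above, is still the first thing to try; these settings only work around the commit-timeout part of the problem on older versions.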

