kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: Kafka Streams app error while rebalancing
Date Wed, 06 Dec 2017 18:45:50 GMT
Running Streams 1.0.0 should just work with 0.10.2 brokers. Of course,
you can't use EOS feature.

Not sure how your app got into this bad state. But if is does not
recover from it, delete the store directory seems to be reasonable --
also for stateful application, you won't loose data, as we recreate the
stores from the changelog topic. A bug, does not mean not production
ready IMHO. Many people run 0.10.2 in production. But newer versions
should fix bugs of course :)

Please let us know if 1.0.0 works for you.

-Matthias

On 12/6/17 3:18 AM, Srikanth wrote:
> Thanks Matthias. Please find my response inline.
> 
> On Wed, Dec 6, 2017 at 12:34 AM, Matthias J. Sax <matthias@confluent.io>
> wrote:
> 
>> Hard to say.
>>
>> However, deleting state directories will not have any negative impact as
>> you don't use stores. Thus, why do you not want to do this?
>> <Sri> We are running this in a prod-like setup. I don't have SSH access to
>> the box.
> 
> Its not ideal to request deleting these every now and then. It doesn't
>> sound prod ready.
>>
> 
> 
>> Another workaround you can do, it to start four applications with 1
>> thread each -- this would isolate the instances further and avoid the
>> lock issue (you would need to run on different host, or on the same
>> host, but with different state directory for the different instances.)
>> <Sri> App has a large in-mem cache. I'd like more threads to use this
>> cache to keep memory/cpu usage in balance.
> 
> 
> 
> 
>> In general, I would recommend to upgrade you applications -- you don't
>> need to upgrade the brokers for this. 1.0 versions has many additional
>> bug fixes. This should resolve the issues you are facing.
> 
> <Sri> I haven't really tried bidirectional compatibility mode. Is there any
>> caveat I need to be aware of for 1.0.x client and 0.10.2.1 brokers?
> 
> I probably should try this.
> 
> 
> 
>> Otherwise, it's hard to say without logs.
>> <Sri> Unfortunately, that's all I had in the logs.  I know it isn't
>> helpful. I've kept default logging to WARN except for few packages.
> 
> I assume any logging that indicates why app isn't making progress would be
>> in WARN. I'll bump the level up for few days.
> 
> 
> 
>> Hope this helps.
>>
>>
>> -Matthias
>>
>>
>> On 12/5/17 9:35 AM, Srikanth wrote:
>>> Hello,
>>>
>>> We noticed that a kafka streams app is stuck in rebalance state with
>> below
>>> error.
>>> Two instance of the app were running fine until a rebalace was
>>> triggered(possibly due to network issue).
>>> Both app instance are running(no app restart)
>>> App itself doesn't create/use state store. NUM_STREAM_THREADS_CONFIG=2
>>>
>>> I did see several tickets with similar errors that are marked as fixed.
>> I'm
>>> using version 0.10.2.1.
>>>
>>> 17/12/04 18:34:57 WARN StreamThread: Could not create task 0_2. Will
>> retry.
>>> org.apache.kafka.streams.errors.LockException: task [0_2] Failed to lock
>>> the state directory for task 0_2
>>>   at
>>> org.apache.kafka.streams.processor.internals.
>> ProcessorStateManager.<init>(ProcessorStateManager.java:100)
>>>   at
>>> org.apache.kafka.streams.processor.internals.AbstractTask.<init>(
>> AbstractTask.java:73)
>>>   at
>>> org.apache.kafka.streams.processor.internals.
>> StreamTask.<init>(StreamTask.java:108)
>>>   at
>>> org.apache.kafka.streams.processor.internals.
>> StreamThread.createStreamTask(StreamThread.java:864)
>>>   at
>>> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.
>> createTask(StreamThread.java:1237)
>>>   at
>>> org.apache.kafka.streams.processor.internals.StreamThread$
>> AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
>>>   at
>>> org.apache.kafka.streams.processor.internals.
>> StreamThread.addStreamTasks(StreamThread.java:967)
>>>   at
>>> org.apache.kafka.streams.processor.internals.StreamThread.access$600(
>> StreamThread.java:69)
>>>   at
>>> org.apache.kafka.streams.processor.internals.StreamThread$1.
>> onPartitionsAssigned(StreamThread.java:234)
>>>   at
>>> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
>> onJoinComplete(ConsumerCoordinator.java:259)
>>>   at
>>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
>> joinGroupIfNeeded(AbstractCoordinator.java:352)
>>>   at
>>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
>> ensureActiveGroup(AbstractCoordinator.java:303)
>>>   at
>>> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
>> ConsumerCoordinator.java:290)
>>>   at
>>> org.apache.kafka.clients.consumer.KafkaConsumer.
>> pollOnce(KafkaConsumer.java:1029)
>>>   at
>>> org.apache.kafka.clients.consumer.KafkaConsumer.poll(
>> KafkaConsumer.java:995)
>>>   at
>>> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
>> StreamThread.java:592)
>>>   at
>>> org.apache.kafka.streams.processor.internals.
>> StreamThread.run(StreamThread.java:361)
>>> 17/12/04 18:34:58 WARN StreamThread: Could not create task 0_8. Will
>> retry.
>>> org.apache.kafka.streams.errors.LockException: task [0_8] Failed to lock
>>> the state directory for task 0_8
>>>
>>>
>>> kafka-consumer-groups --new-consumer --bootstrap-server <...> --describe
>>> --group GeoTest
>>> Note: This will only show information about consumers that use the Java
>>> consumer API (non-ZooKeeper-based consumers).
>>>
>>> Warning: Consumer group 'GeoTest' is rebalancing.
>>>
>>> I keep seeing the above lock exception continuously and app is not making
>>> any progress. Any idea why it is stuck?
>>> I read a few suggestions that required me to manually delete state
>>> directory. I'd like to avoid that.
>>>
>>> Thanks,
>>> Srikanth
>>>
>>
>>
> 


Mime
View raw message