kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jun Rao <...@confluent.io>
Subject Re: error handling with high-level consumer
Date Thu, 05 Feb 2015 05:33:38 GMT
1) Does the corruption happen to console consumer as well? If so, could you
run DumpLogSegment tool to see if the data is corrupted on disk?

Thanks,

Jun


On Wed, Feb 4, 2015 at 9:55 AM, Steven Wu <stevenz3wu@gmail.com> wrote:

> Hi,
>
> We have observed these two exceptions with consumer *iterator.next()*
> recently. want to ask how should we handle them properly.
>
> *1) CRC corruption*
> Message is corrupt (stored crc = 433657556, computed crc = 3265543163)
>
> I assume in this case we should just catch it and move on to the next msg?
> any other iterator/consumer exception we should catch and handle?
>
>
> *2) Unrecoverable consumer erorr with "Iterator is in failed state"*
>
> yesterday, one of our kafka consumers got stuck with very large maxalg and
> was throwing the following exception.
>
> 2015-02-04 08:35:19,841 ERROR KafkaConsumer-0 KafkaConsumer - Exception on
> consuming kafka with topic: <topic_name>
> java.lang.IllegalStateException: Iterator is in failed state
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:54)
> at kafka.utils.IteratorTemplate.next(IteratorTemplate.scala:38)
> at kafka.consumer.ConsumerIterator.next(ConsumerIterator.scala:46)
> at com.netflix.suro.input.kafka.KafkaConsumer$1.run(KafkaConsumer.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> we had a surge of traffic of <topic_name>, so I guess the traffic storm
> caused the problem. I tried to restart a few consumer instances but after
> rebalancing, another instance got assigned the problematic partitions and
> got stuck again with the above errors.
>
> We decided to drop messages, stop all consumer instances, reset all offset
> by deleting zk entries and restarted them, the problem went away.
>
> Producer version is kafka_2.8.2-0.8.1.1 with snappy-java-1.0.5
> Consumer version is kafka_2.9.2-0.8.2-beta with snappy-java-1.1.1.6
>
> We googled this issue but this was already fixed long time ago on 0.7.x.
> any idea? is mismatched snappy version the culpit? is it a bug in
> 0.8.2-beta?
>
>
> Thanks,
> Steven
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message