kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Kafka 0.8.1.1 replication issues
Date Tue, 04 Nov 2014 19:35:35 GMT
This seems to be related to https://issues.apache.org/jira/browse/KAFKA-1749.

Guozhang

On Tue, Nov 4, 2014 at 10:30 AM, Christofer Hedbrandh <christofer@knewton.com> wrote:

> Hi Kafka users!
>
> I was migrating a cluster of 3 brokers from one set of EC2 instances to
> another, but ran into replication problems. The migration method was to
> stop one broker and let a new broker join with the same broker.id.
> Replication started, but after ~4 of ~15 GB it stopped, with the
> following errors logged every ~500 ms.
>
>
> On the new broker (the fetcher):
>
> [2014-11-04 17:02:33,762] ERROR [ReplicaFetcherThread-0-1926078608], Error
> in fetch Name: FetchRequest; Version: 0; CorrelationId: 1523; ClientId:
> ReplicaFetcherThread-0-1926078608; ReplicaId: 544181083; MaxWait: 500 ms;
> MinBytes: 1 bytes; RequestInfo: [qa.mx-error,302] ->
> PartitionFetchInfo(0,10485760),[qa.xl-msg,46] ->
> PartitionFetchInfo(101768,10485760),[qa.xl-error,202] ->
> PartitionFetchInfo(0,10485760),[qa.mx-msg,177] ->
> ... total of 700+ partitions
> -> PartitionFetchInfo(0,10485760) (kafka.server.ReplicaFetcherThread)
> java.io.EOFException: Received -1 when reading from channel, socket has
> likely been closed.
>         at kafka.utils.Utils$.read(Utils.scala:376)
>         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>         at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
>         at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> [2014-11-04 17:02:33,765] WARN Reconnect due to socket error: null
> (kafka.consumer.SimpleConsumer)
>
>
> On one of the two old nodes (presumably the broker providing the data):
>
> [2014-11-04 17:03:28,030] ERROR Closing socket for /10.145.135.246 because
> of error (kafka.network.Processor)
> kafka.common.KafkaException: This operation cannot be completed on a
> complete request.
> at kafka.network.Transmission$class.expectIncomplete(Transmission.scala:34)
> at kafka.api.FetchResponseSend.expectIncomplete(FetchResponse.scala:191)
> at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:214)
> at kafka.network.Processor.write(SocketServer.scala:375)
> at kafka.network.Processor.run(SocketServer.scala:247)
> at java.lang.Thread.run(Thread.java:745)
>
>
> It looks similar to this previous post, but the thread doesn't seem to have
> a resolution to the problem.
> http://thread.gmane.org/gmane.comp.apache.kafka.user/1153
>
> There is also this one, but again no resolution.
> http://thread.gmane.org/gmane.comp.apache.kafka.user/3804
>
>
> Does anyone have any clues as to what might be going on here? And any
> suggestions for solutions?
>
> Thanks,
> Christofer
>
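
For anyone hitting the same error, it may help to see how much data a single
fetch request is asking for. The sketch below is only an illustration, not
part of Kafka: it parses the "[topic,partition] -> PartitionFetchInfo(a,b)"
entries from a pasted log line like the one quoted above, counts them, and
sums the second field, which I am assuming is the per-partition fetch size in
bytes. The function name summarize_fetch_request is made up for this example.
The thread does not establish whether that total has anything to do with the
error.

```python
import re

# One entry looks like: [topic,partition] -> PartitionFetchInfo(offset,fetchSize)
# (field meanings are an assumption based on the quoted log line).
ENTRY = re.compile(
    r"\[([^,\]]+),(\d+)\]\s*->\s*PartitionFetchInfo\((\d+),(\d+)\)"
)


def summarize_fetch_request(log_text):
    """Count partitions and sum requested bytes in a pasted FetchRequest log."""
    # Undo mail wrapping: drop leading "> " quote markers and join the lines.
    unwrapped = " ".join(
        line.lstrip("> ").strip() for line in log_text.splitlines()
    )
    entries = [
        (topic, int(partition), int(offset), int(size))
        for topic, partition, offset, size in ENTRY.findall(unwrapped)
    ]
    return {
        "partitions": len(entries),
        "total_fetch_bytes": sum(size for _, _, _, size in entries),
        "entries": entries,
    }


if __name__ == "__main__":
    # The three complete entries from the quoted log; the rest was elided.
    sample = """\
> MinBytes: 1 bytes; RequestInfo: [qa.mx-error,302] ->
> PartitionFetchInfo(0,10485760),[qa.xl-msg,46] ->
> PartitionFetchInfo(101768,10485760),[qa.xl-error,202] ->
> PartitionFetchInfo(0,10485760),[qa.mx-msg,177] ->
"""
    summary = summarize_fetch_request(sample)
    print(summary["partitions"], summary["total_fetch_bytes"])
```

Run against the full, unelided log line, this gives an upper bound on the
size of one fetch response. For scale, 700 partitions at 10485760 bytes each
would add up to 7340032000 bytes.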



-- 
-- Guozhang
