samza-dev mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: kafka.common.ReplicaNotAvailableException on application logs
Date Thu, 14 May 2015 17:38:03 GMT
Hi Shekar,

It seems the root of the problem is not the incoming / outgoing topics but
the checkpoint topic "__samza_checkpoint_ver_1_for_Argos". According to the
error logs, this topic has only one replica, 1018019532, which was down and
hence not available.
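
One way to confirm this is to describe the checkpoint topic the same way the
incoming and outgoing topics were described, and look at its replica list.
The following is only a minimal sketch: `under_replicated` is a hypothetical
helper name (not part of Kafka), and it assumes the `--describe` output keeps
the "Replicas: <comma-separated broker ids>" layout shown later in this
thread.

```shell
# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin and
# prints every partition line whose replica list has fewer than $1 brokers.
under_replicated() {
  awk -v min="$1" '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "Replicas:") {
          # The field after "Replicas:" is the comma-separated broker list.
          n = split($(i + 1), brokers, ",")
          if (n < min) print
        }
      }
    }'
}

# Possible usage against a live cluster (paths and ZooKeeper address are the
# ones quoted in this thread):
# /opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --describe \
#   --topic __samza_checkpoint_ver_1_for_Argos | under_replicated 2
```

Any line it prints is a partition that would become unavailable if its
single replica's broker goes down.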

Guozhang

On Thu, May 14, 2015 at 5:16 AM, Shekar Tippur <ctippur@gmail.com> wrote:

> Here is what I see on Kafka log:
>
> [2015-05-14 04:11:27,752] ERROR Closing socket for /10.180.195.32 because of error (kafka.network.Processor)
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         at kafka.utils.Utils$.read(Utils.scala:375)
>         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Processor.read(SocketServer.scala:347)
>         at kafka.network.Processor.run(SocketServer.scala:245)
>         at java.lang.Thread.run(Thread.java:745)
>
> [2015-05-14 04:11:27,753] INFO Closing socket connection to /10.180.195.32. (kafka.network.Processor)
> [2015-05-14 04:16:06,537] INFO Closing socket connection to /10.180.195.32. (kafka.network.Processor)
> [2015-05-14 04:16:06,604] INFO Closing socket connection to /10.180.195.32. (kafka.network.Processor)
> [2015-05-14 04:16:32,370] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> [2015-05-14 04:16:32,452] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> [2015-05-14 04:16:32,810] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> [2015-05-14 04:16:32,931] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> [2015-05-14 04:36:40,586] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> [2015-05-14 04:39:49,016] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> [2015-05-14 04:43:38,166] INFO Closing socket connection to /10.180.195.32. (kafka.network.Processor)
> [2015-05-14 04:43:38,392] INFO [ReplicaFetcherManager on broker 1018019533] Removed fetcher for partitions [argos-parser,0],[argos-raw,0] (kafka.server.ReplicaFetcherManager)
> [2015-05-14 04:43:40,746] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> [2015-05-14 04:43:40,855] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> [2015-05-14 04:43:40,957] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
>
> On Thu, May 14, 2015 at 4:55 AM, Shekar Tippur <ctippur@gmail.com> wrote:
>
> > Here is the complete log:
> >
> > http://pastebin.com/nX7twETm
> >
> > Interesting, I see a leader not available exception instead of the
> > earlier one.
> >
> > ./container_1431601903660_0001_01_000002/samza-container-0.log:2015-05-14 04:53:41 BrokerPartitionInfo [WARN] Error while fetching metadata partition 0 leader: none replicas: 1018019532 (sprdargas402.corp.intuit.net:6667) isr: isUnderReplicated: true for topic partition [__samza_checkpoint_ver_1_for_Argos_1,0]: [class kafka.common.LeaderNotAvailableException]
> >
> > - Shekar
> >
> > On Wed, May 13, 2015 at 7:52 PM, Naveen S <navgog8@gmail.com> wrote:
> >
> >> Hey Shekar,
> >> Can you paste the entire stacktrace/log? Were there any other errors?
> >> On Wed, May 13, 2015 at 6:04 PM Shekar Tippur <ctippur@gmail.com> wrote:
> >>
> >> > Hello,
> >> >
> >> > I seem to have come across an issue with replication. We have 2 nodes
> >> > where Kafka and YARN run.
> >> >
> >> > We have enabled replication on Kafka (replication factor = 2). To test
> >> > redundancy, we shut down the broker01 server.
> >> > In the YARN application logs, we see the exception
> >> > kafka.common.ReplicaNotAvailableException.
> >> >
> >> > Incoming topic:
> >> >
> >> > /opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --topic raw --describe
> >> >
> >> > Topic:raw PartitionCount:1 ReplicationFactor:2 Configs:
> >> >
> >> > Topic: argos-raw Partition: 0 Leader: 1018019533 Replicas: 1018019533,1018019532 Isr: 1018019533,1018019532
> >> >
> >> > Outgoing topic:
> >> >
> >> > /opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --topic parser --describe
> >> >
> >> > Topic:parser PartitionCount:1 ReplicationFactor:2 Configs:
> >> >
> >> > Topic: argos-parser Partition: 0 Leader: 1018019533 Replicas: 1018019533,1018019532 Isr: 1018019533,1018019532
> >> >
> >> > Any idea on why this could be happening?
> >> >
> >> > - Shekar
> >> >
> >>
> >
> >
>



-- 
-- Guozhang
