kafka-users mailing list archives

From Alex Gray <Alex.G...@inin.com>
Subject Re: Exception on Startup. Is it bad or benign.
Date Wed, 09 Apr 2014 13:02:45 GMT
Thanks Joel and Guozhang!
The data retention is 72 hours.
Graceful shutdown is done via SIGTERM, and 
controlled.shutdown.enable=true is in the config.
I do see 'Controlled shutdown succeeded' in the broker log when I shut 
it down.
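
For reference, the shutdown-related lines in our server.properties look 
roughly like this (the retry settings are the defaults as I understand 
them, shown for completeness; we only set the first one explicitly):

```
# server.properties -- controlled shutdown (0.8.x)
controlled.shutdown.enable=true
# assumed defaults, not tuned by us:
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
```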

With both your responses, I feel the brokers are indeed set up and 
functioning correctly.

I want to ask the developers if I can write a script that gracefully 
restarts each broker at random times throughout the entire day, 24/7 :)

That should weed out any issues.
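
A minimal sketch of what that script could look like (the host names, 
the ssh/service commands, and the interval are placeholders, not our 
real setup):

```shell
#!/usr/bin/env bash
# Random rolling-restart loop (sketch). Assumes each broker host has an
# init script whose "restart" sends SIGTERM, which triggers controlled
# shutdown because controlled.shutdown.enable=true is set.
BROKERS=(kafka1 kafka2 kafka3)

pick_random_broker() {
  # Choose one broker uniformly at random from the list above.
  echo "${BROKERS[$(( RANDOM % ${#BROKERS[@]} ))]}"
}

chaos_restart_loop() {
  while true; do
    sleep $(( RANDOM % 3600 ))   # wait up to an hour between restarts
    target=$(pick_random_broker)
    # Placeholder restart command -- adapt to the real init system.
    ssh "$target" 'sudo service kafka restart'
  done
}
```

Only one broker is ever restarted at a time, so with 
default.replication.factor=3 the other two replicas keep every 
partition available while the third comes back.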

Thanks guys,

Alex


On Tue Apr  8 20:38:15 2014, Joel Koshy wrote:
> Also, when you say "graceful shutdown" you mean you issue SIGTERM? Do
> you have controlled.shutdown.enable=true in the broker config. If that
> is set and the controlled shutdown succeeds (i.e., if you see
> 'Controlled shutdown succeeded' in the broker log) then you shouldn't
> be seeing the data loss warning in your controller log during the
> shutdown and restarts. Or are you seeing it at other times as well?
>
> WRT the OffsetOutOfRangeException: is your broker down for a long
> period? Do you have a very low retention setting for your topics? Or
> are you bringing up a consumer that has been down for a long period?
>
> Thanks,
>
> Joel
>
> On Tue, Apr 08, 2014 at 04:58:08PM -0700, Guozhang Wang wrote:
>> Hi Alex,
>>
>> 1. There is no "cool-off" time, since the rebalance should be done before
>> the server completes shutdown.
>>
>> 2. The logs are indicating there is possible data loss, which is "expected"
>> if your producer's request.required.acks config is <= 1, i.e. not -1. If
>> you do not want data loss, you can change that config value in your
>> producer clients to -1 (or > 1), which will effectively trade some latency
>> and availability for consistency.
>>
>> Guozhang
>>
>>
>> On Tue, Apr 8, 2014 at 9:51 AM, Alex Gray <Alex.Gray@inin.com> wrote:
>>
>>> We have 3 Zookeepers and 3 Kafka Brokers, version 0.8.0.
>>>
>>> I gracefully shutdown one of the kafka brokers.
>>>
>>> Question 1:  Should I wait some time before starting the broker back up,
>>> or can I restart it as soon as possible?  In other words, do I have to wait
>>> for the other brokers to "re-balance (or whatever they do)" before starting
>>> it back up?
>>>
>>> Question 2: Every once in a while, I get the following exception when the
>>> kafka broker is starting up.  Is this bad?  Searching around the
>>> newsgroups, I could not get a definitive answer. Example:
>>> http://grokbase.com/t/kafka/users/13cq54bx5q/understanding-offsetoutofrangeexceptions
>>> http://grokbase.com/t/kafka/users/1413hp296y/trouble-recovering-after-a-crashed-broker
>>>
>>> Here is the exception:
>>> [2014-04-08 00:02:40,555] ERROR [KafkaApi-3] Error when processing fetch
>>> request for partition [KeyPairGenerated,0] offset 514 from consumer with
>>> correlation id 85 (kafka.server.KafkaApis)
>>> kafka.common.OffsetOutOfRangeException: Request for offset 514 but we
>>> only have log segments in the range 0 to 0.
>>>      at kafka.log.Log.read(Log.scala:429)
>>>      at kafka.server.KafkaApis.kafka$server$KafkaApis$$
>>> readMessageSet(KafkaApis.scala:388)
>>>      at kafka.server.KafkaApis$$anonfun$kafka$server$
>>> KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:334)
>>>      at kafka.server.KafkaApis$$anonfun$kafka$server$
>>> KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:330)
>>>      at scala.collection.TraversableLike$$anonfun$map$
>>> 1.apply(TraversableLike.scala:206)
>>>      at scala.collection.TraversableLike$$anonfun$map$
>>> 1.apply(TraversableLike.scala:206)
>>>      at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
>>>      at scala.collection.TraversableLike$class.map(
>>> TraversableLike.scala:206)
>>>      at scala.collection.immutable.Map$Map1.map(Map.scala:93)
>>>      at kafka.server.KafkaApis.kafka$server$KafkaApis$$
>>> readMessageSets(KafkaApis.scala:330)
>>>      at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:296)
>>>      at kafka.server.KafkaApis.handle(KafkaApis.scala:66)
>>>      at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
>>>      at java.lang.Thread.run(Thread.java:722)
>>>
>>> And in the controller.log, I see every once in a while something like:
>>>
>>> controller.log.2014-04-01-04:[2014-04-01 04:42:41,713] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [KeyPairGenerated,0]. Elect leader 3 from live brokers 3. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
>>>
>>> (Which I did via: grep "data loss" *)
>>>
>>> I'm not a programmer: I am the admin for these machines, and I just want
>>> to make sure everything is cool.
>>> Oh, the server.properties has:
>>> default.replication.factor=3
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>>
>>
>>
>> --
>> -- Guozhang
>

--
*Alex Gray* | DevOps Engineer, PureCloud
Phone +1.317.493.4291 | mobile +1.857.636.2810
*Interactive Intelligence*
Deliberately Innovative
www.inin.com

