spark-user mailing list archives

From Dmitry Goldenberg <dgoldenberg...@gmail.com>
Subject Re: Kafka error "partitions don't have a leader" / LeaderNotAvailableException
Date Tue, 29 Sep 2015 14:03:27 GMT
Thanks, Cody.

Yes, we did see that writeup from Jay; it seems to refer only to the 6
partitions used in his test.  I've been looking for more of a recipe for
what the possible max is vs. what the optimal value may be, but haven't
found one.
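
For reference, one rule of thumb that is often quoted for a lower bound on
the partition count (it comes from outside this thread and is not something
the Kafka docs prescribe, as far as we can tell) is to take the target
throughput and divide it by the measured per-partition producer and consumer
throughput, then use the larger result. The sketch below is only an
illustration: the class name, method name, and all throughput figures are
hypothetical, and it says nothing about the upper limit a cluster can handle.

```java
// Rule-of-thumb sketch only; the numbers passed in are hypothetical placeholders
// that would have to be measured on the actual cluster.
final class PartitionEstimate {

    /**
     * Returns the smallest partition count that reaches the target throughput
     * on both the producing and the consuming side.
     */
    static int minPartitions(double targetMBps,
                             double producerMBpsPerPartition,
                             double consumerMBpsPerPartition) {
        if (targetMBps <= 0 || producerMBpsPerPartition <= 0 || consumerMBpsPerPartition <= 0) {
            throw new IllegalArgumentException("all throughput values must be positive");
        }
        double neededForProducers = targetMBps / producerMBpsPerPartition;
        double neededForConsumers = targetMBps / consumerMBpsPerPartition;
        // The slower side dictates how many partitions are needed.
        return (int) Math.ceil(Math.max(neededForProducers, neededForConsumers));
    }

    public static void main(String[] args) {
        // Hypothetical figures: 100 MB/s target, 10 MB/s per partition when
        // producing, 20 MB/s per partition when consuming.
        System.out.println(minPartitions(100.0, 10.0, 20.0)); // prints 10
    }
}
```

This only gives a floor driven by throughput; it does not answer how high the
count can safely go.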

KAFKA-899 appears related but it was fixed in Kafka 0.8.2.0 - we're running
0.8.2.1.

I'm more curious about another error message from the logs which is this:

*fetching topic metadata for topics [Set(my-topic-1)] from broker
[ArrayBuffer(id:0,host:data2.acme.com,port:9092,
id:1,host:data3.acme.com,port:9092)] failed*

I know that data2 should have a broker ID of 1 and data3 should have a
broker ID of 2.  So there's a disconnect somewhere as to what these IDs are.
In Zookeeper, ls /brokers/ids lists: [1, 2].  So where could the [0, 1] be
stuck?
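
One possible explanation, stated here as an assumption and not confirmed
anywhere in this thread: the old producer may simply number the entries of
metadata.broker.list by their position when it bootstraps, in which case the
ids in that message would be list indices and not the broker IDs registered
in Zookeeper. The sketch below is hypothetical illustration code, not Kafka's
own; the class and method names are made up, and it only shows how positional
numbering of our broker list would yield exactly ids 0 and 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: number "host:port" entries by list position,
// the way a bootstrap broker list might be numbered before any real
// broker IDs are known.
final class BrokerListNumbering {

    static List<String> numberByPosition(String brokerList) {
        List<String> numbered = new ArrayList<>();
        String[] entries = brokerList.split(",");
        for (int i = 0; i < entries.length; i++) {
            String entry = entries[i].trim();
            int colon = entry.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected host:port but got: " + entry);
            }
            String host = entry.substring(0, colon);
            String port = entry.substring(colon + 1);
            // The id is just the index in the list, unrelated to /brokers/ids.
            numbered.add("id:" + i + ",host:" + host + ",port:" + port);
        }
        return numbered;
    }

    public static void main(String[] args) {
        for (String broker : numberByPosition("data2.acme.com:9092,data3.acme.com:9092")) {
            System.out.println(broker);
        }
    }
}
```

If that assumption holds, the [0, 1] would not be stuck anywhere and would not
by itself indicate a mismatch with the [1, 2] in Zookeeper.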



On Tue, Sep 29, 2015 at 9:39 AM, Cody Koeninger <cody@koeninger.org> wrote:

> Try writing and reading to the topics in question using the kafka command
> line tools, to eliminate your code as a variable.
>
>
> That number of partitions is probably more than sufficient:
>
>
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>
> Obviously if you ask for more replicas than you have brokers you're going
> to have a problem, but that doesn't seem to be the case.
>
>
>
> Also, depending on what version of kafka you're using on the broker, you
> may want to look through the kafka jira, e.g.
>
> https://issues.apache.org/jira/browse/KAFKA-899
>
>
> On Tue, Sep 29, 2015 at 8:05 AM, Dmitry Goldenberg <
> dgoldenberg123@gmail.com> wrote:
>
>> "more partitions and replicas than available brokers" -- what would be a
>> good ratio?
>>
>> We've been trying to set up 3 topics with 64 partitions.  I'm including
>> the output of "bin/kafka-topics.sh --zookeeper localhost:2181 --describe
>> topic1" below.
>>
>> I think it's symptomatic and confirms your theory, Adrian, that we've got
>> too many partitions. In fact, for topic 2, only 12 partitions appear to
>> have been created despite the requested 64.  Does Kafka have a limit of
>> 140 partitions total within a cluster?
>>
>> The doc doesn't appear to have any prescriptions as to how you go about
>> calculating an optimal number of partitions.
>>
>> We'll definitely try with fewer, I'm just looking for a good formula to
>> calculate how many. And no, Adrian, this hasn't worked yet, so we'll start
>> with something like 12 partitions.  It'd be good to know how high we can go
>> with that...
>>
>> Topic:topic1 PartitionCount:64 ReplicationFactor:1 Configs:
>>
>> Topic: topic1 Partition: 0 Leader: 1 Replicas: 1 Isr: 1
>>
>> Topic: topic2 Partition: 1 Leader: 2 Replicas: 2 Isr: 2
>>
>>
>> ................................................................................................
>>
>> Topic: topic3 Partition: 63 Leader: 2 Replicas: 2 Isr: 2
>>
>>
>> ---------------------------------------------------------------------------------------------------
>>
>> Topic:topic2 PartitionCount:12 ReplicationFactor:1 Configs:
>>
>> Topic: topic2 Partition: 0 Leader: 2 Replicas: 2 Isr: 2
>>
>> Topic: topic2 Partition: 1 Leader: 1 Replicas: 1 Isr: 1
>>
>>
>> ................................................................................................
>>
>> Topic: topic2 Partition: 11 Leader: 1 Replicas: 1 Isr: 1
>>
>>
>> ---------------------------------------------------------------------------------------------------
>>
>> Topic:topic3 PartitionCount:64 ReplicationFactor:1 Configs:
>>
>> Topic: topic3 Partition: 0 Leader: 2 Replicas: 2 Isr: 2
>>
>> Topic: topic3 Partition: 1 Leader: 1 Replicas: 1 Isr: 1
>>
>>
>> ................................................................................................
>>
>> Topic: topic3 Partition: 63 Leader: 1 Replicas: 1 Isr: 1
>>
>>
>> On Tue, Sep 29, 2015 at 8:47 AM, Adrian Tanase <atanase@adobe.com> wrote:
>>
>>> The error message is very explicit (partition is under replicated), I
>>> don’t think it’s related to networking issues.
>>>
>>> Try to run /home/kafka/bin/kafka-topics.sh --zookeeper localhost/kafka
>>> --describe topic_name and see which brokers are missing from the replica
>>> assignment.
>>> *(replace home, zk-quorum etc with your own set-up)*
>>>
>>> Lastly, has this ever worked? Maybe you’ve accidentally created the
>>> topic with more partitions and replicas than available brokers… try to
>>> recreate with fewer partitions/replicas, see if it works.
>>>
>>> -adrian
>>>
>>> From: Dmitry Goldenberg
>>> Date: Tuesday, September 29, 2015 at 3:37 PM
>>> To: Adrian Tanase
>>> Cc: "user@spark.apache.org"
>>> Subject: Re: Kafka error "partitions don't have a leader" /
>>> LeaderNotAvailableException
>>>
>>> Adrian,
>>>
>>> Thanks for your response. I just looked at both machines we're testing
>>> on and on both the Kafka server process looks OK. Anything specific I can
>>> check otherwise?
>>>
>>> From googling around, I see some posts where folks suggest to check the
>>> DNS settings (those appear fine) and to set the advertised.host.name in
>>> Kafka's server.properties. Yay/nay?
>>>
>>> Thanks again.
>>>
>>> On Tue, Sep 29, 2015 at 8:31 AM, Adrian Tanase <atanase@adobe.com>
>>> wrote:
>>>
>>>> I believe some of the brokers in your cluster died and there are a
>>>> number of partitions that nobody is currently managing.
>>>>
>>>> -adrian
>>>>
>>>> From: Dmitry Goldenberg
>>>> Date: Tuesday, September 29, 2015 at 3:26 PM
>>>> To: "user@spark.apache.org"
>>>> Subject: Kafka error "partitions don't have a leader" /
>>>> LeaderNotAvailableException
>>>>
>>>> I apologize for posting this Kafka related issue into the Spark list.
>>>> Have gotten no responses on the Kafka list and was hoping someone on this
>>>> list could shed some light on the below.
>>>>
>>>> ------------------------------------------------------------
>>>> ---------------------------
>>>>
>>>> We're running into this issue in a clustered environment where we're
>>>> trying to send messages to Kafka and are getting the below error.
>>>>
>>>> Can someone explain what might be causing it and what the error message
>>>> means (Failed to send data since partitions [<topic-name>,8] don't
>>>> have a leader)?
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------
>>>>
>>>> WARN kafka.producer.BrokerPartitionInfo: Error while fetching
>>>> metadata partition 10 leader: none replicas: isr: isUnderReplicated: false
>>>> for topic partition [<topic-name>,10]: [class
>>>> kafka.common.LeaderNotAvailableException]
>>>>
>>>> ERROR kafka.producer.async.DefaultEventHandler: Failed to send requests
>>>> for topics <topic-name> with correlation ids in [2398792,2398801]
>>>>
>>>> ERROR com.acme.core.messaging.kafka.KafkaMessageProducer: Error while
>>>> sending a message to the message
>>>> store. kafka.common.FailedToSendMessageException: Failed to send messages
>>>> after 3 tries.
>>>> at
>>>> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
>>>> ~[kafka_2.10-0.8.2.0.jar:?]
>>>> at kafka.producer.Producer.send(Producer.scala:77)
>>>> ~[kafka_2.10-0.8.2.0.jar:?]
>>>> at kafka.javaapi.producer.Producer.send(Producer.scala:33)
>>>> ~[kafka_2.10-0.8.2.0.jar:?]
>>>>
>>>> WARN kafka.producer.async.DefaultEventHandler: Failed to send data
>>>> since partitions [<topic-name>,8] don't have a leader
>>>>
>>>> What do these errors and warnings mean and how do we get around them?
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------
>>>>
>>>> The code for sending messages is basically as follows:
>>>>
>>>> public class KafkaMessageProducer {
>>>>   private Producer<String, String> producer;
>>>>
>>>>   .....................
>>>>
>>>>   public void sendMessage(String topic, String key, String message)
>>>>       throws IOException, MessagingException {
>>>>     KeyedMessage<String, String> data =
>>>>         new KeyedMessage<String, String>(topic, key, message);
>>>>     try {
>>>>       producer.send(data);
>>>>     } catch (Exception ex) {
>>>>       throw new MessagingException(
>>>>           "Error while sending a message to the message store.", ex);
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> Is it possible that the producer gets "stale" and needs to be
>>>> re-initialized?  Do we want to re-create the producer on every message (??)
>>>> or is it OK to hold on to one indefinitely?
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------
>>>>
>>>> The following are the producer properties that are being set into the
>>>> producer
>>>>
>>>> batch.num.messages => 200
>>>> client.id => Acme
>>>> compression.codec => none
>>>> key.serializer.class => kafka.serializer.StringEncoder
>>>> message.send.max.retries => 3
>>>> metadata.broker.list => data2.acme.com:9092,data3.acme.com:9092
>>>> partitioner.class => kafka.producer.DefaultPartitioner
>>>> producer.type => sync
>>>> queue.buffering.max.messages => 10000
>>>> queue.buffering.max.ms => 5000
>>>> queue.enqueue.timeout.ms => -1
>>>> request.required.acks => 1
>>>> request.timeout.ms => 10000
>>>> retry.backoff.ms => 1000
>>>> send.buffer.bytes => 102400
>>>> serializer.class => kafka.serializer.StringEncoder
>>>> topic.metadata.refresh.interval.ms => 600000
>>>>
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>
>
