spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Kafka error "partitions don't have a leader" / LeaderNotAvailableException
Date Tue, 29 Sep 2015 13:39:51 GMT
Try writing to and reading from the topics in question using the Kafka
command-line tools, to eliminate your code as a variable.


That number of partitions is probably more than sufficient:

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Obviously if you ask for more replicas than you have brokers you're going
to have a problem, but that doesn't seem to be the case.
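
If it helps, here is a minimal sketch (plain Java, no Kafka dependency) for checking pasted "kafka-topics.sh --describe" output for partitions that have no leader, plus the replicas-versus-brokers sanity check. The class and method names are made up for illustration, and it assumes a leaderless partition is printed with Leader: -1 or Leader: none; adjust that to whatever your broker version actually prints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka: scans text pasted from
// "kafka-topics.sh --describe" and reports partitions without a leader.
final class DescribeOutputCheck {

    // Matches per-partition lines such as
    //   "Topic: topic1 Partition: 0 Leader: 1 Replicas: 1 Isr: 1"
    // The per-topic summary line ("Topic:topic1 PartitionCount:64 ...") does not match.
    private static final Pattern PARTITION_LINE = Pattern.compile(
        "Topic:\\s*(\\S+)\\s+Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+|none)"
        + "\\s+Replicas:\\s*([\\d,]*)\\s*Isr:\\s*([\\d,]*)");

    // Assumption: a missing leader shows up as "-1" or "none".
    static boolean hasLeader(String leaderField) {
        return !leaderField.equals("-1") && !leaderField.equals("none");
    }

    // Returns "topic,partition" for every partition line that has no leader.
    static List<String> leaderlessPartitions(String describeOutput) {
        List<String> result = new ArrayList<>();
        for (String line : describeOutput.split("\\R")) {
            Matcher m = PARTITION_LINE.matcher(line);
            if (m.find() && !hasLeader(m.group(3))) {
                result.add(m.group(1) + "," + m.group(2));
            }
        }
        return result;
    }

    // A replication factor larger than the number of live brokers cannot be satisfied.
    static boolean replicationFeasible(int replicationFactor, int liveBrokers) {
        return replicationFactor >= 1 && replicationFactor <= liveBrokers;
    }

    public static void main(String[] args) {
        // Made-up sample input; the second partition line is deliberately leaderless.
        String sample = String.join("\n",
            "Topic:topic1 PartitionCount:2 ReplicationFactor:1 Configs:",
            "Topic: topic1 Partition: 0 Leader: 1 Replicas: 1 Isr: 1",
            "Topic: topic1 Partition: 1 Leader: -1 Replicas: 2 Isr: ");
        System.out.println("Leaderless: " + leaderlessPartitions(sample));
        System.out.println("RF 1 on 2 brokers feasible: " + replicationFeasible(1, 2));
    }
}
```

Every partition in the output you pasted shows a leader, so a check like this would come back empty for it; it is only useful for spotting the partitions the producer complains about (e.g. 8 and 10) if they show up without a leader when you re-run the describe command.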



Also, depending on what version of Kafka you're running on the brokers, you
may want to look through the Kafka JIRA, e.g.

https://issues.apache.org/jira/browse/KAFKA-899


On Tue, Sep 29, 2015 at 8:05 AM, Dmitry Goldenberg <dgoldenberg123@gmail.com> wrote:

> "more partitions and replicas than available brokers" -- what would be a
> good ratio?
>
> We've been trying to set up 3 topics with 64 partitions.  I'm including
> the output of "bin/kafka-topics.sh --zookeeper localhost:2181 --describe
> topic1" below.
>
> I think it's symptomatic and confirms your theory, Adrian, that we've got
> too many partitions. In fact, for topic 2, only 12 partitions appear to
> have been created despite the requested 64.  Does Kafka have the limit of
> 140 partitions total within a cluster?
>
> The doc doesn't appear to have any prescriptions as to how you go about
> calculating an optimal number of partitions.
>
> We'll definitely try with fewer, I'm just looking for a good formula to
> calculate how many. And no, Adrian, this hasn't worked yet, so we'll start
> with something like 12 partitions.  It'd be good to know how high we can go
> with that...
>
> Topic:topic1 PartitionCount:64 ReplicationFactor:1 Configs:
>
> Topic: topic1 Partition: 0 Leader: 1 Replicas: 1 Isr: 1
>
> Topic: topic1 Partition: 1 Leader: 2 Replicas: 2 Isr: 2
>
>
> ................................................................................................
>
> Topic: topic1 Partition: 63 Leader: 2 Replicas: 2 Isr: 2
>
>
> ---------------------------------------------------------------------------------------------------
>
> Topic:topic2 PartitionCount:12 ReplicationFactor:1 Configs:
>
> Topic: topic2 Partition: 0 Leader: 2 Replicas: 2 Isr: 2
>
> Topic: topic2 Partition: 1 Leader: 1 Replicas: 1 Isr: 1
>
>
> ................................................................................................
>
> Topic: topic2 Partition: 11 Leader: 1 Replicas: 1 Isr: 1
>
>
> ---------------------------------------------------------------------------------------------------
>
> Topic:topic3 PartitionCount:64 ReplicationFactor:1 Configs:
>
> Topic: topic3 Partition: 0 Leader: 2 Replicas: 2 Isr: 2
>
> Topic: topic3 Partition: 1 Leader: 1 Replicas: 1 Isr: 1
>
>
> ................................................................................................
>
> Topic: topic3 Partition: 63 Leader: 1 Replicas: 1 Isr: 1
>
>
> On Tue, Sep 29, 2015 at 8:47 AM, Adrian Tanase <atanase@adobe.com> wrote:
>
>> The error message is very explicit (partition is under replicated), I
>> don’t think it’s related to networking issues.
>>
>> Try to run /home/kafka/bin/kafka-topics.sh --zookeeper localhost/kafka
>> --describe --topic topic_name and see which brokers are missing from the
>> replica assignment.
>> *(replace home, zk-quorum etc with your own set-up)*
>>
>> Lastly, has this ever worked? Maybe you’ve accidentally created the topic
>> with more partitions and replicas than available brokers… try to recreate
>> with fewer partitions/replicas, see if it works.
>>
>> -adrian
>>
>> From: Dmitry Goldenberg
>> Date: Tuesday, September 29, 2015 at 3:37 PM
>> To: Adrian Tanase
>> Cc: "user@spark.apache.org"
>> Subject: Re: Kafka error "partitions don't have a leader" /
>> LeaderNotAvailableException
>>
>> Adrian,
>>
>> Thanks for your response. I just looked at both machines we're testing on
>> and on both the Kafka server process looks OK. Anything specific I can
>> check otherwise?
>>
>> From googling around, I see some posts where folks suggest checking the
>> DNS settings (those appear fine) and setting advertised.host.name in
>> Kafka's server.properties. Yea/nay?
>>
>> Thanks again.
>>
>> On Tue, Sep 29, 2015 at 8:31 AM, Adrian Tanase <atanase@adobe.com> wrote:
>>
>>> I believe some of the brokers in your cluster died and there are a
>>> number of partitions that nobody is currently managing.
>>>
>>> -adrian
>>>
>>> From: Dmitry Goldenberg
>>> Date: Tuesday, September 29, 2015 at 3:26 PM
>>> To: "user@spark.apache.org"
>>> Subject: Kafka error "partitions don't have a leader" /
>>> LeaderNotAvailableException
>>>
>>> I apologize for posting this Kafka related issue into the Spark list.
>>> Have gotten no responses on the Kafka list and was hoping someone on this
>>> list could shed some light on the below.
>>>
>>> ---------------------------------------------------------------------------------------
>>>
>>> We're running into this issue in a clustered environment where we're
>>> trying to send messages to Kafka and are getting the below error.
>>>
>>> Can someone explain what might be causing it and what the error message
>>> means (Failed to send data since partitions [<topic-name>,8] don't have a
>>> leader)?
>>>
>>>
>>> ---------------------------------------------------------------------------------------
>>>
>>> WARN kafka.producer.BrokerPartitionInfo: Error while fetching
>>> metadata partition 10 leader: none replicas: isr: isUnderReplicated: false
>>> for topic partition [<topic-name>,10]: [class
>>> kafka.common.LeaderNotAvailableException]
>>>
>>> ERROR kafka.producer.async.DefaultEventHandler: Failed to send requests
>>> for topics <topic-name> with correlation ids in [2398792,2398801]
>>>
>>> ERROR com.acme.core.messaging.kafka.KafkaMessageProducer: Error while
>>> sending a message to the message store.
>>> kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
>>>   at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90) ~[kafka_2.10-0.8.2.0.jar:?]
>>>   at kafka.producer.Producer.send(Producer.scala:77) ~[kafka_2.10-0.8.2.0.jar:?]
>>>   at kafka.javaapi.producer.Producer.send(Producer.scala:33) ~[kafka_2.10-0.8.2.0.jar:?]
>>>
>>> WARN kafka.producer.async.DefaultEventHandler: Failed to send data since
>>> partitions [<topic-name>,8] don't have a leader
>>>
>>> What do these errors and warnings mean and how do we get around them?
>>>
>>>
>>> ---------------------------------------------------------------------------------------
>>>
>>> The code for sending messages is basically as follows:
>>>
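>>> import java.io.IOException;
>>> import kafka.javaapi.producer.Producer;
>>> import kafka.producer.KeyedMessage;
>>>
>>> public class KafkaMessageProducer {
>>>   // Old (0.8.x) Java-API producer; its initialization is elided below.
>>>   private Producer<String, String> producer;
>>>
>>>   .....................
>>>
>>>   public void sendMessage(String topic, String key, String message)
>>>       throws IOException, MessagingException {
>>>     KeyedMessage<String, String> data =
>>>         new KeyedMessage<String, String>(topic, key, message);
>>>     try {
>>>       producer.send(data);
>>>     } catch (Exception ex) {
>>>       // Wrap any send failure in the application's own exception type.
>>>       throw new MessagingException(
>>>           "Error while sending a message to the message store.", ex);
>>>     }
>>>   }
>>> }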
>>>
>>> Is it possible that the producer gets "stale" and needs to be
>>> re-initialized?  Do we want to re-create the producer on every message (??)
>>> or is it OK to hold on to one indefinitely?
>>>
>>>
>>> ---------------------------------------------------------------------------------------
>>>
>>> The following are the producer properties that are being set into the
>>> producer
>>>
>>> batch.num.messages => 200
>>> client.id => Acme
>>> compression.codec => none
>>> key.serializer.class => kafka.serializer.StringEncoder
>>> message.send.max.retries => 3
>>> metadata.broker.list => data2.acme.com:9092,data3.acme.com:9092
>>> partitioner.class => kafka.producer.DefaultPartitioner
>>> producer.type => sync
>>> queue.buffering.max.messages => 10000
>>> queue.buffering.max.ms => 5000
>>> queue.enqueue.timeout.ms => -1
>>> request.required.acks => 1
>>> request.timeout.ms => 10000
>>> retry.backoff.ms => 1000
>>> send.buffer.bytes => 102400
>>> serializer.class => kafka.serializer.StringEncoder
>>> topic.metadata.refresh.interval.ms => 600000
>>>
>>>
>>> Thanks.
>>>
>>
>>
>
