storm-user mailing list archives

From Harsha <st...@harsha.io>
Subject Re: TransactionalTridentKafkaSpout using only 1 executor
Date Fri, 05 Dec 2014 04:59:29 GMT

Using kafka-console-producer is a bad idea; it should only be used for
testing a topic. I highly recommend writing your own producer.
KafkaSpout uses the simple (low-level) consumer API, which doesn't have
consumer groups. But you can try running bin/kafka-run-class.sh
kafka.tools.ConsumerOffsetChecker to check the partition sizes for a
topic.
https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-ConsumerOffsetChecker
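
For illustration only, here is a minimal, self-contained Java sketch (no
Kafka dependency) of why a per-message key spreads data across
partitions. The class name KeyedPartitionSketch and its helper methods
are hypothetical, and the hash-modulo rule is an assumed simplification
of how a keyed partitioner behaves, not Kafka's actual code. The null-key
case simply models the behaviour Andrew describes further down the
thread: every message in a given window lands on a single partition.

```java
import java.util.Arrays;
import java.util.UUID;

public class KeyedPartitionSketch {

    // Assumed simplification of a keyed partitioner:
    // non-negative hash of the key, modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Give every message a fresh random UUID key and count how many
    // messages each partition receives.
    static int[] distributeWithUuidKeys(int messages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < messages; i++) {
            String key = UUID.randomUUID().toString();
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    // Model of the null-key case within one window: all messages go to
    // the one partition the producer is currently "stuck" on.
    static int[] distributeWithNullKeys(int messages, int numPartitions,
                                        int stickyPartition) {
        int[] counts = new int[numPartitions];
        counts[stickyPartition] = messages;
        return counts;
    }

    // Number of partitions that received at least one message.
    static int nonEmpty(int[] counts) {
        int n = 0;
        for (int c : counts) {
            if (c > 0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int partitions = 16;
        int[] keyed = distributeWithUuidKeys(10000, partitions);
        int[] unkeyed = distributeWithNullKeys(10000, partitions, 3);
        System.out.println("UUID keys: " + Arrays.toString(keyed));
        System.out.println("null keys: " + Arrays.toString(unkeyed));
        System.out.println("partitions used with UUID keys: " + nonEmpty(keyed));
        System.out.println("partitions used with null keys: " + nonEmpty(unkeyed));
    }
}
```

With keys like these, a spout with 16 executors would have data to read
on every partition, instead of only the one partition currently
receiving new messages.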


On Thu, Dec 4, 2014, at 05:50 PM, Andrew Neilson wrote:
> Over the long term the partitions would be used evenly, but unless you
> change the partitioning scheme or message key then at any given time
> only one partition will be receiving *new* messages.
>
> If you want to test that your topology properly distributes the work
> at the spout level, you could try loading from the beginning of your
> topic rather than from the end.
>
> To do that, set these values in your TridentKafkaConfig:
>
> spoutConf.forceFromStart = true;
> // EarliestTime() is actually the default, so you don't necessarily
> // need the next line
> spoutConf.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();
>
> On Thu, Dec 4, 2014 at 3:28 PM, Huy Le Van
> <huy.levan@insight-centre.org> wrote:
>>
>> I just dumped from text files directly to kafka producer using
>> bin/kafka-console-producer.sh so I guess the keys were all null. I’ll
>> write a producer to see. By the way, what is the command to show the
>> distribution of my data in kafka?
>>
>>
>>
>> Best regards, Huy, Le Van
>>
>>
>> On Thursday, Dec 4, 2014 at 11:23 p.m., Harsha
>> <storm@harsha.io>, wrote:
>>
>>
>>> It doesn't look like your Kafka producer is distributing data across
>>> the partitions. What does your producer look like? Are you sending a
>>> key with each message or using null? If you are using null, then what
>>> Andrew is saying might be the problem. I would recommend using a
>>> random UUID as the key when sending messages to your partitions.
>>>
>>>
>>> On Thu, Dec 4, 2014, at 03:12 PM, Huy Le Van wrote:
>>>>
>>>> Hi Harsha, I’ve attached 2 images below. You can see that I
>>>> assigned 16 executors, but only one seemed to work. The other
>>>> screenshot is the partition table.
>>>>
>>>> Hi Andrew, That’s interesting. I’m quite new to Kafka. Could you
>>>> take a look at the second screenshot to see if the data was
>>>> distributed evenly? Let’s say it was written to one partition at a
>>>> time (yes, this is the case, since I used only one producer); would
>>>> it be rebalanced afterward?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Best regards, Huy, Le Van
>>>>
>>>> On Thursday, Dec 4, 2014 at 10:00 p.m., Andrew Neilson
>>>> <arsneilson@gmail.com>, wrote:
>>>>> How is the kafka topic you are reading from partitioned? By
>>>>> default, kafka will write to a single random partition at a time
>>>>> for 10 minutes before switching to another. So if you are looking
>>>>> at live data, you would only see data in one partition at a time
>>>>> unless you use a different partitioning scheme.
>>>>>
>>>>> See the Kafka FAQ for details on this
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?
>>>>>
>>>>>
>>>>> On Thu, Dec 4, 2014 at 1:51 PM, Harsha <storm@harsha.io> wrote:
>>>>>
>>>>>> Can you post an image of your Storm UI executors page? If there
>>>>>> are 16 executors but only 1 seems to be fetching data, can you
>>>>>> please check whether your Kafka producer is distributing your
>>>>>> data among all of your partitions?
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 4, 2014, at 12:32 PM, Huy Le Van wrote:
>>>>>>>
>>>>>>> Could someone help me please?
>>>>>>>
>>>>>>> Best regards, Huy, Le Van
>>>>>>>
>>>>>>> On Thursday, Dec 4, 2014 at 3:35 p.m., Huy Le Van
>>>>>>> <huy.levan@insight-centre.org>, wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I’m trying to tune Kafka Trident (Transactional) and seeing
>>>>>>>> that the ‘spout0’ bolt uses only one executor. The problem is
>>>>>>>> exactly as described in
>>>>>>>> https://groups.google.com/forum/#!msg/storm-user/bI7976v9R5g/fulzpnPmzkEJ
>>>>>>>> However, my Kafka topic has 16 partitions and I already set
>>>>>>>> parallelismHint of TransactionalTridentKafkaSpout to 16. What
>>>>>>>> am I doing wrong here? Please advise.
>>>>>>>>
>>>>>>>> Many thanks, Huy, Le Van
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> Email had 2 attachments:
>>>>  * storm01.png 165k (image/png)
>>>>  * storm02.png 476k (image/png)
>>>
>>
>

