storm-user mailing list archives

From: Andrew Neilson <arsneil...@gmail.com>
Subject: Re: TransactionalTridentKafkaSpout using only 1 executor
Date: Fri, 05 Dec 2014 01:50:26 GMT

Over the long term the partitions would be used evenly, but unless you
change the partitioning scheme or set a message key, only one partition
will be receiving *new* messages at any given time.
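
A toy simulation may make the difference concrete. This is only a sketch of
the behaviour described above, not Kafka's producer code: the class name
PartitionSketch, the "key-N" message keys, and the hash-modulo rule for keyed
messages are assumptions for illustration, and the 16 partitions come from
the topic described in this thread.

```java
import java.util.Random;

public class PartitionSketch {
    static final int NUM_PARTITIONS = 16; // partition count mentioned in this thread

    /** Null-key model: one random partition is chosen and reused for the whole window. */
    static int[] sendWithNullKeys(int messages, Random rng) {
        int[] counts = new int[NUM_PARTITIONS];
        int sticky = rng.nextInt(NUM_PARTITIONS);
        for (int i = 0; i < messages; i++) {
            counts[sticky]++;
        }
        return counts;
    }

    /** Keyed model (assumed rule): non-negative hash of the key, modulo the partition count. */
    static int[] sendWithKeys(int messages) {
        int[] counts = new int[NUM_PARTITIONS];
        for (int i = 0; i < messages; i++) {
            String key = "key-" + i; // hypothetical distinct key per message
            int partition = (key.hashCode() & 0x7fffffff) % NUM_PARTITIONS;
            counts[partition]++;
        }
        return counts;
    }

    /** Counts how many partitions received at least one message. */
    static int partitionsWithData(int[] counts) {
        int nonEmpty = 0;
        for (int c : counts) {
            if (c > 0) {
                nonEmpty++;
            }
        }
        return nonEmpty;
    }

    public static void main(String[] args) {
        int[] nullKeyed = sendWithNullKeys(1000, new Random());
        int[] keyed = sendWithKeys(1000);
        System.out.println("null keys: " + partitionsWithData(nullKeyed) + " partition(s) with data");
        System.out.println("with keys: " + partitionsWithData(keyed) + " partition(s) with data");
    }
}
```

Within a single window the null-key run touches exactly one partition, so
only the spout task assigned to that partition has anything new to read.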

If you want to test that your topology properly distributes the work at the
spout level, you could try loading from the beginning of your topic rather
than from the end.

To do that, set these values in your TridentKafkaConfig:

spoutConf.forceFromStart = true;
// EarliestTime() is actually the default, so you don't necessarily need this line
spoutConf.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();

On Thu, Dec 4, 2014 at 3:28 PM, Huy Le Van <huy.levan@insight-centre.org>
wrote:

>  I just piped text files directly into the Kafka producer using
> bin/kafka-console-producer.sh, so I guess the keys were all null. I’ll write
> a producer to see. By the way, what is the command to show the distribution
> of my data in Kafka?
>
>
> Best regards,
> Huy, Le Van
>
>  On Thursday, Dec 4, 2014 at 11:23 p.m., Harsha <storm@harsha.io>, wrote:
>
>> It doesn't look like your Kafka producer is distributing data across the
>> partitions. What does your producer look like? Are you sending a key with
>> each message, or using null? If you are using null, then what Andrew is
>> saying might be the problem. I would recommend using a random UUID as the
>> key when sending messages to your partitions.
>>
>>
>> On Thu, Dec 4, 2014, at 03:12 PM, Huy Le Van wrote:
>>
>>
>>  Hi Harsha,
>>  I’ve attached 2 images below. You can see that I assigned 16 executors,
>> but only one seemed to work. The other screenshot is the partition table.
>>
>>  Hi Andrew,
>>  That’s interesting. I’m quite new to Kafka. Could you take a look at
>> the second screenshot to see if the data was distributed evenly? Let’s say
>> it was written to one partition at a time (yes, this is the case, since I
>> used only one producer); would it be rebalanced afterward?
>>
>>
>>
>>
>>
>>
>>   Best regards,
>>  Huy, Le Van
>>
>>   On Thursday, Dec 4, 2014 at 10:00 p.m., Andrew Neilson <
>> arsneilson@gmail.com>, wrote:
>>
>>  How is the kafka topic you are reading from partitioned? By default,
>> kafka will write to a single random partition at a time for 10 minutes
>> before switching to another. So if you are looking at live data, you would
>> only see data in one partition at a time unless you use a different
>> partitioning scheme.
>>
>>  See the Kafka FAQ for details on this:
>> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?
>>
>>
>>  On Thu, Dec 4, 2014 at 1:51 PM, Harsha <storm@harsha.io> wrote:
>>
>>
>>
>>   Can you post an image of your Storm UI executors page? If there are 16
>> executors but only 1 seems to be fetching data, can you please check
>> whether your Kafka producer is distributing your data among all of your
>> partitions?
>>
>>
>>  On Thu, Dec 4, 2014, at 12:32 PM, Huy Le Van wrote:
>>
>>
>>  Could someone help me please?
>>
>>  Best regards,
>>  Huy, Le Van
>>
>>  On Thursday, Dec 4, 2014 at 3:35 p.m., Huy Le Van <
>> huy.levan@insight-centre.org>, wrote:
>>
>>
>>  Hi,
>>
>>  I’m trying to tune Kafka Trident (Transactional) and seeing that the
>> ‘spout0’ bolt uses only one executor. The problem is exactly as described
>> in
>> https://groups.google.com/forum/#!msg/storm-user/bI7976v9R5g/fulzpnPmzkEJ
>>  However, my Kafka topic has 16 partitions and I already set
>> parallelismHint of TransactionalTridentKafkaSpout to 16. What am I doing
>> wrong here? Please advise.
>>
>>  Many thanks,
>>  Huy, Le Van
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Email had 2 attachments:
>>
>>    - storm01.png
>>      165k (image/png)
>>    - storm02.png
>>      476k (image/png)
>>
>>
>>
>
