Over the long term the partitions would be used evenly, but unless you change the partitioning scheme or message key, at any given time only one partition will be receiving *new* messages.

If you want to test that your topology properly distributes the work at the spout level, you could try loading from the beginning of your topic rather than from the end.

To do that, set these values in your TridentKafkaConfig:

spoutConf.forceFromStart = true;
spoutConf.startOffsetTime = kafka.api.OffsetRequest.EarliestTime(); // actually the default, so you don't necessarily need this line
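To check how the existing messages are spread across partitions, one option is Kafka's GetOffsetShell tool; the broker address and topic name below are placeholders for your own setup:

```shell
# Print the latest offset (--time -1) for each partition of the topic.
# A heavily skewed topic shows one large offset and many near-zero ones;
# an evenly loaded topic shows similar offsets everywhere.
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 --topic mytopic --time -1
```

This queries a running broker, so it only works against your live cluster.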

On Thu, Dec 4, 2014 at 3:28 PM, Huy Le Van <huy.levan@insight-centre.org> wrote:
I just dumped text files directly into the Kafka producer using bin/kafka-console-producer.sh, so I guess the keys were all null. I’ll write a producer to check. By the way, what is the command to show the distribution of my data in Kafka?

Best regards,
Huy, Le Van

On Thursday, Dec 4, 2014 at 11:23 p.m., Harsha <storm@harsha.io>, wrote:
It doesn't look like your Kafka producer is distributing data across the partitions. What does your producer look like? Are you sending a key with each message, or using null? If you are using null, then what Andrew is saying might be the problem. I would recommend using a random UUID as the key when sending messages, so they get spread across your partitions.
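The idea behind the random-UUID suggestion can be sketched with a small simulation. This is an illustration only: the modulo-of-hash formula below is a simplification of what a hash-based keyed partitioner does, not Kafka's exact implementation, and the class and method names are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class KeyDistributionSketch {
    // Count how many of numPartitions end up receiving at least one of
    // numMessages messages when each message is keyed by a random UUID
    // and the partition is chosen by hashing the key.
    static int partitionsUsed(int numPartitions, int numMessages) {
        Set<Integer> hit = new HashSet<>();
        for (int i = 0; i < numMessages; i++) {
            String key = UUID.randomUUID().toString();
            // Simplified stand-in for a hash-based keyed partitioner.
            hit.add(Math.abs(key.hashCode() % numPartitions));
        }
        return hit.size();
    }

    public static void main(String[] args) {
        // With 1000 random keys, all 16 partitions receive messages;
        // with a null/constant key you would see only one partition busy.
        System.out.println("partitions used: " + partitionsUsed(16, 1000));
    }
}
```

With 1000 randomly keyed messages over 16 partitions, every partition gets traffic, which is why all 16 spout executors would then have work to do.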
On Thu, Dec 4, 2014, at 03:12 PM, Huy Le Van wrote:
Hi Harsha,
I’ve attached two images below. You can see that I assigned 16 executors, but only one seemed to be doing any work. The other screenshot is the partition table.
Hi Andrew,
That’s interesting. I’m quite new to Kafka. Could you take a look at the second screenshot to see whether the data was distributed evenly? Suppose it was written to one partition at a time (yes, this is the case, since I used only one producer): would it be rebalanced afterward?


Best regards,
Huy, Le Van
On Thursday, Dec 4, 2014 at 10:00 p.m., Andrew Neilson <arsneilson@gmail.com>, wrote:
How is the Kafka topic you are reading from partitioned? By default, Kafka writes to a single random partition for 10 minutes before switching to another. So if you are looking at live data, you will only see messages arriving in one partition at a time unless you use a different partitioning scheme.
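If you want to keep using the console producer but send keyed messages, its line reader can split a key out of each input line. Assuming your Kafka version's console producer supports the parse.key and key.separator properties (broker address and topic name below are placeholders):

```shell
# Each input line of the form "somekey:somevalue" is split at ':' into a
# message key and value, so messages are spread across partitions by key
# rather than all landing with a null key.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic \
  --property parse.key=true --property key.separator=:
```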
On Thu, Dec 4, 2014 at 1:51 PM, Harsha <storm@harsha.io> wrote:

Can you post an image of your Storm UI executors page? If there are 16 executors but only one seems to be fetching data, can you please check whether your Kafka producer is distributing data among all of your partitions?
On Thu, Dec 4, 2014, at 12:32 PM, Huy Le Van wrote:
Could someone help me please?
Best regards,
Huy, Le Van
On Thursday, Dec 4, 2014 at 3:35 p.m., Huy Le Van <huy.levan@insight-centre.org>, wrote:
I’m trying to tune Kafka Trident (Transactional) and seeing that the ‘spout0’ bolt uses only one executor. The problem is exactly as described in https://groups.google.com/forum/#!msg/storm-user/bI7976v9R5g/fulzpnPmzkEJ
However, my Kafka topic has 16 partitions and I already set parallelismHint of TransactionalTridentKafkaSpout to 16. What am I doing wrong here? Please advise.
Many thanks,
Huy, Le Van

Email had 2 attachments:

  • storm01.png
      165k (image/png)
  • storm02.png
      476k (image/png)