storm-user mailing list archives

From Vineet Mishra <clearmido...@gmail.com>
Subject Re: Storm Kafka Processing
Date Tue, 03 Feb 2015 08:05:09 GMT
Hi Harsha,

Following your suggestion, I switched to the storm-kafka bundle you
recommended.

Although I could see a difference compared with the previous bundle, I was
still not satisfied with the way the spouts were processing. Here is what I
observed:

The spout was running with 10 executors. When the job started, around half
of the executors (5) began processing in parallel to ingest the data.

Once the count reached around a million, the degree of parallelism dropped,
and eventually the spout started processing serially (one executor at a
time).

Executors (All time)
Id      Uptime   Host   Port  Emitted  Transferred  Complete latency (ms)   Acked  Failed
[2-2]   13m 54s  host3  6703        0            0                  0.000       0       0
[3-3]   13m 52s  host2  6702   318300       318300                  4.789  318160       0
[4-4]   13m 52s  host3  6702   434200       434200                  7.064  434380       0
[5-5]   13m 53s  host2  6701       20           20                  0.000       0       0
[6-6]   13m 55s  host3  6701        0            0                  0.000       0       0
[7-7]   13m 51s  host2  6700    25000        25000                  4.122   24500       0
[8-8]   13m 51s  host3  6700   248360       248360                  9.514  245780       0
[9-9]   13m 52s  host2  6703        0            0                  0.000       0       0
[10-10] 13m 54s  host3  6703   235220       235220                  9.250  233200       0
[11-11] 13m 52s  host2  6702   204420       204420                 10.382  205800       0
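To put numbers on the imbalance in the table, here is a small stand-alone Java sketch (a hypothetical helper, not part of Storm) that totals the Emitted column and counts the executors that emitted nothing. It assumes the values are copied correctly from the table. Note that the Storm UI samples its statistics (the default `topology.stats.sample.rate` is 0.05, which would explain why every count is a multiple of 20), so treat the totals as approximate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExecutorSkew {
    // Sum of the "Emitted" column across all spout executors.
    static long total(Map<String, Long> emitted) {
        long sum = 0;
        for (long e : emitted.values()) {
            sum += e;
        }
        return sum;
    }

    // Number of executors that have emitted nothing at all.
    static int idleCount(Map<String, Long> emitted) {
        int idle = 0;
        for (long e : emitted.values()) {
            if (e == 0) {
                idle++;
            }
        }
        return idle;
    }

    // Emitted counts copied from the Storm UI table (sampled, so approximate).
    static Map<String, Long> observed() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("[2-2]", 0L);
        m.put("[3-3]", 318300L);
        m.put("[4-4]", 434200L);
        m.put("[5-5]", 20L);
        m.put("[6-6]", 0L);
        m.put("[7-7]", 25000L);
        m.put("[8-8]", 248360L);
        m.put("[9-9]", 0L);
        m.put("[10-10]", 235220L);
        m.put("[11-11]", 204420L);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Long> emitted = observed();
        long total = total(emitted);
        // Print each executor's share of the total emitted tuples.
        for (Map.Entry<String, Long> e : emitted.entrySet()) {
            double share = total == 0 ? 0.0 : 100.0 * e.getValue() / total;
            System.out.printf("%-8s %8d %6.2f%%%n", e.getKey(), e.getValue(), share);
        }
        System.out.println("total emitted: " + total);
        System.out.println("idle executors: " + idleCount(emitted));
    }
}
```

With these values the total is about 1.47 million emitted tuples and 3 of the 10 executors show zero; counting [5-5] at 20 and [7-7] at 25000 as well, roughly half the executors are doing almost no work, which matches the observation above.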

I have around 0.2 billion events ingested into Kafka that need to be
processed through Storm in real time, but I am not sure what is causing this
unexpected, intermittent behavior in Storm or how I can prevent it in the
future.
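As a rough back-of-the-envelope check on the 0.2 billion event backlog, the sketch below derives an average rate from the table and estimates how long draining would take at that rate. It assumes the summed Emitted counts (about 1,465,520, sampled and therefore approximate) were produced over roughly 13m 52s of uptime and that the rate stays constant; both are simplifications.

```java
public class BacklogEstimate {
    // Average throughput in tuples per second over an observation window.
    static double tuplesPerSecond(long tuples, long seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return (double) tuples / seconds;
    }

    // Hours needed to drain a backlog at a constant rate.
    static double hoursToDrain(long backlog, double tuplesPerSecond) {
        if (tuplesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return backlog / tuplesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long emitted = 1_465_520L;          // sum of the Emitted column in the table
        long uptimeSeconds = 13 * 60 + 52;  // roughly 13m 52s of executor uptime
        long backlog = 200_000_000L;        // about 0.2 billion events in Kafka
        double rate = tuplesPerSecond(emitted, uptimeSeconds);
        System.out.printf("observed rate: %.0f tuples/s%n", rate);
        System.out.printf("time to drain: %.1f hours%n", hoursToDrain(backlog, rate));
    }
}
```

At roughly 1,760 tuples per second, 200 million events would take about 31.5 hours, so the observed throughput is far from real time for this volume.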

Looking forward to expert suggestions.

Thanks!



On Mon, Feb 2, 2015 at 11:53 PM, Vineet Mishra <clearmidoubt@gmail.com>
wrote:

> I am already running Kafka with 10 partitions and a replication factor of
> 3, which matches the number of nodes in my cluster:
>
> bin/kafka-topics.sh --create --zookeeper host1:2181,host2:2181,host3:2181
> --replication-factor 3 --partitions 10 --topic test
>
> and I am also running the Kafka Storm topology with an executor count of
> 10:
>
> TopologyBuilder builder = new TopologyBuilder();
> builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 10);
>
> I suspect that I have been seeing this latency ever since I changed the
> replication factor and number of partitions from my previous setup*:
>
> * bin/kafka-topics.sh --create --zookeeper
> host1:2181,host2:2181,host3:2181 --replication-factor 1 --partitions 1
> --topic test
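The storm-kafka spout assigns each Kafka partition to exactly one spout task, so spout parallelism beyond the partition count leaves tasks idle. The Java sketch below models that assignment as a simple round-robin (task i takes partitions i, i + tasks, i + 2 * tasks, ...); it is modelled on storm-kafka's behaviour but is an illustration, not the library's code. It assumes 10 spout tasks in both the earlier single-partition setup and the current 10-partition setup.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAssignment {
    // Simplified model: spout task taskIndex reads partitions taskIndex, taskIndex + totalTasks, ...
    static List<Integer> partitionsForTask(int numPartitions, int totalTasks, int taskIndex) {
        if (taskIndex < 0 || taskIndex >= totalTasks) {
            throw new IllegalArgumentException("taskIndex out of range");
        }
        List<Integer> assigned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            assigned.add(p);
        }
        return assigned;
    }

    // Count spout tasks that would receive no partition at all.
    static int idleTasks(int numPartitions, int totalTasks) {
        int idle = 0;
        for (int t = 0; t < totalTasks; t++) {
            if (partitionsForTask(numPartitions, totalTasks, t).isEmpty()) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        // Earlier setup: 1 partition, 10 spout tasks.
        System.out.println("1 partition, 10 tasks -> idle tasks: " + idleTasks(1, 10));
        // Current setup: 10 partitions, 10 spout tasks.
        System.out.println("10 partitions, 10 tasks -> idle tasks: " + idleTasks(10, 10));
        System.out.println("task 3 reads partitions " + partitionsForTask(10, 10, 3));
    }
}
```

With one partition, nine of ten tasks would have nothing to read; with ten partitions each task owns exactly one, so an executor sits idle whenever its partition receives little data from the producer.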
>
> I will try the Storm Kafka bundle you suggested above. Hopefully that
> helps!
>
> Thanks!
>
> On Mon, Feb 2, 2015 at 10:30 PM, Harsha <storm@harsha.io> wrote:
>
>>  Vineet,
>>        Can you try using the one in the Storm repository,
>> https://github.com/apache/storm/tree/master/external/storm-kafka ? It
>> is published to the Maven repository, so you can use the following:
>> <dependency>
>> <groupId>org.apache.storm</groupId>
>> <artifactId>storm-kafka</artifactId>
>> <version>0.9.3</version>
>> </dependency>
>>
>> If you are using a topic with 10 partitions, make sure you configure
>> your Kafka spout with a parallelism of 10. Also make sure that on the
>> producer side you are pushing data onto all 10 partitions, so that your
>> Kafka spout is fetching data from all 10 partitions.
>> -Harsha
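Harsha's point about the producer matters because a keyed message always lands in the same partition. The sketch below uses a simplified hash-modulo partitioner (similar in spirit to a key-hash default partitioner, but not Kafka's exact code, which varies by producer version) to show that a single repeated key fills one partition while varied keys spread across all 10. The keys `sensor-A` and `event-<n>` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyDistribution {
    // Simplified hash-mod partitioner; Math.floorMod keeps the result non-negative.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Number of messages landing in each partition for the given keys.
    static int[] histogram(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    // Number of partitions that received at least one message.
    static int usedPartitions(int[] counts) {
        int used = 0;
        for (int c : counts) {
            if (c > 0) {
                used++;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        int partitions = 10;
        List<String> sameKey = new ArrayList<>();
        List<String> distinctKeys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            sameKey.add("sensor-A");         // hypothetical: every message shares one key
            distinctKeys.add("event-" + i);  // hypothetical: one key per event
        }
        System.out.println("same key -> partitions used: "
                + usedPartitions(histogram(sameKey, partitions)));
        System.out.println("distinct keys -> partitions used: "
                + usedPartitions(histogram(distinctKeys, partitions)));
    }
}
```

If a producer uses a constant or low-cardinality key, only a few partitions, and therefore only a few spout executors, would receive data.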
>>
>>
>> On Mon, Feb 2, 2015, at 08:55 AM, Vineet Mishra wrote:
>>
>> Hi Harsha,
>>
>> I am using the storm.kafka.KafkaSpout.KafkaSpout implementation from
>>
>> https://github.com/wurstmeister/storm-kafka-0.8-plus
>>
>> Thanks!
>>
>> On Mon, Feb 2, 2015 at 8:14 PM, Harsha <storm@harsha.io> wrote:
>>
>>
>> Vineet,
>>         Which Kafka spout are you using?
>>
>> -Harsha
>>
>>
>>
>> On Mon, Feb 2, 2015, at 05:25 AM, Vineet Mishra wrote:
>>
>> Hi,
>>
>> I am running a Kafka-Storm pipeline to process real-time data generated
>> on a 3-node distributed cluster.
>>
>> Currently I have set 10 executors for the Storm spout, but I don't think
>> they are running in parallel.
>> Moreover, earlier I was running the Kafka topology with a replication
>> factor and partition count of 1 (which seems to have run comparatively
>> faster); now I have set the replication factor to 3 and the partitions to
>> 10, and I can see a performance degradation.
>>
>> Is there any way I can fully utilize the available resources and get the
>> maximum event-processing throughput?
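One way to reason about spout throughput is Little's law: the average number of tuples in flight equals the rate times the complete latency. The sketch below applies it to executor [3-3] from the Storm UI table (318300 emitted over about 832 seconds, 4.789 ms complete latency) and also rearranges it to show the ceiling that a given `topology.max.spout.pending` would impose per spout task. The value 1000 is a hypothetical setting, not one taken from this thread, and the sampled UI counts make the result approximate.

```java
public class LittlesLaw {
    // Little's law: average tuples in flight = rate (tuples/s) * latency (s).
    static double inFlight(double tuplesPerSecond, double latencyMs) {
        return tuplesPerSecond * latencyMs / 1000.0;
    }

    // Rearranged: the rate a spout task can sustain with at most maxPending tuples in flight.
    static double maxRate(int maxPending, double latencyMs) {
        if (latencyMs <= 0) {
            throw new IllegalArgumentException("latency must be positive");
        }
        return maxPending * 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        // Executor [3-3]: 318300 emitted in about 13m 52s, 4.789 ms complete latency.
        double rate = 318300 / 832.0;
        System.out.printf("[3-3] rate: %.1f tuples/s%n", rate);
        System.out.printf("[3-3] tuples in flight: %.2f%n", inFlight(rate, 4.789));
        // Hypothetical max spout pending of 1000 at the same latency.
        System.out.printf("ceiling with max pending 1000: %.0f tuples/s%n", maxRate(1000, 4.789));
    }
}
```

An average of fewer than two tuples in flight suggests that, unless max spout pending is set very low, executor [3-3] was not being held back by acking; the limiting factor is more likely how much data its partition delivers.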
>>
>> Looking for expert suggestions urgently.
>>
>> Thanks!
>>