storm-user mailing list archives

From Vineet Mishra <clearmido...@gmail.com>
Subject Re: Storm Kafka Processing
Date Thu, 05 Feb 2015 19:19:26 GMT
It's going fine now.

I used the proposed Kafka-Storm library and it is working great.

I'd like to quickly add to the same thread.

Is there another way to maximize the parallelism of the Storm spout reading
from Kafka, or is increasing the number of partitions the only option for
increasing throughput?

Thanks!
On Feb 3, 2015 8:21 PM, "Harsha" <storm@harsha.io> wrote:

>  Vineet,
>          In the Kafka producer.send(KeyedMessage<Id, Message>) call, are you
> passing in an ID? If it is constant or null, your data won't be distributed
> to all partitions. With a constant ID all of your messages go to the same
> partition, and with a null ID the producer chooses partitions round-robin.
> It's better to use a random UUID as the key to distribute messages among all
> of your partitions.
> -Harsha
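>
> For illustration, a minimal sketch of the above using the Kafka 0.8 producer
> API; the broker list, the port 9092, and the topic name "test" are
> assumptions drawn from elsewhere in this thread:
>
> import java.util.Properties;
> import java.util.UUID;
>
> import kafka.javaapi.producer.Producer;
> import kafka.producer.KeyedMessage;
> import kafka.producer.ProducerConfig;
>
> Properties props = new Properties();
> props.put("metadata.broker.list", "host1:9092,host2:9092,host3:9092"); // assumed default broker port
> props.put("serializer.class", "kafka.serializer.StringEncoder");
>
> Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
>
> // A random UUID key lets the default partitioner spread messages over all 10 partitions.
> String event = "sample-event";
> producer.send(new KeyedMessage<String, String>("test", UUID.randomUUID().toString(), event));
> producer.close();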
>
>
> On Tue, Feb 3, 2015, at 12:44 AM, Vineet Mishra wrote:
>
> Do you mean to say that the events published to Kafka are not distributed
> across partitions?
>
> Well, while creating the topic I made sure to create it with 10 partitions
> and a replication factor of 3.
>
> Is it something affected by how I am writing to Kafka?
>
> Thanks!
>
> On Tue, Feb 3, 2015 at 1:50 PM, Andrew Neilson <arsneilson@gmail.com>
> wrote:
>
> The behaviour you are describing sounds like your topology is processing a
> small backlog of events built up in each partition and then catching up to
> real time, where events are only being published to one of the 10 partitions
> at a time. I will echo Harsha in suggesting that you verify you are
> actually publishing to all partitions (important: this is *not* the
> default behaviour).
>
> On Tue, Feb 3, 2015 at 12:05 AM, Vineet Mishra <clearmidoubt@gmail.com>
> wrote:
>
> Hi Harsha,
>
> Based on the suggestion, I made the specified changes by switching to the
> recommended Kafka-Storm bundle.
>
> Although I could see a difference between the previous bundle and the
> current one, I was not satisfied with the way the spouts were processing.
> The observation I had was:
>
> The spout was running with an executor count of 10; when the job was
> started, around half of the executors (5) began processing in parallel to
> ingest the data.
>
> As soon as the counts reached around a million or so, the parallelism
> dropped and eventually it started processing serially (one executor at a
> time).
>
> Executors (All time)
> Id       Uptime   Host   Port  Emitted  Transferred  Complete latency (ms)  Acked   Failed
> [2-2]    13m 54s  host3  6703  0        0            0.000                  0       0
> [3-3]    13m 52s  host2  6702  318300   318300       4.789                  318160  0
> [4-4]    13m 52s  host3  6702  434200   434200       7.064                  434380  0
> [5-5]    13m 53s  host2  6701  20       20           0.000                  0       0
> [6-6]    13m 55s  host3  6701  0        0            0.000                  0       0
> [7-7]    13m 51s  host2  6700  25000    25000        4.122                  24500   0
> [8-8]    13m 51s  host3  6700  248360   248360       9.514                  245780  0
> [9-9]    13m 52s  host2  6703  0        0            0.000                  0       0
> [10-10]  13m 54s  host3  6703  235220   235220       9.250                  233200  0
> [11-11]  13m 52s  host2  6702  204420   204420       10.382                 205800  0
>
> I have around 0.2 billion events ingested into Kafka that need to be
> processed through Storm in real time, but I am not sure what is causing this
> unexpected intermittent behavior of Storm or how I can prevent it in the
> near future.
>
> Expecting expert suggestions.
>
> Thanks!
>
>
>
> On Mon, Feb 2, 2015 at 11:53 PM, Vineet Mishra <clearmidoubt@gmail.com>
> wrote:
>
> Well, I am already running Kafka with 10 partitions and a replication factor
> of 3, which matches the size of my cluster.
>
> bin/kafka-topics.sh --create --zookeeper host1:2181,host2:2181,host3:2181
> --replication-factor 3 --partitions 10 --topic test
>
> and I am also running the Kafka Storm topology with an executor count of 10:
>
> TopologyBuilder builder = new TopologyBuilder();
> builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 10);
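>
> For reference, a minimal sketch of how that kafkaConfig could be built with
> the storm-kafka API; the zkRoot "/kafka-spout" and consumer id
> "storm-consumer" are illustrative names, not taken from this thread:
>
> import backtype.storm.spout.SchemeAsMultiScheme;
> import backtype.storm.topology.TopologyBuilder;
> import storm.kafka.BrokerHosts;
> import storm.kafka.KafkaSpout;
> import storm.kafka.SpoutConfig;
> import storm.kafka.StringScheme;
> import storm.kafka.ZkHosts;
>
> // Brokers are discovered through the same ZooKeeper ensemble used to create the topic.
> BrokerHosts hosts = new ZkHosts("host1:2181,host2:2181,host3:2181");
> SpoutConfig kafkaConfig = new SpoutConfig(hosts, "test", "/kafka-spout", "storm-consumer");
> kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
>
> TopologyBuilder builder = new TopologyBuilder();
> // Spout parallelism matches the 10 partitions so each executor owns one partition.
> builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 10);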
>
> I have a notion that ever since I changed the replication factor and number
> of partitions from the earlier setup*, I have been running into latency.
>
> * bin/kafka-topics.sh --create --zookeeper
> host1:2181,host2:2181,host3:2181 --replication-factor 1 --partitions 1
> --topic test
>
> Well, I will try the Storm-Kafka bundle provided above. Hope that helps!
>
> Thanks!
>
> On Mon, Feb 2, 2015 at 10:30 PM, Harsha <storm@harsha.io> wrote:
>
>
> Vineet,
>        Can you try using the one in Storm:
> https://github.com/apache/storm/tree/master/external/storm-kafka . It is
> published to the Maven repo, so you can use the following:
> <dependency>
> <groupId>org.apache.storm</groupId>
> <artifactId>storm-kafka</artifactId>
> <version>0.9.3</version>
> </dependency>
>
> If you are using a topic with 10 partitions, make sure you configure your
> Kafka spout with parallelism set to 10. Also make sure that on the producer
> side you are pushing data onto all 10 partitions, so that your Kafka spout
> is fetching data from all of them.
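>
> A quick way to check that data is actually landing on every partition (a
> sketch assuming Kafka 0.8's bundled tools and brokers on the default port
> 9092 on the hosts from this thread) is to print the latest offset per
> partition; any partition stuck at 0 is not receiving data:
>
> bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
>     --broker-list host1:9092,host2:9092,host3:9092 \
>     --topic test --time -1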
>
> -Harsha
>
>
>
> On Mon, Feb 2, 2015, at 08:55 AM, Vineet Mishra wrote:
>
> Hi Harsha,
>
> I am using the storm.kafka.KafkaSpout implementation from
>
> https://github.com/wurstmeister/storm-kafka-0.8-plus
>
> Thanks!
>
> On Mon, Feb 2, 2015 at 8:14 PM, Harsha <storm@harsha.io> wrote:
>
>
> Vineet,
>         Which kafka spout are you using?
>
> -Harsha
>
>
>
> On Mon, Feb 2, 2015, at 05:25 AM, Vineet Mishra wrote:
>
> Hi,
>
> I am running a Kafka-Storm setup to process real-time data generated on a
> 3-node distributed cluster.
>
> Currently I have set 10 executors for the Storm spout, which I don't think
> are running in parallel.
> Moreover, earlier I was running the Kafka topology with the replication
> factor and partitions both set to 1 (which seems to have run comparatively
> faster); now I have set the replication factor to 3 and partitions to 10,
> and I can see the performance degradation.
>
> Is there any way I can fully utilize the available resources and get the
> maximum throughput of event processing?
>
> Looking for expert suggestions; this is urgent.
>
> Thanks!
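>
> For reference, a minimal sketch of submitting the topology with enough
> workers to spread those spout executors across all 3 nodes; the topology
> name and the worker count are illustrative assumptions, and "builder" /
> "kafkaConfig" refer to the spout setup shown earlier in this thread:
>
> import backtype.storm.Config;
> import backtype.storm.StormSubmitter;
>
> Config conf = new Config();
> conf.setNumWorkers(3); // one worker per node so the 10 spout executors spread across the cluster
> StormSubmitter.submitTopology("kafka-ingest", conf, builder.createTopology());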
