kafka-users mailing list archives

From Bhavesh Mistry <mistry.p.bhav...@gmail.com>
Subject Re: Spark Kafka Performance
Date Wed, 05 Nov 2014 03:45:21 GMT
Hi Eduardo,

Can you please take a thread dump and see if there are blocking issues on
the producer side?  Do you have a single Producer instance and multiple
threads?

Are you using the Scala Producer or the new Java Producer?  Also, what are
your producer properties?
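The single-producer/multiple-threads question can be illustrated with a plain-Python sketch (the `FakeProducer` class below is a stand-in I invented, not a real Kafka client): a producer's send buffer is a shared resource, so when it fills up, every sending thread blocks on it, which is exactly the kind of stall a thread dump would reveal.

```python
# Illustrative sketch: one shared producer instance used from many threads.
# FakeProducer is a hypothetical stand-in for a real producer client.
import threading
from queue import Queue

class FakeProducer:
    """Stand-in for a producer: a bounded buffer that can block senders."""
    def __init__(self, maxsize=1000):
        self.buffer = Queue(maxsize=maxsize)

    def send(self, message):
        # Blocks when the buffer is full -- the kind of producer-side
        # stall a thread dump would show.
        self.buffer.put(message)

producer = FakeProducer()

def worker(worker_id, n_messages):
    for i in range(n_messages):
        producer.send((worker_id, i))

threads = [threading.Thread(target=worker, args=(t, 100)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(producer.buffer.qsize())  # 4 workers x 100 messages = 400
```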


Thanks,

Bhavesh

On Tue, Nov 4, 2014 at 12:40 AM, Eduardo Alfaia <e.costaalfaia@unibs.it>
wrote:

> Hi Gwen,
> I have changed the JavaKafkaWordCount code to use reduceByKeyAndWindow in
> Spark.
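What reduceByKeyAndWindow changes can be shown with a plain-Python sketch (this is not Spark itself, and the batch contents are invented): each batch yields per-word counts, and the window sums the counts from the last `window` batches, so totals reflect recent traffic instead of a single batch.

```python
# Plain-Python sketch of windowed word counting, in the spirit of Spark's
# reduceByKeyAndWindow. Batch contents are made up for illustration.
from collections import Counter

def windowed_counts(batches, window):
    """Yield summed word counts over a sliding window of batches."""
    for i in range(len(batches)):
        win = batches[max(0, i - window + 1): i + 1]
        total = Counter()
        for batch in win:
            total.update(Counter(batch.split()))
        yield dict(total)

batches = ["kafka spark", "kafka kafka", "spark"]
results = list(windowed_counts(batches, window=2))
print(results[1])  # sums batches 0-1: {'kafka': 3, 'spark': 1}
```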
>
> ----- Original message -----
> From: "Gwen Shapira" <gshapira@cloudera.com>
> Sent: 03/11/2014 21:08
> To: "users@kafka.apache.org" <users@kafka.apache.org>
> Cc: "user@spark.incubator.apache.org" <user@spark.incubator.apache.org>
> Subject: Re: Spark Kafka Performance
>
> Not sure about the throughput, but:
>
> "I mean that the words counted in spark should grow up" - The spark
> word-count example doesn't accumulate.
> It gets an RDD every n seconds and counts the words in that RDD. So we
> don't expect the count to go up.
>
>
>
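Gwen's point can be seen in a minimal plain-Python sketch (not Spark; batch contents are invented): the streaming word-count example recomputes counts for each batch independently, so the printed numbers reset every batch, while a running total would be what an accumulating, stateful job prints.

```python
# Sketch: per-batch counts (what the word-count example prints) vs. a
# running total (what an accumulating, stateful job would print).
from collections import Counter

batches = ["kafka spark", "kafka", "spark spark"]

running = Counter()
for i, batch in enumerate(batches):
    per_batch = Counter(batch.split())  # resets every batch: does not grow
    running.update(per_batch)           # accumulated state: keeps growing
    print(i, dict(per_batch), dict(running))
```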
> On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia <
> e.costaalfaia@unibs.it
> > wrote:
>
> > Hi Guys,
> > Could anyone explain to me how Kafka works with Spark? I am using
> > JavaKafkaWordCount.java as a test, and the command line is:
> >
> > ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount
> > spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3
> >
> > and as a producer I am using this command:
> >
> > rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt
> > -l 100 -n 10
> >
> >
> > rdkafka_cachesender is a program I developed that sends the contents of
> > output.txt to Kafka, where -l is the length of each send (upper bound)
> > and -n is the number of lines to send in a row. Below is the throughput
> > calculated by the program:
> >
> > File is 2235755 bytes
> > throughput (b/s) = 699751388
> > throughput (b/s) = 723542382
> > throughput (b/s) = 662989745
> > throughput (b/s) = 505028200
> > throughput (b/s) = 471263416
> > throughput (b/s) = 446837266
> > throughput (b/s) = 409856716
> > throughput (b/s) = 373994467
> > throughput (b/s) = 366343097
> > throughput (b/s) = 373240017
> > throughput (b/s) = 386139016
> > throughput (b/s) = 373802209
> > throughput (b/s) = 369308515
> > throughput (b/s) = 366935820
> > throughput (b/s) = 365175388
> > throughput (b/s) = 362175419
> > throughput (b/s) = 358356633
> > throughput (b/s) = 357219124
> > throughput (b/s) = 352174125
> > throughput (b/s) = 348313093
> > throughput (b/s) = 355099099
> > throughput (b/s) = 348069777
> > throughput (b/s) = 348478302
> > throughput (b/s) = 340404276
> > throughput (b/s) = 339876031
> > throughput (b/s) = 339175102
> > throughput (b/s) = 327555252
> > throughput (b/s) = 324272374
> > throughput (b/s) = 322479222
> > throughput (b/s) = 319544906
> > throughput (b/s) = 317201853
> > throughput (b/s) = 317351399
> > throughput (b/s) = 315027978
> > throughput (b/s) = 313831014
> > throughput (b/s) = 310050384
> > throughput (b/s) = 307654601
> > throughput (b/s) = 305707061
> > throughput (b/s) = 307961102
> > throughput (b/s) = 296898200
> > throughput (b/s) = 296409904
> > throughput (b/s) = 294609332
> > throughput (b/s) = 293397843
> > throughput (b/s) = 293194876
> > throughput (b/s) = 291724886
> > throughput (b/s) = 290031314
> > throughput (b/s) = 289747022
> > throughput (b/s) = 289299632
> >
> > The throughput drops after a few seconds and does not hold at the
> > initial values:
> >
> > throughput (b/s) = 699751388
> > throughput (b/s) = 723542382
> > throughput (b/s) = 662989745
> >
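A high initial value decaying smoothly toward a plateau is the shape a cumulative average takes after a fast start. Since rdkafka_cachesender is Eduardo's own program, the formula below is only an assumption about how it might report the figure, and the rates and durations are invented for illustration:

```python
# Hedged sketch: if throughput is reported as total bytes sent divided by
# total elapsed time (a cumulative average), an initial fast burst followed
# by a steady rate produces a decaying curve like the one above.
def cumulative_throughput(samples):
    """samples: list of (bytes_sent_in_interval, interval_seconds)."""
    total_bytes = 0.0
    total_time = 0.0
    out = []
    for b, t in samples:
        total_bytes += b
        total_time += t
        out.append(total_bytes / total_time)
    return out

# One fast 1 s burst (700 MB), then steady 300 MB/s intervals.
samples = [(700e6, 1.0)] + [(300e6, 1.0)] * 4
print([round(x / 1e6) for x in cumulative_throughput(samples)])
# [700, 500, 433, 400, 380]
```

If that assumption holds, the decline would reflect the averaging window growing rather than the producer actually slowing down; a thread dump or per-interval measurement would distinguish the two.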
> > Another question is about Spark: about 15 seconds after I started the
> > Spark command, Spark kept repeating the same word counts, even though my
> > program kept sending words to Kafka, so I would expect the counts in
> > Spark to grow. I have attached the log from Spark.
> >
> > My Case is:
> >
> > ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + Zookeeper) ->
> > ComputerC (Spark)
> >
> > If I haven’t explained this well, please reply to me.
> >
> > Thanks Guys
> > --
> > Privacy Notice: http://www.unibs.it/node/8155
> >
>
