kafka-users mailing list archives

From Eduardo Alfaia <e.costaalf...@unibs.it>
Subject R: Spark Kafka Performance
Date Tue, 04 Nov 2014 08:40:49 GMT
Hi Gwen,
I have changed the Java code of JavaKafkaWordCount to use reduceByKeyAndWindow in Spark.
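For anyone following the thread, here is a dependency-free Java sketch of the semantics that reduceByKeyAndWindow provides (this is not the Spark API itself, which needs a streaming context; the window length and the sample words are illustrative): counts are aggregated over the last W batches, so old batches slide out of the result.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of windowed counting: counts cover the last `windowBatches`
// batches, not just the current one, and old batches slide out.
public class WindowedCount {
    private final int windowBatches;               // window length, in batches
    private final Deque<Map<String, Integer>> window = new ArrayDeque<>();

    public WindowedCount(int windowBatches) {
        this.windowBatches = windowBatches;
    }

    // Feed one batch of words; returns counts over the current window.
    public Map<String, Integer> addBatch(List<String> words) {
        Map<String, Integer> batchCounts = new HashMap<>();
        for (String w : words) {
            batchCounts.merge(w, 1, Integer::sum);
        }
        window.addLast(batchCounts);
        if (window.size() > windowBatches) {
            window.removeFirst();                  // oldest batch slides out
        }
        Map<String, Integer> result = new HashMap<>();
        for (Map<String, Integer> b : window) {
            b.forEach((k, v) -> result.merge(k, v, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        WindowedCount wc = new WindowedCount(2);   // window = last 2 batches
        System.out.println(wc.addBatch(List.of("kafka", "spark"))); // kafka=1, spark=1
        System.out.println(wc.addBatch(List.of("kafka")));          // kafka=2 (both batches)
        System.out.println(wc.addBatch(List.of("kafka")));          // kafka=2, spark dropped
    }
}
```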

----- Original message -----
From: "Gwen Shapira" <gshapira@cloudera.com>
Sent: 03/11/2014 21:08
To: "users@kafka.apache.org" <users@kafka.apache.org>
Cc: "user@spark.incubator.apache.org" <user@spark.incubator.apache.org>
Subject: Re: Spark Kafka Performance

Not sure about the throughput, but:

"I mean that the words counted in spark should grow up" - The Spark
word-count example doesn't accumulate: it gets an RDD every n seconds and
counts the words in that RDD, so we don't expect the count to go up.
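To make that concrete, here is a small dependency-free Java sketch (not Spark itself; the batches and words are made up) contrasting the per-batch count the example prints with a running total:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: the streaming word-count example counts each batch (micro-RDD)
// independently, so the printed numbers do not grow; only a running total would.
public class BatchVsRunningCount {

    // Count words within a single batch, starting from zero each time.
    static Map<String, Integer> countBatch(List<String> batch) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : batch) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
                List.of("kafka", "spark", "kafka"),
                List.of("kafka"),
                List.of("spark", "kafka"));

        Map<String, Integer> runningTotal = new HashMap<>();
        for (List<String> batch : batches) {
            Map<String, Integer> perBatch = countBatch(batch);   // resets every batch
            perBatch.forEach((k, v) -> runningTotal.merge(k, v, Integer::sum));
            System.out.println("batch = " + perBatch + ", total = " + runningTotal);
        }
    }
}
```

The per-batch map is what the example prints every n seconds, which is why the numbers stay flat even while the producer keeps sending.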



On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia <e.costaalfaia@unibs.it> wrote:

> Hi Guys,
> Could anyone explain to me how Kafka works with Spark? I am using
> JavaKafkaWordCount.java as a test, and the command line is:
>
> ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount
> spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3
>
> and as a producer I am using this command:
>
> rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt
> -l 100 -n 10
>
>
> rdkafka_cachesender is a program I developed that sends the contents of
> output.txt to Kafka, where -l is the length of each send (upper bound)
> and -n is the number of lines to send in a row. Below is the throughput
> calculated by the program:
>
> File is 2235755 bytes
> throughput (b/s) = 699751388
> throughput (b/s) = 723542382
> throughput (b/s) = 662989745
> throughput (b/s) = 505028200
> throughput (b/s) = 471263416
> throughput (b/s) = 446837266
> throughput (b/s) = 409856716
> throughput (b/s) = 373994467
> throughput (b/s) = 366343097
> throughput (b/s) = 373240017
> throughput (b/s) = 386139016
> throughput (b/s) = 373802209
> throughput (b/s) = 369308515
> throughput (b/s) = 366935820
> throughput (b/s) = 365175388
> throughput (b/s) = 362175419
> throughput (b/s) = 358356633
> throughput (b/s) = 357219124
> throughput (b/s) = 352174125
> throughput (b/s) = 348313093
> throughput (b/s) = 355099099
> throughput (b/s) = 348069777
> throughput (b/s) = 348478302
> throughput (b/s) = 340404276
> throughput (b/s) = 339876031
> throughput (b/s) = 339175102
> throughput (b/s) = 327555252
> throughput (b/s) = 324272374
> throughput (b/s) = 322479222
> throughput (b/s) = 319544906
> throughput (b/s) = 317201853
> throughput (b/s) = 317351399
> throughput (b/s) = 315027978
> throughput (b/s) = 313831014
> throughput (b/s) = 310050384
> throughput (b/s) = 307654601
> throughput (b/s) = 305707061
> throughput (b/s) = 307961102
> throughput (b/s) = 296898200
> throughput (b/s) = 296409904
> throughput (b/s) = 294609332
> throughput (b/s) = 293397843
> throughput (b/s) = 293194876
> throughput (b/s) = 291724886
> throughput (b/s) = 290031314
> throughput (b/s) = 289747022
> throughput (b/s) = 289299632
>
> The throughput drops after a few seconds and does not stay at the
> initial values:
>
> throughput (b/s) = 699751388
> throughput (b/s) = 723542382
> throughput (b/s) = 662989745
>
> Another question is about Spark: about 15 seconds after I start the Spark
> command, Spark keeps repeating the same word counts even though my program
> continues to send words to Kafka, so I would expect the words counted in
> Spark to keep growing. I have attached the log from Spark.
>
> My setup is:
>
> ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + ZooKeeper) ->
> ComputerC (Spark)
>
> If I have not explained this well, please reply to me.
>
> Thanks Guys
> --
> Informativa sulla Privacy: http://www.unibs.it/node/8155
>

