spark-user mailing list archives

From Diwakar Dhanuskodi <>
Subject Re: Kafka directsream receiving rate
Date Fri, 05 Feb 2016 19:41:01 GMT
I can see the number of messages processed per batch in the Spark Streaming web
UI. I am also counting the messages inside foreachRDD.
I removed the backpressure settings, but the behavior is still the same.
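
One way to cross-check a count taken inside foreachRDD is to sum the offset ranges that each batch covers. The sketch below is a minimal, Spark-free model of that idea: `PartitionRange` is a hypothetical stand-in for the `OffsetRange` objects the direct stream exposes through `rdd.asInstanceOf[HasOffsetRanges].offsetRanges`, and the topic name and offsets are made up for illustration.

```scala
// Hypothetical stand-in for Spark's OffsetRange (topic, partition, offsets).
final case class PartitionRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
  // Messages covered by this range; untilOffset is exclusive.
  def count: Long = untilOffset - fromOffset
}

object BatchCount {
  // Total messages a single batch would read across all Kafka partitions.
  def messagesInBatch(ranges: Seq[PartitionRange]): Long = ranges.map(_.count).sum

  // Illustrative data only: 20 partitions with 5 new messages in each.
  val sample: Seq[PartitionRange] =
    (0 until 20).map(p => PartitionRange("example-topic", p, 0L, 5L))
}
```

If the per-batch total computed this way disagrees with the count seen in the web UI, the counting code is a likely suspect; if the two agree, the batch really did contain that many messages.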

Sent from Samsung Mobile.

-------- Original message --------
From: Cody Koeninger <>
Date:06/02/2016 00:33 (GMT+05:30)
To: Diwakar Dhanuskodi <>
Cc:
Subject: Re: Kafka directsream receiving rate

How are you counting the number of messages?

I'd go ahead and remove the settings for backpressure and maxRatePerPartition, just to eliminate
that as a variable.
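
For reference, `spark.streaming.kafka.maxRatePerPartition` caps the number of messages read per second from each Kafka partition, so the static per-batch ceiling is rate × batch seconds × partitions. A rough sketch of that arithmetic, using the values mentioned elsewhere in this thread (1000000 per partition, 2000 ms batches, 20 partitions). The object and function names are hypothetical, and backpressure, when enabled, may lower the effective rate below this ceiling.

```scala
object RateCap {
  // Upper bound on messages per batch implied by maxRatePerPartition alone.
  // Does not model backpressure, which can reduce the effective rate.
  def maxMessagesPerBatch(maxRatePerPartition: Long, batchIntervalMs: Long, partitions: Int): Long =
    maxRatePerPartition * batchIntervalMs / 1000 * partitions
}
```

With these values, `RateCap.maxMessagesPerBatch(1000000L, 2000L, 20)` gives 40,000,000 messages per batch, far above the 100 messages published, so the static limit by itself would not explain batches of 10.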

On Fri, Feb 5, 2016 at 12:22 PM, Diwakar Dhanuskodi <> wrote:
I am using one direct stream. Below is the call to createDirectStream:

val topicSet = topics.split(",").toSet
val kafkaParams = Map[String,String]("bootstrap.servers" -> "")
val k = KafkaUtils.createDirectStream[String,String,StringDecoder,StringDecoder](ssc, kafkaParams,

When I replaced the createDirectStream call with createStream, all messages were read by one
DStream block:
val k = KafkaUtils.createStream(ssc, "","resp",topicMap ,StorageLevel.MEMORY_ONLY)

I am using the spark-submit command below to execute:
./spark-submit --master yarn-client --conf "spark.dynamicAllocation.enabled=true" --conf "spark.shuffle.service.enabled=true"
--conf "spark.sql.tungsten.enabled=false" --conf "spark.sql.codegen=false" --conf "spark.sql.unsafe.enabled=false"
--conf "spark.streaming.backpressure.enabled=true" --conf "spark.locality.wait=1s" --conf
"spark.shuffle.consolidateFiles=true"   --conf "spark.streaming.kafka.maxRatePerPartition=1000000"
--driver-memory 2g --executor-memory 1g --class com.tcs.dime.spark.SparkReceiver   --files
--jars /root/dime/jars/spark-streaming-kafka-assembly_2.10-1.5.1.jar,/root/Jars/sparkreceiver.jar

-------- Original message --------
From: Cody Koeninger <>
Date:05/02/2016 22:07 (GMT+05:30)
To: Diwakar Dhanuskodi <>
Subject: Re: Kafka directsream receiving rate

If you're using the direct stream, you have 0 receivers.  Do you mean you have 1 executor?

Can you post the relevant call to createDirectStream from your code, as well as any relevant
spark configuration?

On Thu, Feb 4, 2016 at 8:13 PM, Diwakar Dhanuskodi <> wrote:
Adding more info

The batch interval is 2000 ms.
I expect all 100 messages to go through one DStream from the direct stream, but it receives them
at a rate of 10 messages at a time. Am I missing some configuration here? Any help is appreciated.
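
To make the expectation concrete: with the direct stream, each batch roughly covers the offsets that arrived since the previous batch, so messages published over a span longer than the batch interval land in different batches. A small sketch of that bucketing, assuming hypothetical arrival timestamps in milliseconds (the actual arrival times are not given in this thread):

```scala
object BatchBuckets {
  // Group arrival times (ms since stream start) by the batch window they fall into,
  // and return the number of messages per batch index.
  def perBatch(arrivalsMs: Seq[Long], batchIntervalMs: Long): Map[Long, Int] =
    arrivalsMs.groupBy(_ / batchIntervalMs).map { case (batch, msgs) => batch -> msgs.size }
}
```

For example, 100 messages arriving 200 ms apart with a 2000 ms interval would show up as 10 batches of 10, while 100 messages arriving within a single interval would show up as one batch of 100. This is only an illustration of how publishing rate interacts with the batch interval, not a diagnosis of the behavior reported here.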


-------- Original message --------
From: Diwakar Dhanuskodi <>
Date:05/02/2016 07:33 (GMT+05:30)
Subject: Kafka directsream receiving rate

Using Spark 1.5.1.
I have a topic with 20 partitions. When I publish 100 messages, the Spark direct stream receives
10 messages per DStream. I have only one receiver. When I used createStream, the receiver
received all 100 messages at once.

Any help is appreciated.

