spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Renxia Wang <>
Subject Spark Streaming Kinesis Performance Decrease When Cluster Scale Up with More Executors
Date Thu, 14 Jul 2016 17:49:12 GMT
Hi all,

I am running a Spark Streaming application with Kinesis on EMR 4.7.1. The
application runs on YARN and use client mode. There are 17 worker nodes
(c3.8xlarge) with 100 executors and 100 receivers. This setting works fine.

But when I increase the number of worker nodes to 50, and increase the
number of executors to 250, with the 250 receivers, the processing time of
batches increase from ~50s to 2.3min, and scheduler delay for tasks
increase from ~0.2s max to 20s max (while 75th percentile is about 2-3s).

I tried to only increase the number executors but keep the number of
receivers, but then I still see performance degrade from ~50s to 1.1min,
and for tasks the scheduler delay increased from ~0.2s max to 4s max (while
75th percentile is about 1s).

The spark-submit is as follow. The only parameter I changed here is the

--deploy-mode client
--master yarn
--jars /usr/lib/spark/extras/lib/spark-streaming-kinesis-asl.jar
--driver-memory 20g --driver-cores 20
--num-executors 250
--executor-cores 5
--executor-memory 8g
--conf spark.yarn.executor.memoryOverhead=1600
--conf spark.driver.maxResultSize=0
--conf spark.dynamicAllocation.enabled=false
--conf spark.rdd.compress=true
--conf spark.streaming.stopGracefullyOnShutdown=true
--conf spark.streaming.backpressure.enabled=true
--conf spark.speculation=true
--conf spark.task.maxFailures=15
--conf spark.ui.retainedJobs=100
--conf spark.ui.retainedStages=100
--conf spark.executor.logs.rolling.maxRetainedFiles=1
--conf spark.executor.logs.rolling.strategy=time
--conf spark.executor.logs.rolling.time.interval=hourly
--conf spark.scheduler.mode=FAIR
--conf spark.scheduler.allocation.file=/home/hadoop/fairscheduler.xml
--conf spark.metrics.conf=/home/hadoop/
--class Main /home/hadoop/Main-1.0.jar

I found this issue seems relevant:

Any suggestion for me to troubleshoot this issue?



View raw message