spark-user mailing list archives

From Renxia Wang <renxia.w...@gmail.com>
Subject Re: Spark Streaming Kinesis Performance Decrease When Cluster Scale Up with More Executors
Date Sat, 16 Jul 2016 18:35:00 GMT
Hi Daniel,

I didn't re-shard; I have many more shards than receivers.

I finally got the cluster working by tuning locality and blockInterval,
reducing the number of output files, and disabling speculation.

Speculation in particular: I had to turn it on for my 17-node cluster, using
the default settings. But with it turned on on the 50-host cluster, it pushed
the scheduler delay up to 20s. After I turned it off, the scheduler delay
dropped to 5-6s.
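
For reference, these are all plain Spark configs; what I changed looks roughly
like this (illustrative values only; the right numbers depend on the workload):

--conf spark.speculation=false
--conf spark.locality.wait=500ms
--conf spark.streaming.blockInterval=1s

Reducing the number of output files was done in the application code itself,
e.g. by coalescing partitions before writing.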

The cluster is working now; however, I see some weird behavior in memory
usage: it jumps up regularly.

[inline chart of memory usage, not preserved in the archive]
This happens to all hosts.

Renxia



2016-07-14 12:59 GMT-07:00 Daniel Santana <daniel@everymundo.com>:

> Are you re-sharding your kinesis stream as well?
>
> I had a similar problem and increasing the number of kinesis stream shards
> solved it.
>
> --
> *Daniel Santana*
> Senior Software Engineer
>
> EVERY*MUNDO*
> 25 SE 2nd Ave., Suite 900
> Miami, FL 33131 USA
> main:+1 (305) 375-0045
> EveryMundo.com <http://www.everymundo.com/#whoweare>
>
> On Thu, Jul 14, 2016 at 2:20 PM, Renxia Wang <renxia.wang@gmail.com>
> wrote:
>
>> Additional information: the batch duration in my app is 1 minute. In the
>> Spark UI, for each batch, the difference between Output Op Duration and Job
>> Duration is large, e.g. Output Op Duration is 1 min while Job Duration is 19s.
>>
>> 2016-07-14 10:49 GMT-07:00 Renxia Wang <renxia.wang@gmail.com>:
>>
>>> Hi all,
>>>
>>> I am running a Spark Streaming application with Kinesis on EMR 4.7.1.
>>> The application runs on YARN and uses client mode. There are 17 worker nodes
>>> (c3.8xlarge) with 100 executors and 100 receivers. This setting works fine.
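>>>
>>> For context, the receivers are created with the usual multi-stream union
>>> pattern, roughly like this (a sketch only; ssc is the StreamingContext, and
>>> numReceivers, appName, streamName, endpointUrl, and regionName stand in for
>>> my actual values):
>>>
>>> import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
>>> import org.apache.spark.storage.StorageLevel
>>> import org.apache.spark.streaming.Seconds
>>> import org.apache.spark.streaming.kinesis.KinesisUtils
>>>
>>> // One createStream call per receiver; Spark schedules each receiver on an executor.
>>> val kinesisStreams = (0 until numReceivers).map { _ =>
>>>   KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName,
>>>     InitialPositionInStream.LATEST, Seconds(60), StorageLevel.MEMORY_AND_DISK_2)
>>> }
>>> // Union the per-receiver streams into a single DStream for processing.
>>> val unioned = ssc.union(kinesisStreams)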
>>>
>>> But when I increase the number of worker nodes to 50 and the number of
>>> executors to 250, with 250 receivers, the processing time of batches
>>> increases from ~50s to 2.3min, and the scheduler delay for tasks increases
>>> from ~0.2s max to 20s max (while the 75th percentile is about 2-3s).
>>>
>>> I tried to only increase the number of executors while keeping the number
>>> of receivers the same, but I still see performance degrade from ~50s to
>>> 1.1min, and the scheduler delay for tasks increases from ~0.2s max to 4s
>>> max (while the 75th percentile is about 1s).
>>>
>>> The spark-submit command is as follows. The only parameter I changed here
>>> is num-executors.
>>>
>>> spark-submit \
>>> --deploy-mode client \
>>> --verbose \
>>> --master yarn \
>>> --jars /usr/lib/spark/extras/lib/spark-streaming-kinesis-asl.jar \
>>> --driver-memory 20g --driver-cores 20 \
>>> --num-executors 250 \
>>> --executor-cores 5 \
>>> --executor-memory 8g \
>>> --conf spark.yarn.executor.memoryOverhead=1600 \
>>> --conf spark.driver.maxResultSize=0 \
>>> --conf spark.dynamicAllocation.enabled=false \
>>> --conf spark.rdd.compress=true \
>>> --conf spark.streaming.stopGracefullyOnShutdown=true \
>>> --conf spark.streaming.backpressure.enabled=true \
>>> --conf spark.speculation=true \
>>> --conf spark.task.maxFailures=15 \
>>> --conf spark.ui.retainedJobs=100 \
>>> --conf spark.ui.retainedStages=100 \
>>> --conf spark.executor.logs.rolling.maxRetainedFiles=1 \
>>> --conf spark.executor.logs.rolling.strategy=time \
>>> --conf spark.executor.logs.rolling.time.interval=hourly \
>>> --conf spark.scheduler.mode=FAIR \
>>> --conf spark.scheduler.allocation.file=/home/hadoop/fairscheduler.xml \
>>> --conf spark.metrics.conf=/home/hadoop/spark-metrics.properties \
>>> --class Main /home/hadoop/Main-1.0.jar
>>>
>>> I found this issue, which seems relevant:
>>> https://issues.apache.org/jira/browse/SPARK-14327
>>>
>>> Any suggestions on how to troubleshoot this issue?
>>>
>>> Thanks,
>>>
>>> Renxia
>>>
>>>
>>
>
