spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Spark streaming join on yarn
Date Thu, 29 Nov 2018 01:30:34 GMT
How many tasks are in stage 2, and how long do they take? If there are 200
tasks taking 1 second each (i.e., many "rounds" of tasks on the available
cores, adding up to 13 seconds), then you can reduce the number of tasks by
setting the SQL conf spark.sql.shuffle.partitions (defaults to 200). Given
the number of cores in your cluster, you probably want 1-3 rounds of tasks,
not more.
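
For example, on a single-node cluster with 8 cores, something like this
keeps each shuffle stage to roughly one round of tasks (a minimal sketch;
the session setup and the partition count of 8 are assumptions, not code
from your job):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("RealtimeTuning")
      // Default is 200 shuffle partitions; on a small cluster that means
      // many rounds of short tasks per stage. Match it roughly to the
      // number of available cores.
      .config("spark.sql.shuffle.partitions", "8")
      .getOrCreate()

The same setting can be passed on the command line with
--conf spark.sql.shuffle.partitions=8, without changing the code. One
caveat for stateful streaming queries (and a stream-stream join is
stateful): the number of shuffle partitions is recorded in the checkpoint
on the first run, so set it before the query first starts, or use a fresh
checkpoint directory.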

On Wed, Nov 28, 2018 at 2:28 PM Abhijeet Kumar <abhijeet.kumar@sentienz.com>
wrote:

> Hello Team,
>
> I’m running Spark on YARN and performing a simple join between two
> streams.
>
> DAG of my job: [inline image not preserved in the archive]
>
> So, it’s taking around 13 seconds to complete stage 2.
>
> My command to run the jar:
>
> spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0
> --class com.streaming.spark.RealtimeTuning --master yarn --deploy-mode
> cluster --executor-memory 4G --driver-memory 2G --num-executors 1
> ./target/scala-2.11/RealtimeTuning-assembly-0.1.jar
>
> Note: I’m running everything locally (single-node cluster).
>
> Any help would be appreciated.
>
> Thank you,
>
> Abhijeet Kumar
> Software Development Engineer
> Sentienz Solutions Private Limited
>
