spark-user mailing list archives

From Igor Berman <>
Subject Re: streaming app performance: when would increasing executor size or adding more cores help?
Date Mon, 07 Mar 2016 22:11:29 GMT
Maybe you are experiencing a problem with FileOutputCommitter vs. a
DirectOutputCommitter while working with S3? Do you have HDFS available so
you can try writing there instead, to compare?
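As a rough sketch (the committer class name below is a placeholder for
whatever direct-committer implementation you have on the classpath; Spark
does not ship one for the RDD API), saveAsTextFile goes through the old
mapred API, so the committer can be swapped via a spark.hadoop.* setting:

  import org.apache.spark.{SparkConf, SparkContext}

  // com.example.DirectOutputCommitter is hypothetical: substitute whatever
  // direct committer implementation you actually have on the classpath.
  // saveAsTextFile uses the old mapred API, which reads this key.
  val conf = new SparkConf()
    .setAppName("direct-committer-sketch")
    .set("spark.hadoop.mapred.output.committer.class",
         "com.example.DirectOutputCommitter")
  val sc = new SparkContext(conf)
  sc.textFile("hdfs:///input/events")
    .saveAsTextFile("s3n://my-bucket/out") // direct write, no _temporary copy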

Committing to S3 with the FileOutputCommitter copies all partition files
one-by-one from _temporary to your final destination bucket, so this stage
can become a bottleneck (which is why reducing the number of partitions
might "solve" it).
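For example (a sketch only; the socket source, bucket name, and coalesce(8)
are placeholder choices), coalescing before the write shrinks the number of
part files the committer has to copy:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("coalesce-before-s3-write")
  val ssc = new StreamingContext(conf, Seconds(10))
  val lines = ssc.socketTextStream("localhost", 9999)
  lines.foreachRDD { (rdd, time) =>
    if (!rdd.isEmpty()) {
      // Fewer output partitions -> fewer files for the committer to copy
      // out of _temporary, at the cost of less write parallelism.
      rdd.coalesce(8).saveAsTextFile(s"s3n://my-bucket/out-${time.milliseconds}")
    }
  }
  ssc.start()
  ssc.awaitTermination()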
There was a thread on this mailing list about this problem a few weeks ago.

On 7 March 2016 at 23:53, Andy Davidson <> wrote:

> We just deployed our first streaming apps. The next step is getting them
> to run reliably.
> We have spent a lot of time reading the various programming guides and
> looking at the standalone cluster manager's app performance web pages.
> Looking at the streaming tab and the stages tab has been the most helpful
> in tuning our app. However, we do not understand how memory and # of cores
> affect throughput and performance. Usually adding memory is the cheapest
> way to improve performance.
> When we have a single receiver, we call spark-submit with
> --total-executor-cores 2. Changing the value does not seem to change
> throughput. Our bottleneck was S3 write time, saveAsTextFile(). Reducing
> the number of partitions dramatically reduces S3 write times.
> Adding memory also does not improve performance.
> *I would think that adding more cores would allow more concurrent tasks to
> run. That is to say, reducing the number of partitions should slow things
> down.*
> What are best practices?
> Kind regards
> Andy
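Re the cores question: keep in mind that each receiver permanently occupies
one core, so with --total-executor-cores 2 only one core is left for batch
tasks, and extra cores beyond the number of partitions in a stage sit idle.
Something like this (flag values are illustrative, not recommendations):

  spark-submit \
    --master spark://master:7077 \
    --total-executor-cores 8 \
    --executor-memory 2g \
    --class com.example.StreamingApp \
    streaming-app.jar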
