spark-user mailing list archives

From Igor Berman <igor.ber...@gmail.com>
Subject Re: streaming app performance when would increasing execution size or adding more cores
Date Mon, 07 Mar 2016 22:11:29 GMT
Maybe you are experiencing a problem with FileOutputCommitter vs. a
direct output committer while working with S3? Do you have HDFS available
so you can try it there to verify?
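
If the committer is the issue, the usual knobs look something like this
(untested sketch; the config keys below are the standard Spark/Hadoop ones,
double-check they match your Spark and Hadoop versions before relying on
them):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // algorithm version 2: each task moves its output straight to the
  // destination on task commit, instead of a second serial rename out of
  // _temporary at job commit (needs Hadoop 2.7+ on the classpath)
  .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  // any direct-style committer is unsafe with speculative execution,
  // so keep speculation off if you go that route
  .set("spark.speculation", "false")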

Committing in S3 copies all partitions one by one from _temporary to your
final destination bucket, so this stage can become a bottleneck (which is
why reducing the number of partitions might "solve" it). There was a thread
on this mailing list about this problem a few weeks ago.
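
If you want to try cutting the partition count just for the write, something
along these lines (the dstream variable and bucket path are made up, only to
illustrate coalescing before the save):

dstream.foreachRDD { (rdd, time) =>
  // fewer partitions per batch = fewer files to copy out of _temporary
  rdd.coalesce(4) // 4 is arbitrary, tune it to your batch volume
    .saveAsTextFile(s"s3n://my-bucket/output/batch-${time.milliseconds}")
}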



On 7 March 2016 at 23:53, Andy Davidson <Andy@santacruzintegration.com>
wrote:

> We just deployed our first streaming apps. The next step is getting them
> to run reliably.
>
> We have spent a lot of time reading the various programming guides and
> looking at the standalone cluster manager's app performance web pages.
>
> Looking at the streaming tab and the stages tab has been the most helpful
> in tuning our app. However, we do not understand how memory and the number
> of cores affect throughput and performance. Usually adding memory is the
> cheapest way to improve performance.
>
> When we have a single receiver, we call spark-submit --total-executor-cores
> 2. Changing this value does not seem to change throughput. Our bottleneck
> was S3 write time in saveAsTextFile(). Reducing the number of partitions
> dramatically reduces S3 write times.
>
> Adding memory also does not improve performance.
>
> *I would think that adding more cores would allow more concurrent tasks to
> run. That is to say, reducing the number of partitions would slow things
> down.*
>
> What are best practices?
>
> Kind regards
>
> Andy
>
