spark-user mailing list archives

From thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr>
Subject Re: [Spark Streaming] is spark.streaming.concurrentJobs a per node or a cluster global value ?
Date Tue, 05 Jun 2018 11:48:08 GMT

On 05/06/2018 13:44, Saisai Shao wrote:
> You need to read the code; this is an undocumented configuration.
I'm on it right now, but Spark is a big piece of software.
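From what I can tell so far, the property is read driver-side when the
StreamingContext is created and it sizes the scheduler's job-executor
thread pool. For anyone else searching the archives, this is roughly how
I set it in my experiments (a minimal sketch; the app name and the value
4 are just my test setup):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("concurrent-jobs-test")
      // undocumented: number of streaming jobs the driver may run
      // concurrently per batch; the default is 1
      .set("spark.streaming.concurrentJobs", "4")
    val ssc = new StreamingContext(conf, Seconds(1))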
> Basically this will break the ordering of Streaming jobs. AFAIK you may
> get unexpected results if your streaming jobs are not independent.
What do you mean exactly by "not independent"?
Are several sources joined together dependent?
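For instance, is this the kind of dependency you mean (a contrived
sketch; ssc is the StreamingContext and the hostnames are placeholders)?

    val left  = ssc.socketTextStream("host-a", 9999).map(l => (l, 1))
    val right = ssc.socketTextStream("host-b", 9999).map(l => (l, 1))

    // two separate output operations -> two jobs per batch that
    // share no data, which I would call independent
    left.print()
    right.print()

    // one output operation over a join -> a single job per batch
    // that consumes both sources
    left.join(right).print()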

Thanks,
Thomas
>
> thomas lavocat <thomas.lavocat@univ-grenoble-alpes.fr> wrote on Tue,
> Jun 5, 2018 at 7:17 PM:
>
>     Hello,
>
>     Thanks for your answer.
>
>
>     On 05/06/2018 11:24, Saisai Shao wrote:
>>     spark.streaming.concurrentJobs is a driver-side internal
>>     configuration that controls how many streaming jobs can be
>>     submitted concurrently in one batch. Usually it should not be
>>     configured by the user, unless you're familiar with Spark Streaming
>>     internals and know the implications of this configuration.
>
>     Where can I find documentation about those implications?
>
>     I've experimented with several values of this parameter and found
>     that my overall throughput increases in correlation with this
>     property.
>     But I'm experiencing scalability issues: with more than 16
>     receivers spread over 8 executors, my executors no longer receive
>     work from the driver and fall idle.
>     Is there an explanation?
>
>     Thanks,
>     Thomas
>
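P.S. For the archives, the receiver setup behind the scalability
question above looks roughly like this (simplified; the real sources
are not sockets and the host/ports are placeholders):

    // N receivers, each occupying one executor core, unioned into a
    // single stream
    val numReceivers = 16
    val streams = (1 to numReceivers).map(i =>
      ssc.socketTextStream("source-host", 9000 + i))
    val unified = ssc.union(streams)
    unified.foreachRDD(rdd => println(rdd.count()))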

