spark-user mailing list archives

From Alex Ott <alex...@gmail.com>
Subject Re: Spark structured streaming - performance tuning
Date Sat, 18 Apr 2020 07:39:48 GMT
Just to clarify - I didn't write this explicitly in my answer. When you're
working with Kafka, every partition in Kafka is mapped to a Spark
partition, and in Spark, every partition is mapped to a task.  But you can
use `coalesce` to decrease the number of Spark partitions, so you'll have
fewer tasks...
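[Editor's sketch] The partition-to-task mapping and the `coalesce` hint above can be illustrated with a minimal PySpark job. The broker address and topic name ("events") are illustrative assumptions, as is the target of 10 partitions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-coalesce-sketch").getOrCreate()

# Each Kafka partition of the subscribed topic becomes one Spark partition,
# and each Spark partition is processed by one task per micro-batch.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
      .option("subscribe", "events")                        # assumed topic
      .load())

# coalesce() reduces the number of Spark partitions without a shuffle,
# so fewer tasks are scheduled per micro-batch.
fewer = df.coalesce(10)
```

Note that `coalesce` trades parallelism for scheduling overhead: each remaining partition now reads from several Kafka partitions.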

Srinivas V  at "Sat, 18 Apr 2020 10:32:33 +0530" wrote:
 SV> Thank you Alex. I will check it out and let you know if I have any questions

 SV> On Fri, Apr 17, 2020 at 11:36 PM Alex Ott <alexott@gmail.com> wrote:

 SV>     http://shop.oreilly.com/product/0636920047568.do has quite good information
 SV>     on it.  For Kafka, you should start with the approximation that processing
 SV>     of each partition is a separate task that needs to be executed, so you need
 SV>     to plan the number of cores accordingly.
 SV>    
 SV>     Srinivas V  at "Thu, 16 Apr 2020 22:49:15 +0530" wrote:
 SV>      SV> Hello, 
 SV>      SV> Can someone point me to a good video or document which talks about
performance tuning for a structured streaming app? 
 SV>      SV> I am looking especially at listening to Kafka topics, say 5 topics each
with 100 partitions.
 SV>      SV> Trying to figure out the best cluster size and number of executors and
cores required. 
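[Editor's sketch] For the scenario in the question (5 topics with 100 partitions each), the one-task-per-partition approximation gives a back-of-envelope sizing. The cores-per-executor figure and the number of "waves" per micro-batch are illustrative assumptions, not recommendations:

```python
# Back-of-envelope: 5 topics x 100 partitions = 500 Spark partitions,
# so roughly 500 tasks per micro-batch.
topics = 5
partitions_per_topic = 100
total_partitions = topics * partitions_per_topic          # 500 tasks

cores_per_executor = 5  # a common starting point, not a rule

# To process every partition in a single wave, one core per task:
executors_single_wave = total_partitions // cores_per_executor  # 100 executors

# More realistically, let each core work through several tasks
# sequentially within a micro-batch (here: 4 waves):
waves = 4
executors_needed = total_partitions // (cores_per_executor * waves)  # 25 executors

print(total_partitions, executors_single_wave, executors_needed)
```

The right number of waves depends on per-record processing cost versus the trigger interval, so the final size should come from measuring micro-batch duration, not from this arithmetic alone.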


-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

