spark-user mailing list archives

From vincent gromakowski <vincent.gromakow...@gmail.com>
Subject Re: How to increase the parallelism of Spark Streaming application?
Date Wed, 07 Nov 2018 08:55:13 GMT
On the other hand, increasing parallelism through Kafka partitions avoids the
shuffle that Spark needs in order to repartition.
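
The trade-off can be sketched as a small model. This is only an illustration, not Spark code: it assumes the direct Kafka stream, where each Kafka partition becomes one Spark partition, and it assumes that an explicit repartition always introduces a shuffle. The names `BatchPlan` and `plan_batch` are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatchPlan:
    read_tasks: int     # tasks reading from Kafka (assumed: one per Kafka partition)
    shuffle: bool       # True when a repartition step moves data between executors
    process_tasks: int  # tasks available for the transformation and the write


def plan_batch(kafka_partitions: int, repartition_to: Optional[int] = None) -> BatchPlan:
    """Model one micro-batch under the assumptions stated above."""
    if kafka_partitions < 1:
        raise ValueError("kafka_partitions must be positive")
    if repartition_to is None:
        # No repartition: the transformation runs on the Kafka-derived partitions.
        return BatchPlan(kafka_partitions, False, kafka_partitions)
    if repartition_to < 1:
        raise ValueError("repartition_to must be positive")
    # Explicit repartition: reading stays at the Kafka partition count,
    # then a shuffle spreads the data over `repartition_to` partitions.
    return BatchPlan(kafka_partitions, True, repartition_to)


if __name__ == "__main__":
    print(plan_batch(8, repartition_to=100))  # 8 Kafka partitions + repartition(100)
    print(plan_batch(100))                    # 100 Kafka partitions, no repartition
```

In this model both setups end up with 100 processing tasks, but only the first one pays for a shuffle, and it also reads from Kafka with just 8 tasks.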

On Wed, Nov 7, 2018 at 09:51, Michael Shtelma <mshtelma@gmail.com> wrote:

> If you configure too many Kafka partitions, you can run into memory issues.
> This will increase the memory requirements of the Spark job a lot.
>
> Best,
> Michael
>
>
> On Wed, Nov 7, 2018 at 8:28 AM JF Chen <darouwan@gmail.com> wrote:
>
>> I have a Spark Streaming application which reads data from Kafka and saves
>> the transformation result to HDFS.
>> The Kafka topic originally has 8 partitions, and I repartition the
>> data to 100 to increase the parallelism of the Spark job.
>> Now I am wondering: if I increase the Kafka partition number to 100
>> instead of setting repartition to 100, will the performance improve? (I
>> know the repartition operation costs a lot of CPU resources.)
>> If I set the Kafka partition number to 100, does it have any negative
>> effect on efficiency?
>> I have only one production environment, so it's not convenient for me to
>> run the test.
>>
>> Thanks!
>>
>> Regards,
>> Junfeng Chen
>>
>
