spark-user mailing list archives

From JF Chen <darou...@gmail.com>
Subject Re: How to increase the parallelism of Spark Streaming application?
Date Thu, 08 Nov 2018 00:14:29 GMT
Memory is not a big problem for me... So are there any other bad effects?

Regards,
Junfeng Chen


On Wed, Nov 7, 2018 at 4:51 PM Michael Shtelma <mshtelma@gmail.com> wrote:

> If you configure too many Kafka partitions, you can run into memory issues.
> It will increase the memory requirements of the Spark job a lot.
>
> Best,
> Michael
>
>
> On Wed, Nov 7, 2018 at 8:28 AM JF Chen <darouwan@gmail.com> wrote:
>
>> I have a Spark Streaming application which reads data from Kafka and saves
>> the transformation results to HDFS.
>> The original partition number of my Kafka topic is 8, and I repartition the
>> data to 100 to increase the parallelism of the Spark job.
>> Now I am wondering: if I increase the Kafka partition number to 100
>> instead of repartitioning to 100, will performance be enhanced? (I
>> know the repartition action costs a lot of CPU resources.)
>> If I set the Kafka partition number to 100, does it have any negative
>> effects?
>> I have only one production environment, so it is not convenient for me to
>> run a test....
>>
>> Thanks!
>>
>> Regard,
>> Junfeng Chen
>>
>
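For context on the tradeoff discussed in this thread: `repartition(100)` triggers a shuffle in which each record is reassigned to one of 100 target partitions, typically by hashing. The following is a minimal standalone sketch of that idea in plain Python, not actual Spark code; the function name `repartition` and the key format are hypothetical and chosen only for illustration.

```python
# Toy illustration (not Spark itself) of hash-based repartitioning:
# records from 8 Kafka-like source partitions are redistributed
# across 100 target partitions, the way a shuffle-based
# repartition assigns rows to partitions by key hash.

def repartition(records, num_partitions):
    """Assign each (key, value) record to a target partition by key hash."""
    targets = [[] for _ in range(num_partitions)]
    for key, value in records:
        targets[hash(key) % num_partitions].append((key, value))
    return targets

# 8 source partitions, 125 records each (1000 records total)
source = [(f"key-{i}", i) for i in range(1000)]
result = repartition(source, 100)

# Every record lands in exactly one of the 100 target partitions.
assert len(result) == 100
assert sum(len(p) for p in result) == 1000
```

The sketch shows why the repartition route costs CPU and network: every record must be hashed and moved to its target partition. Starting with 100 Kafka partitions avoids that shuffle entirely, at the cost (as noted above) of higher per-partition memory overhead in both Kafka and the Spark consumers.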
