spark-user mailing list archives

From Jörn Franke
Subject Re: Partitioning strategy
Date Sun, 02 Apr 2017 11:18:24 GMT
You can always repartition, but for your use case it may make sense to keep different RDDs
with the same data but different partitioning strategies. It may also make sense to choose an
appropriate format on disk (ORC, Parquet). You also have to choose based on the users' non-functional
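
As a rough sketch of the "you can always repartition" option in PySpark: filter first, then resize the result, since the filtered RDD keeps the parent's partition count. The function name, the (date, value) record layout and the target partition count are assumptions made for illustration, not something from the original question. Only the standard RDD methods filter, getNumPartitions, coalesce and repartition are used.

```python
from datetime import date


def filter_and_resize(rdd, start, end, target_partitions):
    """Keep (date, value) records with start <= date < end, then resize.

    `rdd` is assumed to be a PySpark RDD of (datetime.date, value) tuples.
    """
    # filter() is a narrow transformation: the result keeps the parent's
    # partition count, even if most partitions are now nearly empty.
    filtered = rdd.filter(lambda rec: start <= rec[0] < end)
    current = filtered.getNumPartitions()

    if target_partitions < current:
        # coalesce() merges partitions and, by default, avoids a full shuffle.
        return filtered.coalesce(target_partitions)
    if target_partitions > current:
        # repartition() always shuffles; use it to spread data more widely.
        return filtered.repartition(target_partitions)
    return filtered
```

For example, a selection of three months out of four years on 20 partitions might be resized with filter_and_resize(rdd, date(2016, 1, 1), date(2016, 4, 1), 2). A suitable target depends on how much data is left after the filter, so treat the 2 as a placeholder.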

> On 2. Apr 2017, at 12:32, Jasbir Singh wrote:
> Hi,
> I have an RDD with 4 years’ data in, say, 20 partitions. At runtime, the user can choose
> to select a few months or years of the RDD. That means the RDD is filtered based on the
> user's time selection, and further transformations and actions are performed on the filtered
> RDD. And, as Spark says, a child RDD gets its partitions from the parent RDD.
> Therefore, is there any way to decide the partitioning strategy after filter operations?
> Regards,
> Jasbir Singh
