spark-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: Partitioning in spark streaming
Date Wed, 12 Aug 2015 00:35:11 GMT
I am also trying to understand how files are named when writing to Hadoop.
For example, how does the "saveAs" method ensure that each executor
generates unique files?
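To make the naming concrete: as far as I know, `saveAsTextFile` does not name files per executor at all. Each task writes the output of one partition to a file named after the partition index (`part-00000`, `part-00001`, ...). Names are therefore unique as long as partition indices are, and with the default Hadoop `FileOutputCommitter` each task first writes into a temporary directory and its file is moved into place when the task commits. For streaming, `DStream.saveAsTextFiles(prefix, suffix)` writes each batch into its own directory named `prefix-TIME_IN_MS[.suffix]`. The plain-Python sketch below only reproduces that naming scheme. `part_file_name`, `batch_output_dir` and `planned_outputs` are hypothetical helpers, not Spark APIs, and the exact `part-` format may differ between the RDD and DataFrame writers and across Spark versions.

```python
# Sketch of the output layout produced when saving an RDD / DStream as text.
# Plain Python, no Spark needed; it only reproduces the naming scheme,
# assuming the "part-NNNNN" and "prefix-TIME_IN_MS[.suffix]" conventions.


def part_file_name(partition_index):
    """File written by the task for one partition, e.g. part-00003."""
    return "part-%05d" % partition_index


def batch_output_dir(prefix, batch_time_ms, suffix=""):
    """Directory for one streaming batch: prefix-TIME_IN_MS[.suffix]."""
    name = "%s-%d" % (prefix, batch_time_ms)
    return name + "." + suffix if suffix else name


def planned_outputs(prefix, batch_time_ms, num_partitions, suffix=""):
    """All file paths one batch would produce, one per partition."""
    directory = batch_output_dir(prefix, batch_time_ms, suffix)
    return [directory + "/" + part_file_name(i) for i in range(num_partitions)]


if __name__ == "__main__":
    # Made-up output prefix and batch time, for illustration only.
    for path in planned_outputs("hdfs:///events/out", 1439339711000, 3):
        print(path)
```

Running it prints three paths, `hdfs:///events/out-1439339711000/part-00000` through `.../part-00002`: one directory per batch and one file per partition, so two tasks never target the same file name.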

On Tue, Aug 11, 2015 at 4:21 PM, ayan guha <guha.ayan@gmail.com> wrote:

> Partitioning, by itself, is a property of an RDD, so it is essentially no
> different in streaming, where each batch is one RDD. You can call
> partitionBy on the RDD and pass your custom partitioner function to it.
>
> One thing you should consider is how balanced your partitions are, i.e.
> your partitioning scheme should not skew too much data into one partition.
>
> Best
> Ayan
>
> On Wed, Aug 12, 2015 at 9:06 AM, Mohit Anchlia <mohitanchlia@gmail.com>
> wrote:
>
>> How does partitioning in Spark work when it comes to streaming? What's
>> the best way to partition time series data grouped by a certain tag, such
>> as product categories (video, music, etc.)?
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
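A minimal sketch of the custom-partitioner idea, assuming each batch is keyed by product category. In PySpark, `RDD.partitionBy(numPartitions, partitionFunc)` places each key in partition `partitionFunc(key) % numPartitions`. The plain-Python code below simulates that bucketing so it runs without a cluster, and adds a simple skew measure (largest partition size over mean size) for the balance concern raised in the reply. `CATEGORY_PARTITIONS`, `category_partitioner`, `assign` and `skew_ratio` are hypothetical names chosen for illustration, and the sample batch is made up.

```python
# Sketch: a category-based partition function plus a balance check.
# Plain Python, no Spark needed; it mimics how PySpark's RDD.partitionBy
# places each (key, value) pair in partition partition_func(key) % n.

# Hypothetical fixed mapping from product category to partition index.
CATEGORY_PARTITIONS = {"video": 0, "music": 1, "books": 2}
NUM_PARTITIONS = 4  # the last partition (index 3) catches unlisted categories


def category_partitioner(category):
    """Return the partition index for a category key."""
    return CATEGORY_PARTITIONS.get(category, NUM_PARTITIONS - 1)


def assign(records, num_partitions, partition_func):
    """Bucket (key, value) pairs the way partitionBy does: func(key) % n."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_func(key) % num_partitions].append((key, value))
    return buckets


def skew_ratio(buckets):
    """Largest partition size divided by the mean size (1.0 = balanced)."""
    sizes = [len(b) for b in buckets]
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


if __name__ == "__main__":
    # Made-up sample batch: (category, timestamp) pairs dominated by "video".
    batch = [("video", t) for t in range(6)]
    batch += [("music", 6), ("books", 7), ("games", 8)]
    buckets = assign(batch, NUM_PARTITIONS, category_partitioner)
    print([len(b) for b in buckets])
    print(round(skew_ratio(buckets), 2))
```

With the sample batch the partition sizes are `[6, 1, 1, 1]` and the skew ratio is about 2.67: one dominant category ("video") overloads a single partition. On a real stream the same function could be applied per batch, e.g. `dstream.transform(lambda rdd: rdd.partitionBy(NUM_PARTITIONS, category_partitioner))`, assuming the DStream holds `(category, record)` pairs. Splitting a hot category across several partitions, for instance by adding a salt to its key, is one common way to reduce that skew.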
