spark-dev mailing list archives

From Ameet Kini <ameetk...@gmail.com>
Subject Re: why is NextIterator private?
Date Fri, 21 Feb 2014 19:54:11 GMT
The use case is to control the partitions as they come out of the
HadoopRDD.
1. Have my own HadoopPartition that has fields specific to my application.
These fields would then be used by other RDD operations (also overridden by
me). This is why I was looking to extend HadoopPartition.
2. Have my own getPartitions which has slightly different partitioning
logic. This can almost be solved by subclassing InputFormat and its
getSplits method, but I still need to have getPartitions create
MyHadoopPartition instead of HadoopPartition.
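
A rough sketch of the two points above, for illustration only. It does not use Spark or Hadoop classes: `Partition` here is a local stand-in that only mirrors the shape of Spark's public Partition trait (an `index` per partition), and `SplitStub`, `MyHadoopPartition`, `appTag`, `tagFor`, and `tagsOf` are hypothetical names invented for this example.

```scala
// Stand-in types only: nothing here is Spark's or Hadoop's real code.
object PartitionSketch {

  // Local stand-in mirroring the shape of Spark's public Partition trait.
  trait Partition extends Serializable {
    def index: Int
  }

  // Hypothetical stand-in for a Hadoop input split (a byte range of a file).
  final case class SplitStub(path: String, start: Long, length: Long)

  // Point 1: a partition type that carries an application-specific field.
  final class MyHadoopPartition(
      val index: Int,
      val split: SplitStub,
      val appTag: String // hypothetical application-specific field
  ) extends Partition

  // Point 2: a getPartitions that wraps each split in MyHadoopPartition.
  // The splits are assumed to come from a custom InputFormat's getSplits.
  def getPartitions(
      splits: Seq[SplitStub],
      tagFor: SplitStub => String
  ): Array[Partition] =
    splits.zipWithIndex.map { case (s, i) =>
      new MyHadoopPartition(i, s, tagFor(s)): Partition
    }.toArray

  // A downstream operation that reads the application-specific field back.
  def tagsOf(parts: Array[Partition]): Seq[String] =
    parts.toSeq.collect { case p: MyHadoopPartition => p.appTag }
}
```

For example, calling `PartitionSketch.getPartitions` with two `SplitStub` values and a `tagFor` function yields two partitions with indices 0 and 1, and `tagsOf` recovers the tags in the same order. In a real HadoopRDD subclass the difficulty is that the equivalent wrapping happens inside Spark's own getPartitions, which is why the partition type cannot be swapped from an InputFormat alone.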

Ameet


On Fri, Feb 21, 2014 at 2:37 PM, Jey Kottalam <jey@cs.berkeley.edu> wrote:

> What's the motivation for subclassing HadoopRDD? I don't believe
> that's a supported use case. Is it not possible to do what you need
> with a Hadoop InputFormat?
>
> On Fri, Feb 21, 2014 at 11:16 AM, Ameet Kini <ameetkini@gmail.com> wrote:
> > I'm looking to subclass HadoopRDD and was hoping to subclass NextIterator
> > in compute().
> >
> > Thanks,
> > Ameet
>
