spark-dev mailing list archives

From Sun Rui <sunrise_...@163.com>
Subject Re: RDD Location
Date Sat, 31 Dec 2016 03:41:59 GMT
You can’t call runJob() inside getPreferredLocations(): the scheduler invokes getPreferredLocations() while it is planning a job, so submitting a nested job from inside it will block. Take a look at the source code of HadoopRDD to see how to implement getPreferredLocations() appropriately.
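As a rough sketch of the pattern HadoopRDD follows: the preferred hosts are computed up front and attached to each split, so getPreferredLocations() only returns metadata and never launches a job. The class and field names below are hypothetical, not from this thread:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// A partition that carries its own preferred hosts, decided when the
// RDD is constructed (or updated later by driver-side code) rather
// than by running a nested job inside getPreferredLocations().
class LocatedPartition(val index: Int, val hosts: Seq[String])
  extends Partition

class LocationAwareRDD(sc: SparkContext, parts: Array[LocatedPartition])
  extends RDD[Int](sc, Nil) {

  override protected def getPartitions: Array[Partition] =
    parts.asInstanceOf[Array[Partition]]

  // Like HadoopRDD: just read location metadata off the split.
  // No Spark job is submitted here, so the scheduler cannot deadlock.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    split.asInstanceOf[LocatedPartition].hosts

  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator(split.index)
}
```

Because the hosts live in the partition objects, the driver can rebuild or replace them between jobs, which also addresses the original question about changing locations after initialization.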
> On Dec 31, 2016, at 09:48, Fei Hu <hufei68@gmail.com> wrote:
> 
> That is a good idea.
> 
> I tried adding the following code to the getPreferredLocations() function:
> 
> val results: Array[Array[DataChunkPartition]] = context.runJob(
>   partitionsRDD,
>   (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray,
>   dd,
>   allowLocal = true)
> 
> But execution seems to hang inside this function. If I move the same code somewhere else, such as the main() function, it runs well.
> 
> What is the reason for it?
> 
> Thanks,
> Fei
> 
> On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <sunrise_win@163.com <mailto:sunrise_win@163.com>>
wrote:
> Maybe you can create your own subclass of RDD and override getPreferredLocations() to implement the logic for dynamically changing the locations.
> > On Dec 30, 2016, at 12:06, Fei Hu <hufei68@gmail.com <mailto:hufei68@gmail.com>>
wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location for a certain partition of RDD?
> >
> > "protected def getPreferredLocations(split: Partition)" can be used to set the initial locations, but how can they be changed after the initialization?
> >
> >
> > Thanks,
> > Fei
> >
> >
> 
> 
> 

