spark-dev mailing list archives

From Nathan Kronenfeld <nkronenf...@oculusinfo.com>
Subject Re: Problem with tests
Date Wed, 27 Nov 2013 21:15:16 GMT
Thanks.  Now that that's checked into HEAD, it all seems to work again.
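
For anyone reading this thread later: the difference between the two index semantics Reynold describes below can be sketched in plain Scala, without Spark. This is only a model, not Spark's implementation. The object name PruneIndexSketch and the helpers tag, pruneRenumbered, and prunePreserving are made up for illustration, and each inner Vector stands in for one partition of the 8-element, 2-partition example quoted further down.

```scala
// Plain-Scala model of the two index semantics; no Spark classes are used.
// All names here are hypothetical and exist only for this illustration.
object PruneIndexSketch {
  // Each inner Vector stands in for one partition of the parent RDD
  // (mirrors sc.parallelize(Range(0, 8), 2) from the quoted example).
  val parent: Vector[Vector[Int]] =
    Vector(Vector(0, 1, 2, 3), Vector(4, 5, 6, 7))

  // Stand-in for mapPartitionsWithIndex: tag each element with the index it is handed.
  def tag(index: Int, part: Vector[Int]): Vector[(Int, Int)] =
    part.map(x => (x, index))

  // "Output partition index" semantics: surviving partitions are renumbered from 0
  // before the tagging function sees them.
  def pruneRenumbered(keep: Int => Boolean): Vector[(Int, Int)] =
    parent.zipWithIndex
      .filter { case (_, i) => keep(i) }
      .map(_._1)
      .zipWithIndex
      .flatMap { case (part, newIndex) => tag(newIndex, part) }

  // "RDD partition index" semantics: surviving partitions keep their original index.
  def prunePreserving(keep: Int => Boolean): Vector[(Int, Int)] =
    parent.zipWithIndex
      .filter { case (_, i) => keep(i) }
      .flatMap { case (part, i) => tag(i, part) }

  def main(args: Array[String]): Unit = {
    // Keep only partition 1, as in the quoted PartitionPruningRDD call.
    println(pruneRenumbered(_ == 1)) // Vector((4,0), (5,0), (6,0), (7,0))
    println(prunePreserving(_ == 1)) // Vector((4,1), (5,1), (6,1), (7,1))
  }
}
```

The first result matches the surprising res1 quoted below; the second is what my tests expected, since pruning then leaves the tagged data unchanged.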


On Sun, Nov 24, 2013 at 5:02 AM, Reynold Xin <rxin@apache.org> wrote:

> Take a look at this pull request and see if it fixes your problem:
> https://github.com/apache/incubator-spark/pull/201
>
> I changed the semantics of the index from the output partition index back
> to the rdd partition index.
>
>
>
> On Sat, Nov 23, 2013 at 10:01 PM, Nathan Kronenfeld <nkronenfeld@oculusinfo.com> wrote:
>
> > Though I think it's a more general problem...
> >
> > Take the following:
> >
> > val data = sc.parallelize(Range(0, 8), 2)
> > val data2 = data.mapPartitionsWithIndex((index, i) => i.map(x => (x, index)))
> >
> > data2.collect
> >   res0: Array[(Int, Int)] = Array((0,0), (1,0), (2,0), (3,0), (4,1), (5,1), (6,1), (7,1))
> >
> > new org.apache.spark.rdd.PartitionPruningRDD(data2, n => 1 == n).collect
> >   res1: Array[(Int, Int)] = Array((4,0), (5,0), (6,0), (7,0))
> >
> > So, in this case, pruning the RDD has changed the data within it.  This
> > seems to be what is causing my errors.
> >
> >
> >
> > On Sat, Nov 23, 2013 at 8:00 AM, Nathan Kronenfeld <nkronenfeld@oculusinfo.com> wrote:
> >
> > > https://github.com/apache/incubator-spark/pull/18
> > >
> > >
> > > On Fri, Nov 22, 2013 at 6:35 PM, Reynold Xin <reynoldx@gmail.com> wrote:
> > >
> > >> Can you provide a link to your pull request?
> > >>
> > >>
> > >> On Sat, Nov 23, 2013 at 5:02 AM, Nathan Kronenfeld <nkronenfeld@oculusinfo.com> wrote:
> > >>
> > >> > Actually, looking into recent commits, it looks like my hunch may be
> > >> > exactly correct:
> > >> >
> > >> > https://github.com/apache/incubator-spark/commit/f639b65eabcc8666b74af8f13a37c5fdf7e0185f
> > >> > "PartitionPruningRDD is using index from parent"
> > >> >
> > >> > Is there anyone who can explain why this new behavior is preferable?  And,
> > >> > if it's staying, can suggest a way to fix my tests for this case?
> > >> >
> > >> > Thanks again,
> > >> >                  Nathan
> > >> >
> > >> >
> > >> > On Fri, Nov 22, 2013 at 3:56 PM, Nathan Kronenfeld <nkronenfeld@oculusinfo.com> wrote:
> > >> >
> > >> > > Hi there.
> > >> > >
> > >> > > I have a problem with the unit tests on a pull request I'm trying to tie
> > >> > > up.  The changes deal with partition-related functions.
> > >> > >
> > >> > > In particular, the tests I have that test an append-to-partition function
> > >> > > work fine on my own machine, but fail on the build machine
> > >> > > (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2152/console).
> > >> > >
> > >> > > The failure seems to stem from pulling a single partition out of the set.
> > >> > > In either case, when I work on the full dataset:
> > >> > >
> > >> > > UnionRDD[11] at apply at FunSuite.scala:1265 (4 partitions)
> > >> > >   UnionRDD[9] at apply at FunSuite.scala:1265 (3 partitions)
> > >> > >     ParallelCollectionRDD[8] at apply at FunSuite.scala:1265 (1 partitions)
> > >> > >     MapPartitionsWithContextRDD[7] at apply at FunSuite.scala:1265 (2 partitions)
> > >> > >       ParallelCollectionRDD[4] at apply at FunSuite.scala:1265 (2 partitions)
> > >> > >   ParallelCollectionRDD[10] at apply at FunSuite.scala:1265 (1 partitions)
> > >> > >
> > >> > >
> > >> > > It seems to work.  When I pull one partition out of this, by wrapping a
> > >> > > PartitionPruningRDD around it (pruning out everything but partition 2):
> > >> > >
> > >> > > PartitionPruningRDD[12] at apply at FunSuite.scala:1265 (1 partitions)
> > >> > >   UnionRDD[11] at apply at FunSuite.scala:1265 (4 partitions)
> > >> > >     UnionRDD[9] at apply at FunSuite.scala:1265 (3 partitions)
> > >> > >       ParallelCollectionRDD[8] at apply at FunSuite.scala:1265 (1 partitions)
> > >> > >       MapPartitionsWithContextRDD[7] at apply at FunSuite.scala:1265 (2 partitions)
> > >> > >         ParallelCollectionRDD[4] at apply at FunSuite.scala:1265 (2 partitions)
> > >> > >     ParallelCollectionRDD[10] at apply at FunSuite.scala:1265 (1 partitions)
> > >> > >
> > >> > >
> > >> > > In this case, my local machine and the build machine seem to act
> > >> > > differently.
> > >> > >
> > >> > > On my local machine, what is in the inner ParallelCollection partition #2
> > >> > > shows up in the MapPartitionsWithContextRDD as partition #2 still.  On the
> > >> > > build machine, this same partition shows up in the later RDD as partition
> > >> > > #0 - presumably because everything else is pruned out, but that pruning
> > >> > > should happen at an outer level, shouldn't it?
> > >> > >
> > >> > > Does anyone know why the build machine would act differently from my
> > >> > > local machine here?
> > >> > >
> > >> > > Also, sadly, this worked fine two days ago.
> > >> > >
> > >> > > My only thought is that perhaps the PullRequestBuilder does a merge with
> > >> > > current code, and someone broke this in the last day or two?  Past that,
> > >> > > I'm at a bit of a loss.
> > >> > >
> > >> > > Thanks,
> > >> > >                     -Nathan
> > >> > >
> > >> > >
> > >> > > --
> > >> > >
> > >> > > Nathan Kronenfeld
> > >> > > Senior Visualization Developer
> > >> > > Oculus Info Inc
> > >> > > 2 Berkeley Street, Suite 600,
> > >> > > Toronto, Ontario M5A 4J5
> > >> > > Phone:  +1-416-203-3003 x 238
> > >> > > Email:  nkronenfeld@oculusinfo.com
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Nathan Kronenfeld
> > > Senior Visualization Developer
> > > Oculus Info Inc
> > > 2 Berkeley Street, Suite 600,
> > > Toronto, Ontario M5A 4J5
> > > Phone:  +1-416-203-3003 x 238
> > > Email:  nkronenfeld@oculusinfo.com
> > >
> >
> >
> >
> >
>



-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenfeld@oculusinfo.com
