spark-user mailing list archives

From cuevasclemente <cuevascleme...@gmail.com>
Subject Error: PartitioningCollection requires all of its partitionings have the same numPartitions.
Date Mon, 10 Oct 2016 17:24:29 GMT
Hello,

I am running into a consistent error in Spark that occurs when I work with
DataFrames that are the result of several joins and other transformations:

    PartitioningCollection requires all of its partitionings have the same numPartitions.

It seems to happen after I join two DataFrames that each behave normally on
their own: once they are joined, operations on the joined DataFrame can fail
with this error. I am mainly trying to understand why this error appears and
what it means, since I can't find any documentation on it.

The following invocation results in the exception:

    val resultDataFrame = dataFrame1
        .join(dataFrame2, $"first_column" === $"second_column")
        .take(2)

but I can certainly call

    dataFrame1.take(2)

and

    dataFrame2.take(2)

I also tried repartitioning the DataFrames, using
Dataset.repartition(numPartitions) or Dataset.coalesce(numPartitions) on
dataFrame1 and dataFrame2 before joining, and on resultDataFrame after the
join, but neither had any effect on the error.
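
For concreteness, here is a minimal sketch of the kind of repartitioning
attempt described above. The local SparkSession, the toy rows, the payload
column names, the object and method names, and the partition count of 8 are
all placeholders rather than my real pipeline, and this toy version is not
expected to reproduce the error.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RepartitionBeforeJoin {

  // Hypothetical helper: repartition both inputs to the same partition count,
  // join them on the two key columns, then coalesce the joined result.
  def joinWithRepartition(
      spark: SparkSession,
      dataFrame1: DataFrame,
      dataFrame2: DataFrame,
      numPartitions: Int): DataFrame = {
    import spark.implicits._

    // Attempt 1: repartition each input before the join.
    val joined = dataFrame1.repartition(numPartitions)
      .join(dataFrame2.repartition(numPartitions),
        $"first_column" === $"second_column")

    // Attempt 2: coalesce the joined result (coalesce only reduces the count).
    joined.coalesce(numPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("repartition-before-join")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Placeholder data standing in for the real dataFrame1 and dataFrame2.
    val dataFrame1 = Seq((1, "a"), (2, "b")).toDF("first_column", "payload1")
    val dataFrame2 = Seq((1, "x"), (2, "y")).toDF("second_column", "payload2")

    val result = joinWithRepartition(spark, dataFrame1, dataFrame2, 8)

    // Same action as in the failing invocation.
    result.take(2).foreach(println)

    spark.stop()
  }
}
```

In my real job the equivalent calls were made one at a time (repartition or
coalesce on the inputs, or on the joined result), and the exception was raised
in every variant.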

I have not been able to pin down the circumstances that trigger the error or
to produce a small reproducible example, so this message is mainly a question
about why this error might appear.

I posted essentially the same question on Stack Overflow, which I link here
because there was a small amount of discussion there that I probably can't
reproduce in full:
http://stackoverflow.com/questions/39780784/spark-2-0-0-error-partitioningcollection-requires-all-of-its-partitionings-have/39793449
(I hope linking to an external page in a help request is not frowned upon.)
So far the issue seems to be confirmed by at least one other user, but I was
not able to find other mentions of it on this list or elsewhere through some
cursory googling.


Thanks for any help.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-PartitioningCollection-requires-all-of-its-partitionings-have-the-same-numPartitions-tp27875.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
