spark-user mailing list archives

From Jesse Lord <jl...@vectra.ai.INVALID>
Subject UnspecifiedDistribution Error using AQE
Date Tue, 03 Aug 2021 18:36:53 GMT
Hello spark users,

I have an error that I would like to report as a Spark 3.1.1 bug, but I do not know how to
create a reproducible example. I can provide the full stack trace if desired, but the most useful
information seems to be:

E                   py4j.protocol.Py4JJavaError: An error occurred while calling o3301.toJavaRDD.
E                   : java.lang.IllegalStateException: UnspecifiedDistribution does not have default partitioning.
E                       at org.apache.spark.sql.catalyst.plans.physical.UnspecifiedDistribution$.createPartitioning(partitioning.scala:52)
E                       at org.apache.spark.sql.execution.exchange.EnsureRequirements$.$anonfun$ensureDistributionAndOrdering$1(EnsureRequirements.scala:54)

This error happens when I have spark.sql.adaptive.enabled=true but not when I set it to
false. It happens both in one of my unit tests (~30 rows) and with production data.
Another work-around is to cache the DataFrame before calling the collect/toJSON statement.
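For anyone hitting the same trace, here is a minimal sketch of the two work-arounds described above. The helper names `aqe_disabled` and `to_json_rows` are hypothetical. The sketch assumes that a session-level `spark.conf.set` of `spark.sql.adaptive.enabled` is picked up when the next action is planned, and that caching avoids the failing plan, as reported here; neither is a fix for the underlying bug.

```python
from contextlib import contextmanager

AQE_KEY = "spark.sql.adaptive.enabled"


@contextmanager
def aqe_disabled(spark):
    """Hypothetical helper: temporarily turn AQE off for this SparkSession.

    Assumes the flag is read when the action is planned; the previous
    value is restored on exit, even if the action raises.
    """
    previous = spark.conf.get(AQE_KEY)
    spark.conf.set(AQE_KEY, "false")
    try:
        yield
    finally:
        spark.conf.set(AQE_KEY, previous)


def to_json_rows(spark, df, use_cache=False):
    """Hypothetical helper: collect df as JSON strings using one work-around."""
    if use_cache:
        # Work-around 2: cache the DataFrame before toJSON/collect (AQE left as-is).
        return df.cache().toJSON().collect()
    # Work-around 1: plan and run the action with AQE disabled.
    with aqe_disabled(spark):
        return df.toJSON().collect()
```

With a real session this would be called as `to_json_rows(spark, df)` or `to_json_rows(spark, df, use_cache=True)`. Restoring the flag in `finally` keeps AQE enabled for the rest of the test suite, so only the failing query runs without it.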

I was not able to find any information about this kind of error on JIRA or Stack Exchange.
I was wondering if anyone has seen this error before in connection with AQE and has any suggestions
for how to report it.

Thanks,
Jesse

