spark-dev mailing list archives

From Michael Shtelma <mshte...@gmail.com>
Subject Inner join with the table itself
Date Mon, 15 Jan 2018 09:23:52 GMT
Hi all,

When I join a table with itself on its join columns, I get the
following error:
"Join condition is missing or trivial. Use the CROSS JOIN syntax to
allow cartesian products between these relations.;"

This is not accurate: my join is neither trivial nor a real cross
join. I am providing a join condition and expect to get perhaps a
couple of joined rows for each row in the original table.

There is a workaround for this, which involves renaming all the columns
in the source data frame and only then proceeding with the join. This
allows us to fool Spark.
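
For illustration, the renaming workaround looks roughly like the PySpark
sketch below. The column names `id` and `parent_id`, the `_r` suffix, and
both helper names are made up for this example and are not taken from the
real job; the sketch only uses `DataFrame.columns`, `DataFrame.toDF`, and
`DataFrame.join`.

```python
def with_suffix(df, suffix):
    """Return a copy of df with `suffix` appended to every column name."""
    return df.toDF(*[name + suffix for name in df.columns])


def self_join_via_rename(df):
    """Join df with a fully renamed copy of itself.

    Assumes a hypothetical schema in which every row has an `id` and a
    `parent_id` column.
    """
    right = with_suffix(df, "_r")
    # After the rename, the two sides of the condition refer to
    # differently named columns of two distinct data frames.
    return df.join(right, df["id"] == right["parent_id_r"], "inner")
```

Every column coming from the right-hand side carries the suffix in the
result, which is exactly what makes the names hard to keep track of.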

Now I am wondering whether there is a better way to get rid of this
problem. I do not like the idea of renaming the columns, because it
makes it really difficult to keep track of the column names in the
resulting data frames.
Is it possible to deactivate this check?

Thanks,
Michael
