spark-dev mailing list archives

From Jonathan Gray <jonny.g...@gmail.com>
Subject [Spark SQL] experimental join strategy
Date Mon, 23 Oct 2017 10:21:14 GMT
Hi,

I'm trying to implement a custom join strategy using the experimental
extraStrategies extension point.  I've put together a very simple
implementation (see the gist below), but it currently fails and I don't
understand why.

https://gist.github.com/jongray/a413b5401c4edd47b7d8f559e5b2f79b

I hope the gist demonstrates what I'm trying to do, but to spell out the
steps explicitly: take the distinct foreign keys on the 'left', then use
them to filter the 'right' join table before joining it back to the left.
I'd like these steps to be performed by the strategy.

This initial implementation takes a brute-force approach to the problem
(in my particular use case I happen to want all joins to follow this
pattern), but eventually I would like to extend it to take a cost-based
approach (just not now).
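For reference, this is roughly the skeleton I'm working from when
plugging into the extension point (the object name and match arms here
are illustrative, not the gist's actual code; the Join pattern matches
Spark 2.x's Catalyst node):

```scala
import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}
import org.apache.spark.sql.execution.SparkPlan

object FilteringJoinStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case Join(left, right, joinType, condition) =>
      // Build the custom physical join here; returning Nil falls
      // back to Spark's built-in join strategies.
      Nil
    case _ => Nil
  }
}

// Registration via the experimental extension point:
val spark = SparkSession.builder().getOrCreate()
spark.experimental.extraStrategies = Seq(FilteringJoinStrategy)
```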

Could someone review it and point out anything that looks wrong?  The
final physical explain plans displayed for the explicit join and the
strategy join look nearly identical, but the results from using my join
strategy are incorrect - it looks like a cartesian product of the
filtered sets.

Thanks,
Jon
