Hi,

I am going through the presentation https://databricks.com/session/hive-bucketing-in-apache-spark

Do both tables need to be bucketed for this to work? And is it mandatory that the numbers of buckets be multiples of each other?

Also, if I export a persistent table to S3, will bucketing still work? Or is there a way to make this work for external tables in Spark?


Spark version: 2.3.0

Code used to initialise the Spark session:

from pyspark.sql import SparkSession

# Kryo serialization plus bucketing explicitly enabled
sparkSession = SparkSession.builder \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.sources.bucketing.enabled", "true") \
    .appName("GouravTest") \
    .getOrCreate()
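
For reference, this is a minimal sketch of what I am trying (the table names, columns and bucket counts below are purely illustrative): both sides are written as bucketed, sorted managed tables on the join key, and then joined so I can check the plan for a shuffle.

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.sql.sources.bucketing.enabled", "true") \
    .appName("BucketingSketch") \
    .getOrCreate()

# Small illustrative DataFrames sharing the join key "customer_id"
orders = spark.createDataFrame(
    [(1, "book"), (2, "pen"), (1, "lamp")], ["customer_id", "item"])
customers = spark.createDataFrame(
    [(1, "Alice"), (2, "Bob")], ["customer_id", "name"])

# Write both sides bucketed (and sorted) on the join key as managed tables
orders.write.bucketBy(8, "customer_id").sortBy("customer_id") \
    .mode("overwrite").saveAsTable("orders_bucketed")
customers.write.bucketBy(8, "customer_id").sortBy("customer_id") \
    .mode("overwrite").saveAsTable("customers_bucketed")

# Join the bucketed tables and inspect the plan for an Exchange step
joined = spark.table("orders_bucketed").join(
    spark.table("customers_bucketed"), "customer_id")
joined.explain()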


Regards,
Gourav Sengupta