spark-user mailing list archives

From Gourav Sengupta <>
Subject bucketing in SPARK
Date Tue, 03 Apr 2018 21:32:37 GMT

I am going through the presentation on bucketing.

Do we need to bucket both tables for this to work? And is it mandatory
that the bucket counts be multiples of each other?
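For intuition on the "multiples of each other" part: Spark assigns each row to a bucket by hashing the bucketing column and taking it modulo the number of buckets. The sketch below uses Python's built-in hash() as a stand-in for Spark's actual hash (Murmur3), so it is illustrative only, and whether a given Spark version actually exploits mismatched-but-multiple bucket counts at join time is version-dependent. It shows why counts that are multiples of each other can in principle still line up bucket-by-bucket:

```python
# Illustrative sketch only: Spark uses Murmur3 hashing internally;
# Python's built-in hash() stands in for it here.

def bucket_of(key, num_buckets):
    """Assign a key to a bucket: hash(key) mod num_buckets."""
    return hash(key) % num_buckets

# Table A bucketed into 8 buckets, table B into 4 (8 is a multiple of 4).
keys = ["k%d" % i for i in range(1000)]
for k in keys:
    b8 = bucket_of(k, 8)
    b4 = bucket_of(k, 4)
    # Every key in bucket b8 of table A lands in bucket (b8 % 4) of
    # table B, so each of A's buckets maps to exactly one of B's
    # buckets and a join could proceed bucket-by-bucket.
    assert b4 == b8 % 4
print("all 8-bucket assignments align with the 4-bucket assignments")
```

If the two bucket counts were not multiples of each other (say 6 and 4), a single bucket on one side would spread across several buckets on the other, and the shuffle-free alignment breaks down.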

Also, if I export a persistent table to S3, will this still work? Or is there
a way to make this work for external tables in SPARK?

*SPARK Version:* 2.3.0

*Method to initiate SPARK Session:*
sparkSession = SparkSession.builder \
                .config("spark.serializer",
                        "org.apache.spark.serializer.KryoSerializer") \
                .config("spark.sql.sources.bucketing.enabled", "true") \
                .getOrCreate()

Gourav Sengupta
