spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject bucketing in SPARK
Date Tue, 03 Apr 2018 21:32:37 GMT
Hi,

I am going through the presentation at
https://databricks.com/session/hive-bucketing-in-apache-spark.

Do we need to bucket both tables for this to work? And is it mandatory
that the numbers of buckets be multiples of each other?
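
For concreteness, here is the kind of setup I am asking about (a minimal
sketch; the DataFrames, table names, and bucket counts are invented for
illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Two toy DataFrames that share the join key column "id"
df_a = spark.range(1000)
df_b = spark.range(1000)

# Write both sides bucketed on the join key, with differing bucket counts
# (8 vs 4) -- this is exactly where I am unsure whether the counts must
# match or merely be multiples of one another.
df_a.write.bucketBy(8, "id").sortBy("id").saveAsTable("table_a")
df_b.write.bucketBy(4, "id").sortBy("id").saveAsTable("table_b")

# Join the two bucketed tables and inspect the physical plan, looking
# for whether an Exchange (shuffle) still appears on either side.
spark.table("table_a").join(spark.table("table_b"), "id").explain()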

Also, if I export a persistent table to S3, will this still work? Or is
there a way that this can work for external tables in SPARK?
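
For the external-table case, something along these lines is what I have in
mind (a sketch only; the S3 path and table name are placeholders, and I am
not sure whether the bucketing metadata is honoured at read time):

spark.sql("""
    CREATE TABLE table_a_ext (id BIGINT)
    USING parquet
    CLUSTERED BY (id) INTO 8 BUCKETS
    LOCATION 's3://some-bucket/table_a'
""")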


*SPARK Version:* 2.3.0

*Method to initiate SPARK Session:*
from pyspark.sql import SparkSession

# bucketing is on by default, but set it explicitly here to be safe
sparkSession = SparkSession.builder \
    .config("spark.serializer",
            "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.sources.bucketing.enabled", "true") \
    .appName("GouravTest") \
    .getOrCreate()
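
To sanity-check afterwards, I confirm the flag in the running session:

# Should print "true" if the config above took effect
print(sparkSession.conf.get("spark.sql.sources.bucketing.enabled"))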


Regards,
Gourav Sengupta
