spark-user mailing list archives

From Marcin Szymaniuk <marcin.szyman...@gmail.com>
Subject Spark+hive bucketing
Date Tue, 16 Jun 2015 07:56:19 GMT
Spark SQL document states:
Tables with buckets: bucket is the hash partitioning within a Hive table
partition. Spark SQL doesn’t support buckets yet

What exactly does that mean?

   - that writing to a bucketed table won't respect this feature, and data
   will be written in a non-bucketed manner?
   - that reading from a bucketed table won't use this feature to improve
   performance?
   - both?

Also, even if bucketing is not supported for reading - do we benefit from
having a bucketed table just because of the way data is stored in HDFS? If
we read a bucketed table in Spark, is it more likely that data from the
same bucket will be processed by the same task/executor?
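For context on what "hash partitioning within a Hive table partition" means, here
is a minimal Python sketch of the bucketing idea (illustrative only - Hive's actual
hash function varies by column type, though for an int column the hash is the value
itself; the table data and bucket count are made up):

```python
# Simplified sketch of Hive-style bucketing: each row is assigned to one of
# a fixed number of bucket files by hashing the bucketing column.
NUM_BUCKETS = 4

def bucket_for(key: int) -> int:
    # Hive computes hash(key) mod numBuckets; for an int column
    # the hash is simply the value itself.
    return key % NUM_BUCKETS

rows = [(1, "a"), (5, "b"), (9, "c"), (2, "d")]
buckets = {}
for key, value in rows:
    buckets.setdefault(bucket_for(key), []).append((key, value))

# Keys 1, 5 and 9 all land in bucket 1, so an engine that understands the
# layout could join matching bucket files directly instead of shuffling
# the whole table.
print(buckets)
```

The read-side benefit the question asks about hinges on exactly this: whether the
engine knows that rows sharing a key are co-located in the same bucket file.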
