spark-dev mailing list archives

From Chris Martin <ch...@cmartinit.co.uk>
Subject Hive Bucketing Support
Date Wed, 06 Jun 2018 19:33:46 GMT
Hi All,


First off, apologies if this is not the correct place to ask this!

I've been following SPARK-19256
<https://issues.apache.org/jira/browse/SPARK-19256> (Hive Bucketing
Support) with interest for some time now as we do a relatively large amount
of our data processing in Spark but use Hive for business analytics.  As a
result we end up writing a non-trivial amount of data out twice: once in
Parquet optimized for Spark and once in ORC optimized for Hive!
The hope is that SPARK-19256 will put an end to this.

I've noticed that there is a PR (https://github.com/apache/spark/pull/19001)
that's been open for almost a year now, with the last comment being over a
month ago.  Does anyone know if I should remain hopeful that this support
will be added in the near future or is it one of those things that's
realistically going to be some distance off?

Thanks,

Chris
