spark-dev mailing list archives

From Abhijeet Kumar <abhijeet.ku...@sentienz.com>
Subject Re: Hive Bucketing Support
Date Thu, 07 Jun 2018 05:51:45 GMT
I would ask my queries here <https://gitter.im/spark-scala/Lobby>.

Thanks,
Abhijeet Kumar

> On 07-Jun-2018, at 1:03 AM, Chris Martin <chris@cmartinit.co.uk> wrote:
> 
> Hi All,
> 
> 
> First off, apologies if this is not the correct place to ask this!
> 
> I've been following SPARK-19256 <https://issues.apache.org/jira/browse/SPARK-19256>
> (Hive Bucketing Support) with interest for some time now, as we do a relatively large amount
> of our data processing in Spark but use Hive for business analytics. As a result we end up
> writing a non-trivial amount of data out twice: once in Parquet optimized for Spark and once
> in ORC optimized for Hive! The hope is that SPARK-19256 will put an end to this.
> 
> I've noticed that there is a PR (https://github.com/apache/spark/pull/19001)
> that's been open for almost a year now, with the last comment being over a month ago. Does
> anyone know if I should remain hopeful that this support will be added in the near future,
> or is it one of those things that's realistically going to be some distance off?
> 
> thanks,
> 
> Chris
> 
> 
> 

