spark-user mailing list archives

From naresh Goud
Subject Re: New Spark Datasource for Hive ACID tables
Date Fri, 26 Jul 2019 13:38:31 GMT
Thanks Abhishek.

Will it work on a Hive ACID table that has not been compacted, i.e. a table
that has both base and delta files?

Let’s say we have a Hive ACID table, customer:

CREATE TABLE customer (customer_id INT, customer_name STRING, customer_email STRING)
CLUSTERED BY (customer_id) INTO 10 BUCKETS
LOCATION '/test/customer';

And the table's HDFS path has the directories below:


That means the table has updates and major compaction has not been run.

Will the Spark reader work in that case?
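
To make the layout I am asking about concrete, here is a minimal Python sketch of how I understand a reader has to choose directories when major compaction has not run: take the newest base, then every delta with newer write IDs. The directory names are hypothetical examples that follow Hive's base_<writeId> and delta_<minWriteId>_<maxWriteId> naming, not the actual listing of /test/customer, and select_dirs is an illustrative helper, not part of the datasource. It also ignores the valid write ID list and minor-compacted deltas, which a real reader has to handle.

```python
import re

# Hypothetical directory names in Hive's ACID naming style.
# They are NOT the real contents of /test/customer.
DIRS = [
    "base_0000005",
    "delta_0000003_0000003",   # older than the base, already folded into it
    "delta_0000006_0000006",
    "delta_0000007_0000007",
]

BASE = re.compile(r"^base_(\d+)$")
# Matches delta_ and delete_delta_ directories, with an optional statement-id suffix.
DELTA = re.compile(r"^(?:delete_)?delta_(\d+)_(\d+)(?:_\d+)?$")


def select_dirs(names):
    """Return (newest base or None, deltas newer than that base, oldest first)."""
    bases = []
    for name in names:
        match = BASE.match(name)
        if match:
            bases.append((int(match.group(1)), name))
    base_id, base_name = max(bases) if bases else (0, None)

    deltas = []
    for name in names:
        match = DELTA.match(name)
        if match and int(match.group(1)) > base_id:
            deltas.append((int(match.group(1)), int(match.group(2)), name))
    deltas.sort()
    return base_name, [name for _, _, name in deltas]


if __name__ == "__main__":
    base, deltas = select_dirs(DIRS)
    print("base:", base)
    print("deltas:", deltas)
```

So my question is whether the datasource does the equivalent of this merge on read, or whether it expects only a compacted base.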

Thank you,

On Fri, Jul 26, 2019 at 7:38 AM Abhishek Somani wrote:

> Hi All,
> We at Qubole have open sourced a datasource
> that will enable users to work on their Hive ACID Transactional Tables
> using Spark.
> Github:
> Hive ACID tables allow users to work on their data transactionally, and
> also give them the ability to Delete, Update and Merge data efficiently
> without having to rewrite all of their data in a table, partition or file.
> We believe that being able to work on these tables from Spark is a much
> desired value add, as is also apparent in
> and
> with multiple people
> looking for it. Currently the datasource supports reading from these ACID
> tables only, and we are working on adding the ability to write into these
> tables via Spark as well.
> The datasource is also available as a spark package, and instructions on
> how to use it are available on the GitHub page.
> We welcome your feedback and suggestions.
> Thanks,
> Abhishek Somani
