spark-user mailing list archives

From Abhishek Somani <abhisheksoman...@gmail.com>
Subject Re: New Spark Datasource for Hive ACID tables
Date Fri, 26 Jul 2019 14:47:55 GMT
Hey Naresh,

Thanks for your question. Yes, it will work!
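For context, Hive ACID readers in general combine the base files with any delta files at read time, which is why a table that has not gone through major compaction is still readable. The snippet below is only a rough illustration of the directory layout from the question, not the datasource's actual code: the names `AcidDir` and `parse_acid_dir` are hypothetical, and the pattern is a simplified assumption covering only the plain `base_<id>`, `delta_<min>_<max>` and `delete_delta_<min>_<max>` forms.

```python
import re
from typing import NamedTuple, Optional

# Simplified, assumed pattern for Hive ACID directory names:
#   base_<id>, delta_<min>_<max>[_<stmt>], delete_delta_<min>_<max>[_<stmt>]
_ACID_DIR = re.compile(r"^(base|delta|delete_delta)_(\d+)(?:_(\d+))?(?:_(\d+))?$")


class AcidDir(NamedTuple):
    """Hypothetical record describing one ACID directory."""
    kind: str    # "base", "delta" or "delete_delta"
    min_id: int  # first id in the directory name
    max_id: int  # second id (same as min_id for a base directory)


def parse_acid_dir(path: str) -> Optional[AcidDir]:
    """Classify a directory path; return None if it is not an ACID directory."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = _ACID_DIR.match(name)
    if match is None:
        return None
    kind, first, second = match.group(1), int(match.group(2)), match.group(3)
    if kind == "base":
        # A base directory carries a single id in this simplified form.
        return AcidDir(kind, first, first) if second is None else None
    if second is None:
        # Delta directories need both a min and a max id.
        return None
    return AcidDir(kind, first, int(second))


if __name__ == "__main__":
    for d in ["/test/customer/base_15234/", "/test/customer/delta_1234_456"]:
        print(d, "->", parse_acid_dir(d))
```

A table whose listing contains both a `base` entry and at least one `delta` entry is in the uncompacted state you describe.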

Thanks,
Abhishek Somani

On Fri, Jul 26, 2019 at 7:08 PM naresh Goud <nareshgoud.dulam@gmail.com>
wrote:

> Thanks Abhishek.
>
> Will it work on a Hive ACID table that has not been compacted, i.e. a
> table having both base and delta files?
>
> Let's say there is a Hive ACID table named customer:
>
> CREATE TABLE customer (customer_id INT, customer_name STRING,
> customer_email STRING) CLUSTERED BY (customer_id) INTO 10 BUCKETS
> STORED AS ORC LOCATION '/test/customer'
> TBLPROPERTIES ('transactional'='true');
>
>
> And the table's HDFS path has the directories below:
>
> /test/customer/base_15234/
> /test/customer/delta_1234_456
>
>
> That means the table has updates and major compaction has not run.
>
> Will the Spark reader work?
>
>
> Thank you,
> Naresh
>
> On Fri, Jul 26, 2019 at 7:38 AM Abhishek Somani <
> abhisheksomani88@gmail.com> wrote:
>
>> Hi All,
>>
>> We at Qubole <https://www.qubole.com/> have open sourced a datasource
>> that will enable users to work on their Hive ACID Transactional Tables
>> <https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions>
>> using Spark.
>>
>> Github: https://github.com/qubole/spark-acid
>>
>> Hive ACID tables allow users to work on their data transactionally, and
>> also give them the ability to Delete, Update and Merge data efficiently
>> without having to rewrite all of their data in a table, partition or file.
>> We believe that being able to work on these tables from Spark is a much
>> desired value add, as is also apparent in
>> https://issues.apache.org/jira/browse/SPARK-15348 and
>> https://issues.apache.org/jira/browse/SPARK-16996 with multiple people
>> looking for it. Currently the datasource supports reading from these ACID
>> tables only, and we are working on adding the ability to write into these
>> tables via Spark as well.
>>
>> The datasource is also available as a Spark package, and instructions on
>> how to use it are available on the GitHub page
>> <https://github.com/qubole/spark-acid>.
>>
>> We welcome your feedback and suggestions.
>>
>> Thanks,
>> Abhishek Somani
>>
> --
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>
