spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Announcing Delta Lake 0.2.0
Date Fri, 21 Jun 2019 10:03:21 GMT
@ayan guha <guha.ayan@gmail.com> @Gourav Sengupta
<gourav.sengupta@gmail.com>
Delta Lake OSS currently does not support defining tables in the Hive
metastore using DDL commands. We are hoping to add the necessary
compatibility fixes in Apache Spark to make Delta Lake work with tables and
DDL commands, so we will support them in a future release. In the meantime,
please read/write Delta tables using paths.
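
A minimal sketch of what path-based access can look like from PySpark. It
assumes a Spark session already configured with the Delta Lake package as
described in the Quickstart; the helper names `write_delta` and `read_delta`
and the example path are hypothetical, not part of Delta Lake.

```python
# Hypothetical helpers wrapping Spark's generic DataFrame reader/writer API.
# They address a Delta table by its storage path, never by a metastore name.

def write_delta(df, path, mode="append"):
    """Write DataFrame `df` to the Delta table stored at `path`."""
    df.write.format("delta").mode(mode).save(path)


def read_delta(spark, path):
    """Load the Delta table stored at `path` as a DataFrame."""
    return spark.read.format("delta").load(path)
```

With a live session this would be used roughly as
`write_delta(spark.range(5), "/tmp/delta/events", mode="overwrite")` followed by
`read_delta(spark, "/tmp/delta/events").show()`. No `CREATE TABLE` is issued, so
nothing is registered in the Hive metastore.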

TD

On Fri, Jun 21, 2019 at 12:49 AM Gourav Sengupta <gourav.sengupta@gmail.com>
wrote:

> Hi Ayan,
>
> I may be wrong about this, but I think that Delta files are in Parquet
> format. But I am sure that you have already checked this. Am I missing
> something?
>
> Regards,
> Gourav Sengupta
>
> On Fri, Jun 21, 2019 at 6:39 AM ayan guha <guha.ayan@gmail.com> wrote:
>
>> Hi
>> We used spark.sql to create a table using DELTA. We also have a hive
>> metastore attached to the spark session. Hence, a table gets created in
>> Hive metastore. We then tried to query the table from Hive. We faced the
>> following issues:
>>
>>    1. SERDE is SequenceFile, should have been Parquet
>>    2. Schema fields are not passed.
>>
>> Essentially the hive DDL looks like:
>>
>> CREATE TABLE `TABLE NAME`(
>>   `col` array<string> COMMENT 'from deserializer')
>> ROW FORMAT SERDE
>>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>> WITH SERDEPROPERTIES (
>>   'path'='WASB PATH')
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
>> LOCATION
>>   'WASB PATH'
>> TBLPROPERTIES (
>>   'spark.sql.create.version'='2.4.0',
>>   'spark.sql.sources.provider'='DELTA',
>>   'spark.sql.sources.schema.numParts'='1',
>>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
>>   'transient_lastDdlTime'='1556544657')
>>
>> Is this expected? And will the use case be supported in future releases?
>>
>>
>> We are now experimenting
>>
>> Best
>>
>> Ayan
>>
>> On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun <liwen.sun@databricks.com>
>> wrote:
>>
>>> Hi James,
>>>
>>> Right now we don't have plans for having a catalog component as part of
>>> Delta Lake, but we are looking to support Hive metastore and also DDL
>>> commands in the near future.
>>>
>>> Thanks,
>>> Liwen
>>>
>>> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
>>> jamescotrotsios@gmail.com> wrote:
>>>
>>>> Is there a plan to have a business catalog component for the Data Lake?
>>>> If not, how would someone propose an open source project related to
>>>> that? I would be interested in building out an open source data
>>>> catalog that would use the Hive metadata store as a baseline for technical
>>>> metadata.
>>>>
>>>>
>>>> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <liwen.sun@databricks.com>
>>>> wrote:
>>>>
>>>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>>>
>>>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>>>> https://docs.delta.io/0.2.0/quick-start.html
>>>>>
>>>>> To view the release notes:
>>>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>>>
>>>>> This release introduces two main features:
>>>>>
>>>>> *Cloud storage support*
>>>>> In addition to HDFS, you can now configure Delta Lake to read and
>>>>> write data on cloud storage services such as Amazon S3 and Azure Blob
>>>>> Storage. For configuration instructions, please see:
>>>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>>>
>>>>> *Improved concurrency*
>>>>> Delta Lake now allows concurrent append-only writes while still
>>>>> ensuring serializability. For concurrency control in Delta Lake, please
>>>>> see: https://docs.delta.io/0.2.0/delta-concurrency.html
>>>>>
>>>>> We have also greatly expanded the test coverage as part of this
>>>>> release.
>>>>>
>>>>> We would like to acknowledge all community members for contributing to
>>>>> this release.
>>>>>
>>>>> Best regards,
>>>>> Liwen Sun
>>>>>
>>>>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
