spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Announcing Delta Lake 0.2.0
Date Fri, 21 Jun 2019 07:48:18 GMT
Hi Ayan,

I may be wrong about this, but I think that Delta files are in Parquet
format. But I am sure that you have already checked this. Am I missing
something?

Regards,
Gourav Sengupta

On Fri, Jun 21, 2019 at 6:39 AM ayan guha <guha.ayan@gmail.com> wrote:

> Hi
> We used spark.sql to create a table using DELTA. We also have a Hive
> metastore attached to the Spark session, so the table gets created in the
> Hive metastore. We then tried to query the table from Hive. We faced the
> following issues:
>
>    1. SERDE is SequenceFile, should have been Parquet
>    2. Schema fields are not passed.
>
> Essentially the hive DDL looks like:
>
> CREATE TABLE `TABLE NAME`(
>   `col` array<string> COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'path'='WASB PATH')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
> LOCATION
>   'WASB PATH'
> TBLPROPERTIES (
>   'spark.sql.create.version'='2.4.0',
>   'spark.sql.sources.provider'='DELTA',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
>   'transient_lastDdlTime'='1556544657')
>
> Is this expected? And will the use case be supported in future releases?
>
>
> We are now experimenting
>
> Best
>
> Ayan
>
> On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun <liwen.sun@databricks.com>
> wrote:
>
>> Hi James,
>>
>> Right now we don't have plans for having a catalog component as part of
>> Delta Lake, but we are looking to support Hive metastore and also DDL
>> commands in the near future.
>>
>> Thanks,
>> Liwen
>>
>> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
>> jamescotrotsios@gmail.com> wrote:
>>
>>> Is there a plan to have a business catalog component for the Data Lake?
>>> If not, how would someone propose an open source project related to
>>> that? I would be interested in building out an open source data catalog
>>> that would use the Hive metadata store as a baseline for technical
>>> metadata.
>>>
>>>
>>> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <liwen.sun@databricks.com>
>>> wrote:
>>>
>>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>>
>>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>>> https://docs.delta.io/0.2.0/quick-start.html
>>>>
>>>> To view the release notes:
>>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>>
>>>> This release introduces two main features:
>>>>
>>>> *Cloud storage support*
>>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>>> For configuration instructions, please see:
>>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>>
>>>> *Improved concurrency*
>>>> Delta Lake now allows concurrent append-only writes while still
>>>> ensuring serializability. For concurrency control in Delta Lake, please
>>>> see: https://docs.delta.io/0.2.0/delta-concurrency.html
>>>>
>>>> We have also greatly expanded the test coverage as part of this release.
>>>>
>>>> We would like to acknowledge all community members for contributing to
>>>> this release.
>>>>
>>>> Best regards,
>>>> Liwen Sun
>>>>
>>>>
>
> --
> Best Regards,
> Ayan Guha
>
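
A side note on the DDL quoted above: the `spark.sql.sources.schema.part.0` table property appears to hold the Spark schema as JSON, and in the quoted DDL its `fields` list is empty, which would be consistent with the report that schema fields are not passed to Hive. The following is only a minimal Python sketch of checking that, assuming the property value has been copied out of the DDL by hand with the backslash escapes removed. The names `schema_part` and `schema_field_names` are hypothetical and not part of Spark, Hive, or Delta Lake.

```python
import json

# Value of 'spark.sql.sources.schema.part.0' from the quoted DDL,
# copied by hand with the backslash escapes removed (an assumption).
schema_part = '{"type":"struct","fields":[]}'


def schema_field_names(schema_json):
    """Return the column names recorded in a Spark schema JSON string."""
    schema = json.loads(schema_json)
    return [field["name"] for field in schema.get("fields", [])]


# An empty list means the metastore entry carries no column definitions.
field_names = schema_field_names(schema_part)
print(field_names)
```

If the list comes back empty, as it does for the value quoted in this thread, Hive has no column information to work with from that property.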
