spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Announcing Delta Lake 0.2.0
Date Fri, 21 Jun 2019 05:38:56 GMT
Hi
We used spark.sql to create a table USING DELTA. We also have a Hive
metastore attached to the Spark session, so the table gets registered in
the Hive metastore. We then tried to query the table from Hive and ran into
the following issues:

   1. The SerDe and file format are LazySimpleSerDe/SequenceFile; they should have been Parquet.
   2. The schema fields are not passed to Hive.

Essentially, the Hive DDL looks like this:

CREATE TABLE `TABLE NAME`(
  `col` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'path'='WASB PATH')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'WASB PATH'
TBLPROPERTIES (
  'spark.sql.create.version'='2.4.0',
  'spark.sql.sources.provider'='DELTA',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
  'transient_lastDdlTime'='1556544657')
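
The empty `fields` list is the telling detail. Spark keeps a data source table's schema as JSON in table properties, split across `spark.sql.sources.schema.part.N` entries whose count is given by `spark.sql.sources.schema.numParts`; the `col array<string>` column Hive shows appears to be only a placeholder. A minimal Python sketch, assuming the schema-related properties above are copied into a dict (with the `\"` escapes from the DDL output removed); the helper name `spark_schema_from_props` is hypothetical:

```python
import json

# Schema-related TBLPROPERTIES copied from the Hive DDL above.
tblproperties = {
    "spark.sql.sources.provider": "DELTA",
    "spark.sql.sources.schema.numParts": "1",
    "spark.sql.sources.schema.part.0": '{"type":"struct","fields":[]}',
}


def spark_schema_from_props(props):
    """Reassemble the Spark SQL schema JSON stored across schema.part.N properties."""
    num_parts = int(props["spark.sql.sources.schema.numParts"])
    # Concatenate the parts in order, then parse the resulting JSON document.
    schema_json = "".join(
        props[f"spark.sql.sources.schema.part.{i}"] for i in range(num_parts)
    )
    return json.loads(schema_json)


schema = spark_schema_from_props(tblproperties)
print(schema["type"], len(schema["fields"]))  # struct 0
```

The metastore entry carries no columns at all, presumably because Delta Lake keeps the table schema in its own transaction log (`_delta_log`) rather than in the metastore, so Hive has nothing to read.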

Is this expected? And will this use case be supported in a future release?


We are now experimenting.

Best

Ayan

On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun <liwen.sun@databricks.com> wrote:

> Hi James,
>
> Right now we have no plans to include a catalog component in Delta Lake,
> but we are looking to support the Hive metastore and DDL commands in the
> near future.
>
> Thanks,
> Liwen
>
> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
> jamescotrotsios@gmail.com> wrote:
>
>> Is there a plan to have a business catalog component for the Data Lake?
>> If not, how would someone propose an open source project related to
>> that? I would be interested in building out an open source data catalog
>> that would use the Hive metastore as a baseline for technical
>> metadata.
>>
>>
>> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <liwen.sun@databricks.com>
>> wrote:
>>
>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>
>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>> https://docs.delta.io/0.2.0/quick-start.html
>>>
>>> To view the release notes:
>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>
>>> This release introduces two main features:
>>>
>>> *Cloud storage support*
>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>> For configuration instructions, please see:
>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>
>>> *Improved concurrency*
>>> Delta Lake now allows concurrent append-only writes while still ensuring
>>> serializability. For concurrency control in Delta Lake, please see:
>>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>>
>>> We have also greatly expanded the test coverage as part of this release.
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this release.
>>>
>>> Best regards,
>>> Liwen Sun
>>>
>>>
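
For the cloud storage support in the announcement, the switch is a Spark setting that selects a storage-specific LogStore. A minimal Python sketch, assuming the class and property names as we read them in the 0.2.0 storage docs (verify them against the delta-storage page for your version); `delta_storage_conf` is a hypothetical helper, not part of Delta Lake:

```python
def delta_storage_conf(storage, account=None, access_key=None):
    """Hypothetical helper: Spark settings selecting a Delta Lake LogStore.

    Class names follow our reading of the 0.2.0 storage docs.
    """
    if storage == "s3":
        # S3 offers no mutual exclusion for creating log files, so this
        # LogStore expects all writes to a table to come from one Spark driver.
        return {
            "spark.delta.logStore.class":
                "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore",
        }
    if storage == "azure-blob":
        if not account or not access_key:
            raise ValueError("Azure Blob Storage needs an account name and an access key")
        return {
            "spark.delta.logStore.class":
                "org.apache.spark.sql.delta.storage.AzureLogStore",
            # spark.hadoop.* entries are copied into the Hadoop configuration,
            # where the WASB connector looks up the account key.
            f"spark.hadoop.fs.azure.account.key.{account}.blob.core.windows.net":
                access_key,
        }
    raise ValueError(f"unsupported storage type: {storage!r}")


conf = delta_storage_conf("azure-blob", account="myaccount", access_key="<access-key>")
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each pair could then be applied with PySpark's `SparkSession.builder.config(key, value)` before the session is created; the matching Hadoop connector (`hadoop-aws` or `hadoop-azure`) still has to be on the classpath.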

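Why concurrent append-only writes can stay serializable can be illustrated with a toy model of optimistic concurrency control. This Python sketch is not Delta Lake's code, and `ToyCommitLog`, `commit_append` and `ConflictError` are hypothetical names: each commit claims the next log version with an atomic put-if-absent, and a writer that loses the race inspects the winning commits and, since appends only add files, retries at the next free version.

```python
import threading


class ConflictError(Exception):
    """Raised when a concurrent commit invalidates a transaction (toy model)."""


class ToyCommitLog:
    """Toy commit log illustrating optimistic concurrency for blind appends.

    NOT Delta Lake's implementation. Version v can be claimed by exactly one
    writer (an atomic put-if-absent); a writer that loses the race inspects
    the winning commits and retries at the next free version if none conflict.
    """

    def __init__(self):
        self._commits = []  # self._commits[v] is the list of actions in version v
        self._lock = threading.Lock()

    def snapshot_version(self):
        with self._lock:
            return len(self._commits) - 1

    def _put_if_absent(self, version, actions):
        # Stand-in for atomically creating the log entry for `version`.
        with self._lock:
            if version != len(self._commits):
                return False
            self._commits.append(actions)
            return True

    def _commits_from(self, version):
        with self._lock:
            return list(self._commits[version:])

    def commit_append(self, new_files, read_version):
        """Blind append: add files without reading existing table data."""
        if read_version > self.snapshot_version():
            raise ValueError("read_version is newer than the log")
        actions = [("add", name) for name in new_files]
        version = read_version + 1
        while not self._put_if_absent(version, actions):
            winners = self._commits_from(version)
            # Other appends only add files, so they cannot invalidate a blind
            # append; any other action (e.g. "remove") counts as a conflict here.
            for winner in winners:
                if any(kind != "add" for kind, _ in winner):
                    raise ConflictError(f"conflicting commit at or after version {version}")
            version += len(winners)
        return version

    def added_files(self):
        # Every file added, in commit order (this sketch only models appends).
        with self._lock:
            return [name for commit in self._commits for kind, name in commit if kind == "add"]


log = ToyCommitLog()
start = log.snapshot_version()                  # both writers read the empty table (-1)
v_a = log.commit_append(["a.parquet"], start)   # claims version 0
v_b = log.commit_append(["b.parquet"], start)   # loses version 0, retries at 1
print(v_a, v_b, log.added_files())              # 0 1 ['a.parquet', 'b.parquet']
```

Both writers started from the same snapshot, yet the final log is equivalent to running them one after the other. Delta Lake's actual conflict detection is more detailed than checking action types; the concurrency page linked in the announcement describes it.
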
-- 
Best Regards,
Ayan Guha
