spark-dev mailing list archives

From Aleksander Eskilson <aleksander...@gmail.com>
Subject Requesting a Plan for Avro-typed Datasets
Date Thu, 16 May 2019 19:32:58 GMT
Hi all,

There's been longstanding demand for statically typed Datasets of Avro.
Functionality from the now-deprecated Databricks Spark-Avro project was
folded into Spark, but it still only provides DataFrames over Avro data. As
discussed in the PR below, there are real drawbacks to not having fully
statically typed Datasets of Avro.

There's an open PR adding a first-class Encoder for statically typed
Datasets of Avro:

https://github.com/apache/spark/pull/22878 :
https://issues.apache.org/jira/browse/SPARK-25789 (originally in
Databricks/spark-avro, https://github.com/databricks/spark-avro/pull/217 :
https://github.com/databricks/spark-avro/issues/169)

We've tested the content of this PR extensively over complex, deeply nested
Avro structures. It seems ready for a last review and nearly ready to
merge.

Alek Eskilson
github : bdrillard
