spark-dev mailing list archives

From Aleksander Eskilson <aleksander...@gmail.com>
Subject Open PRs RE: Datasets Typed by Arbitrary Avro
Date Thu, 18 Apr 2019 21:33:34 GMT
There are now a couple of different pull requests, each attempting to address
the need for an enhancement providing typed Dataset support for Avro
objects. These PRs and their respective JIRA tickets are:

   - https://github.com/apache/spark/pull/22878 :
     https://issues.apache.org/jira/browse/SPARK-25789
     (originally in Databricks/spark-avro,
     https://github.com/databricks/spark-avro/pull/217 :
     https://github.com/databricks/spark-avro/issues/169)
   - https://github.com/apache/spark/pull/24299 :
     https://issues.apache.org/jira/browse/SPARK-27388
   - https://github.com/apache/spark/pull/24367 :
     https://issues.apache.org/jira/browse/SPARK-27457

The approaches in these PRs differ considerably, and they may not cover the
same cases. Some analysis of the tradeoffs, and perhaps a deeper analysis of
workarounds, would be necessary.

Full disclosure: I contributed significantly to Spark#22878/Spark-Avro#217,
so I don't think I'll say more about the topics in this thread, but I would
be looking to Spark committers for some more direction either here or in
the PR threads. I'd be happy to respond to questions from the community.

The topic of and request for typed Datasets of Avro goes back to
Spark-Avro#169 <https://github.com/databricks/spark-avro/issues/169>. I saw
relatively recently that the project was folded into Spark proper, but the
need for statically typed Dataset support (as opposed to dynamically typed
DataFrame support) continues.

Hoping a resolution can come out of this visibility.

Aleksander Eskilson
https://github.com/bdrillard
