spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source
Date Fri, 03 Aug 2018 15:28:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568348#comment-16568348 ]

Hyukjin Kwon commented on SPARK-24924:
--------------------------------------

{quote}
but at the same time we aren't adding the spark.read.avro syntax, so it breaks in that case,
or they get a different implementation by default?
{quote}

If users call this, it will still use the built-in implementation, because {{.avro}} (https://github.com/databricks/spark-avro/blob/branch-4.0/src/main/scala/com/databricks/spark/avro/package.scala#L26)
is just a shorthand for {{format("com.databricks.spark.avro")}}.
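To make that resolution path concrete, here is a minimal Python sketch under a simplified model in which legacy provider names are rewritten to built-in classes before any lookup happens. {{BACKWARD_COMPAT}}, {{resolve_provider}} and {{read_avro_shorthand}} are hypothetical names, not Spark APIs, and the Avro target class {{org.apache.spark.sql.avro.AvroFileFormat}} is an assumption about the built-in module's class name.

```python
# Simplified, hypothetical model of provider-name resolution once legacy
# names are mapped to built-ins. NOT Spark's actual code; names are illustrative.

# Legacy provider names -> built-in implementation classes (assumed entries).
BACKWARD_COMPAT = {
    "com.databricks.spark.csv":
        "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat",
    "com.databricks.spark.avro":
        "org.apache.spark.sql.avro.AvroFileFormat",
}


def resolve_provider(name: str) -> str:
    """Return the implementation class that a provider name resolves to."""
    # A legacy name is rewritten before lookup, so the external package's
    # own class is never consulted; unknown names pass through unchanged.
    return BACKWARD_COMPAT.get(name, name)


def read_avro_shorthand(path: str):
    """Hypothetical stand-in for the spark-avro `.avro(path)` implicit,
    which only forwards to format("com.databricks.spark.avro")."""
    return resolve_provider("com.databricks.spark.avro"), path


print(read_avro_shorthand("/tmp/events.avro"))
# prints ('org.apache.spark.sql.avro.AvroFileFormat', '/tmp/events.avro')
```

Because the shorthand only ever passes the legacy name through {{format(...)}}, mapping that single name is enough to redirect existing {{.avro}} callers to the built-in source in this model.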

{quote}
our internal implementation which could very well be different.
{quote}

It shouldn't be very different in 2.4.0. It could diverge later, but I expect any changes to be incremental
improvements without behaviour changes.

{quote}
 I would rather just plain error out saying these conflict, either update or change your external
package to use a different name. 
{quote}

IIRC, we did exactly that for the CSV data source in the past, and many users complained about it:

{code}
java.lang.RuntimeException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat,
com.databricks.spark.csv.DefaultSource15), please specify the fully qualified class name.
{code}

In practice, I am more confident in the current approach: users complained about that error a lot,
and so far I haven't seen complaints about the current approach.

{quote}
There is also the case that one might be able to argue it's breaking API compatibility since the .avro
option went away, but it's a third-party library, so you can probably get away with that.
{quote}

It went away, but if the jar is provided along with the implicit import that supports {{.avro}}, it should
in theory keep working as usual and use the internal implementation. If the jar is not provided,
the {{.avro}} API is not available, and the internal implementation will be used.


> Add mapping for built-in Avro data source
> -----------------------------------------
>
>                 Key: SPARK-24924
>                 URL: https://issues.apache.org/jira/browse/SPARK-24924
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> This issue aims to do the following.
>  # Like the `com.databricks.spark.csv` mapping, we had better map `com.databricks.spark.avro` to the built-in Avro data source.
>  # Remove the incorrect error message, `Please find an Avro package at ...`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
