spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source
Date Fri, 03 Aug 2018 03:12:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567708#comment-16567708 ]

Hyukjin Kwon commented on SPARK-24924:
--------------------------------------

A similar discussion took place in SPARK-20590 when we ported CSV. In my experience, users really
don't know whether {{com.databricks.spark.avro}} or {{avro}} refers to the external Avro jar or the
internal one (the same thing happened with CSV; FWIW, I was active in the Spark CSV (Databricks) package).

If users were using the external Avro package, they will likely hit an error if they upgrade
Spark directly. Otherwise, users will see in the release notes that the Avro package is included
in 2.4.0, and they will stop providing the external jar.
If users miss the release notes, they will explicitly provide the third-party jar,
which will now produce a message like the following (this example is from CSV):

{code}
17/05/10 09:47:44 WARN DataSource: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat,
com.databricks.spark.csv.DefaultSource15), defaulting to the internal datasource (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat).
{code}

Encouraging users to use the built-in one is probably preferable, since the behaviours will be
kept the same as far as possible for now.
Otherwise, if the external Avro package must be used, I think in theory it can still be used
by specifying the source by its fully qualified path.
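
To make the lookup behaviour above concrete, here is a toy model in plain Python. It is only a sketch of the rules discussed in this thread, not Spark's actual implementation: {{resolve_source}}, {{PROVIDERS}}, {{LEGACY_MAPPING}} and {{INTERNAL_PREFIX}} are hypothetical names, and the two class names are the ones from the CSV warning quoted above.

```python
import warnings

# Class names copied from the CSV warning quoted in this message.
BUILTIN_CSV = "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat"
EXTERNAL_CSV = "com.databricks.spark.csv.DefaultSource15"

# Hypothetical registry: provider class name -> short name it registers.
PROVIDERS = {BUILTIN_CSV: "csv", EXTERNAL_CSV: "csv"}

# Hypothetical legacy mapping: old package name -> built-in provider.
LEGACY_MAPPING = {"com.databricks.spark.csv": BUILTIN_CSV}

# Assumption: built-in providers are recognised by this package prefix.
INTERNAL_PREFIX = "org.apache.spark.sql"


def resolve_source(name, providers=PROVIDERS, legacy_mapping=LEGACY_MAPPING):
    """Resolve a format name to exactly one provider class name."""
    # 1. A legacy package name is redirected to the built-in source.
    if name in legacy_mapping:
        return legacy_mapping[name]
    # 2. A fully qualified class name selects exactly that provider.
    if name in providers:
        return name
    # 3. Otherwise the name is treated as a short alias.
    matches = [cls for cls, alias in providers.items() if alias == name]
    if not matches:
        raise ValueError(f"Failed to find data source: {name}")
    if len(matches) == 1:
        return matches[0]
    # Several providers share the alias: prefer the single internal one.
    internal = [cls for cls in matches if cls.startswith(INTERNAL_PREFIX)]
    if len(internal) == 1:
        warnings.warn(
            f"Multiple sources found for {name} ({', '.join(matches)}), "
            f"defaulting to the internal datasource ({internal[0]})."
        )
        return internal[0]
    raise ValueError(
        f"Multiple sources found for {name}; "
        "specify the fully qualified class name."
    )
```

In this model, {{resolve_source("csv")}} warns and returns the built-in class, {{resolve_source("com.databricks.spark.csv")}} is silently redirected to the built-in class, and only the fully qualified {{com.databricks.spark.csv.DefaultSource15}} reaches the external provider. The Avro case would presumably follow the same pattern once the mapping proposed in this issue is added.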

> Add mapping for built-in Avro data source
> -----------------------------------------
>
>                 Key: SPARK-24924
>                 URL: https://issues.apache.org/jira/browse/SPARK-24924
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> This issue aims to do the following:
>  # Like the `com.databricks.spark.csv` mapping, we had better map `com.databricks.spark.avro` to the built-in Avro data source.
>  # Remove the incorrect error message, `Please find an Avro package at ...`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


