spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dhruve Ashar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26801) Spark unable to read valid avro types
Date Mon, 04 Feb 2019 20:15:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760178#comment-16760178
] 

Dhruve Ashar commented on SPARK-26801:
--------------------------------------

I have given a short summary of the issue in the PR description. It would be great if you
could review it [~hyukjin.kwon]

Thanks.

> Spark unable to read valid avro types
> -------------------------------------
>
>                 Key: SPARK-26801
>                 URL: https://issues.apache.org/jira/browse/SPARK-26801
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dhruve Ashar
>            Priority: Major
>
> Currently the external avro package reads avro schemasĀ for type records only. This is
probably because of representation of InternalRow in spark sql. As a result, if the avro file
has anything other than a sequence of records it fails to read it.
> We faced this issue earlier while trying to read primitive types. We encountered this
again while trying to read an array of records. Below are code examples trying to read valid
avro data showing the stack traces.
> {code:java}
> spark.read.format("avro").load("avroTypes/randomInt.avro").show
> java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
> "int"
> at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
> at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> ... 49 elided
> ======================================================================
> scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show
> java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
> {
> "type" : "enum",
> "name" : "Suit",
> "symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ]
> }
> at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
> at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> ... 49 elided
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message