spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26801) Spark unable to read valid avro types
Date Sat, 02 Feb 2019 03:56:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758869#comment-16758869 ]

Hyukjin Kwon commented on SPARK-26801:
--------------------------------------

Thanks for reporting this. Would you be interested in narrowing down the problem?

> Spark unable to read valid avro types
> -------------------------------------
>
>                 Key: SPARK-26801
>                 URL: https://issues.apache.org/jira/browse/SPARK-26801
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dhruve Ashar
>            Priority: Major
>
> Currently the external avro package reads Avro schemas only when the top-level type is a record. This is probably because of how Spark SQL represents rows as InternalRow. As a result, reading fails if the Avro file contains anything other than a sequence of records.
> We faced this issue earlier while trying to read primitive types, and encountered it again while trying to read an array of records. Below are code examples that try to read valid Avro data, along with the resulting stack traces.
> {code:java}
> scala> spark.read.format("avro").load("avroTypes/randomInt.avro").show
> java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
> "int"
> at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
> at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> ... 49 elided
> ======================================================================
> scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show
> java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
> {
> "type" : "enum",
> "name" : "Suit",
> "symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ]
> }
> at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
> at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> ... 49 elided
> {code}
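
For illustration only, here is a minimal sketch of the distinction the report describes: whether the root of an Avro schema is a record, and what a record wrapper around a non-record schema could look like. It uses only the Python standard library and is not Spark's implementation. The function names, the wrapper name "TopLevelWrapper", and the field name "value" are hypothetical. Wrapping a schema this way does not convert existing Avro files; it only shows the schema shape that the reader currently accepts.

```python
import json


def top_level_type(schema_json: str) -> str:
    """Return the Avro type name at the root of a schema given as JSON text."""
    schema = json.loads(schema_json)
    if isinstance(schema, str):
        # A bare JSON string names a primitive (or a previously defined type), e.g. "int".
        return schema
    if isinstance(schema, list):
        # A JSON array at the root denotes a union.
        return "union"
    inner = schema["type"]
    # {"type": "enum", ...} carries the name directly; {"type": {...}} nests a schema.
    return inner if isinstance(inner, str) else top_level_type(json.dumps(inner))


def wrap_in_record(schema_json: str, field_name: str = "value") -> str:
    """Return the schema unchanged if its root is a record; otherwise return
    a single-field record schema that contains it (hypothetical wrapper)."""
    schema = json.loads(schema_json)
    if top_level_type(schema_json) == "record":
        return json.dumps(schema)
    wrapper = {
        "type": "record",
        "name": "TopLevelWrapper",  # hypothetical name, chosen for this sketch
        "fields": [{"name": field_name, "type": schema}],
    }
    return json.dumps(wrapper)


if __name__ == "__main__":
    enum_schema = json.dumps({
        "type": "enum",
        "name": "Suit",
        "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
    })
    for text in ('"int"', enum_schema):
        print(top_level_type(text), "->", top_level_type(wrap_in_record(text)))
```

The two schemas in the usage section are the ones from the stack traces above ("int" and the Suit enum). Both have a non-record root, which matches the "cannot be converted to a Spark SQL StructType" failures.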




