spark-issues mailing list archives

From "Attila Zsolt Piros (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string
Date Thu, 07 Feb 2019 21:32:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763094#comment-16763094 ]

Attila Zsolt Piros commented on SPARK-26845:
--------------------------------------------

This also works:
{code}
test("roundtrip in to_avro and from_avro - string") {
  val df = spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str"))

  val avroDF = df.select(to_avro('str).as("b"))
  // Wrap the nullable field in a record: a standalone field declaration with a
  // union type fails to parse at the top level (see below).
  val avroTypeStr = """
    |{
    |   "type": "record",
    |   "name": "topLevelRecord",
    |   "fields": [
    |     {
    |       "name": "str",
    |       "type": ["string", "null"]
    |     }
    |   ]
    |}""".stripMargin
  checkAnswer(
    avroDF.select(from_avro('b, avroTypeStr).as("rec")).select($"rec.str"),
    df)
}
{code}
I introduced a topLevelRecord because a union type at the top level is not allowed or does not work
(good question why). I mean this:
{code:javascript}
  {
    "name": "str",
    "type": ["string", "null"]
  }
{code}
This throws an exception:
{noformat}
org.apache.avro.SchemaParseException: No type: {"name":"str","type":["string","null"]} 
{noformat}
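One likely reason for the exception, offered as a hedged reading of Avro's schema parser rather than a confirmed diagnosis: when a schema is written as a JSON object, Avro expects its "type" attribute to be a type name, so a field declaration whose "type" is an array is rejected with "No type". A union written as a bare JSON array is itself a valid Avro schema. The sketch below checks the three variants with the Avro Java library's Schema.Parser; it assumes the Apache Avro library is on the classpath, and the object TopLevelUnionSchemas and its members are hypothetical names used only for illustration.

```scala
import org.apache.avro.{Schema, SchemaParseException}
import scala.util.{Failure, Success, Try}

object TopLevelUnionSchemas {
  // A fresh parser per call, because Schema.Parser remembers named types it has seen.
  def parse(json: String): Try[Schema] = Try(new Schema.Parser().parse(json))

  // Field-declaration syntax used on its own: "type" holds an array, not a type name.
  val fieldStyle: String = """{"name": "str", "type": ["string", "null"]}"""

  // The same union written as a bare JSON array.
  val bareUnion: String = """["string", "null"]"""

  // The workaround from the comment: the union wrapped in a record field.
  val wrapped: String =
    """{"type": "record", "name": "topLevelRecord",
      | "fields": [{"name": "str", "type": ["string", "null"]}]}""".stripMargin

  def main(args: Array[String]): Unit =
    for ((label, json) <- Seq("field-style" -> fieldStyle, "bare union" -> bareUnion, "wrapped" -> wrapped)) {
      parse(json) match {
        case Success(schema)                 => println(s"$label: parsed as ${schema.getType}")
        case Failure(e: SchemaParseException) => println(s"$label: ${e.getMessage}")
        case Failure(other)                  => throw other
      }
    }
}
```

Under these assumptions the field-style object should fail to parse, while the other two parse as UNION and RECORD respectively. Whether from_avro in the affected Spark versions also accepts the bare-array form is a separate assumption worth checking.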

> Avro to_avro from_avro roundtrip fails if data type is string
> -------------------------------------------------------------
>
>                 Key: SPARK-26845
>                 URL: https://issues.apache.org/jira/browse/SPARK-26845
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Gabor Somogyi
>            Priority: Critical
>              Labels: correctness
>
> I was playing with AvroFunctionsSuite and created a situation where a test fails, although I believe it shouldn't:
> {code:java}
>   test("roundtrip in to_avro and from_avro - string") {
>     val df = spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str"))
>     val avroDF = df.select(to_avro('str).as("b"))
>     val avroTypeStr = s"""
>       |{
>       |  "type": "string",
>       |  "name": "str"
>       |}
>     """.stripMargin
>     checkAnswer(avroDF.select(from_avro('b, avroTypeStr)), df)
>   }
> {code}
> {code:java}
> == Results ==
> !== Correct Answer - 3 ==   == Spark Answer - 3 ==
> !struct<str:string>         struct<from_avro(b):string>
> ![1]                        []
> ![2]                        []
> ![3]                        []
> {code}
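A plausible explanation of the empty rows, stated as an assumption rather than a confirmed diagnosis: the str column is nullable, so to_avro would encode each value with a ["string", "null"] union. The Avro binary format writes a union value as a branch index (a zig-zag varint, 0 for "string") followed by the length-prefixed UTF-8 bytes. Reading those bytes with the plain "string" schema from the report takes the branch index 0 as a string length of zero, which yields an empty string for every row; this would also be consistent with the ["string", "null"] workaround in the comment working. The dependency-free Scala sketch below hand-encodes that layout. NullableStringBytes and its helpers are hypothetical names, and the code models only the two encodings involved, not Spark's or Avro's actual code paths.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

object NullableStringBytes {
  // Avro "long" encoding: zig-zag, then base-128 varint, low-order group first.
  def writeLong(out: ByteArrayOutputStream, v: Long): Unit = {
    var n = (v << 1) ^ (v >> 63)
    while ((n & ~0x7FL) != 0L) {
      out.write(((n & 0x7FL) | 0x80L).toInt)
      n = n >>> 7
    }
    out.write(n.toInt)
  }

  // Inverse of writeLong; returns the decoded value and the position after it.
  def readLong(bytes: Array[Byte], pos: Int): (Long, Int) = {
    var n = 0L
    var shift = 0
    var i = pos
    var more = true
    while (more) {
      val b = bytes(i) & 0xFF
      n |= (b & 0x7FL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    ((n >>> 1) ^ -(n & 1L), i)
  }

  // Writer side: a non-null value under the union ["string", "null"].
  def encodeUnionString(s: String): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    writeLong(out, 0L)                 // branch index 0 selects "string"
    val utf8 = s.getBytes(UTF_8)
    writeLong(out, utf8.length.toLong) // string length
    out.write(utf8, 0, utf8.length)    // string bytes
    out.toByteArray
  }

  // Reader with the plain "string" schema: the first varint is taken as the length.
  def decodeAsPlainString(bytes: Array[Byte]): String = {
    val (len, start) = readLong(bytes, 0)
    new String(bytes, start, len.toInt, UTF_8)
  }

  // Reader with the matching union schema: branch index first, then the string.
  def decodeAsUnionString(bytes: Array[Byte]): Option[String] = {
    val (branch, afterBranch) = readLong(bytes, 0)
    if (branch == 1L) None // branch 1 is "null"
    else {
      val (len, start) = readLong(bytes, afterBranch)
      Some(new String(bytes, start, len.toInt, UTF_8))
    }
  }

  def main(args: Array[String]): Unit =
    for (s <- Seq("1", "2", "3")) {
      val bytes = encodeUnionString(s)
      println(s"$s -> [${bytes.mkString(", ")}] plain='${decodeAsPlainString(bytes)}' union=${decodeAsUnionString(bytes)}")
    }
}
```

For the input "1" the sketch produces the bytes [0, 2, 49]; the plain-string reader stops after the leading 0 and returns an empty string, which would match the [] rows above, while the union reader recovers "1".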



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


