spark-user mailing list archives

From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: to_avro and from_avro not working with struct type in spark 2.4
Date Wed, 27 Feb 2019 09:19:30 GMT
Hi,

I have been working with Avro lately, and most of the time problems like
this come down to the schema.
One thing I pinpointed quickly (I struggled with it too) is that the name
field should be nullable. The result is still not correct, though, so
further digging is needed...

scala> val expectedSchema = StructType(Seq(StructField("name",
StringType,true),StructField("age", IntegerType, false)))
expectedSchema: org.apache.spark.sql.types.StructType =
StructType(StructField(name,StringType,true),
StructField(age,IntegerType,false))

scala> val avroTypeStruct =
SchemaConverters.toAvroType(expectedSchema).toString
avroTypeStruct: String =
{"type":"record","name":"topLevelRecord","fields":[{"name":"name","type":["string","null"]},{"name":"age","type":"int"}]}

scala> dfKV.select(from_avro('value, avroTypeStruct)).show
+---------------------------------------------+
|from_avro(value, struct<name:string,age:int>)|
+---------------------------------------------+
|                              [Mary Jane, 25]|
|                              [Mary Jane, 25]|
+---------------------------------------------+
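
To illustrate why the nullable flag matters, here is a minimal sketch in plain Scala (no Spark or Avro library involved) that hand-encodes one record following the Avro binary encoding rules: zigzag varints for int/long, a length prefix for strings, and a branch index in front of union values. It assumes to_avro wrote the value with the schema shown above, i.e. name as the union ["string","null"]. The object and method names (AvroNullableSketch, writeNullableName, ...) are made up for this illustration and are not Spark or Avro APIs.

```scala
object AvroNullableSketch {
  // Avro int/long: zigzag-encode, then emit base-128 variable-length bytes.
  def encodeLong(n: Long): Array[Byte] = {
    var z = (n << 1) ^ (n >> 63)
    val out = scala.collection.mutable.ArrayBuffer[Byte]()
    while ((z & ~0x7FL) != 0) {
      out += ((z & 0x7F) | 0x80).toByte
      z = z >>> 7
    }
    out += z.toByte
    out.toArray
  }

  // Decode one zigzag varint at `pos`; returns (value, next position).
  def decodeLong(buf: Array[Byte], pos: Int): (Long, Int) = {
    var result = 0L
    var shift = 0
    var i = pos
    var more = true
    while (more) {
      val b = buf(i) & 0xFF
      result = result | ((b & 0x7FL) << shift)
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    ((result >>> 1) ^ -(result & 1), i)
  }

  // Avro string: byte length as a long, followed by the UTF-8 bytes.
  def encodeString(s: String): Array[Byte] = {
    val utf8 = s.getBytes("UTF-8")
    encodeLong(utf8.length) ++ utf8
  }

  def decodeString(buf: Array[Byte], pos: Int): (String, Int) = {
    val (len, p) = decodeLong(buf, pos)
    (new String(buf, p, len.toInt, "UTF-8"), p + len.toInt)
  }

  // Writer side (assumed): name is ["string","null"], so the union branch
  // index 0 ("string") is written before the string itself.
  def writeNullableName(name: String, age: Int): Array[Byte] =
    encodeLong(0) ++ encodeString(name) ++ encodeLong(age)

  // Reader with name declared as plain "string": no branch index expected,
  // so the index byte is misread as the string length.
  def readNonNullableName(buf: Array[Byte]): (String, Int) = {
    val (name, p) = decodeString(buf, 0)
    val (age, _) = decodeLong(buf, p)
    (name, age.toInt)
  }

  // Reader with name declared as ["string","null"]: consumes the branch
  // index first, then decodes the matching branch.
  def readNullableName(buf: Array[Byte]): (Option[String], Int) = {
    val (branch, p0) = decodeLong(buf, 0)
    if (branch == 0) {
      val (name, p1) = decodeString(buf, p0)
      val (age, _) = decodeLong(buf, p1)
      (Some(name), age.toInt)
    } else {
      val (age, _) = decodeLong(buf, p0)
      (None, age.toInt)
    }
  }
}
```

With bytes = AvroNullableSketch.writeNullableName("Mary Jane", 25), readNonNullableName(bytes) gives ("", 9): the union index byte 0 is taken as a string length of 0, and the real length prefix of "Mary Jane" (9) is taken as the age. That matches the [, 9] in the original report. readNullableName(bytes) gives (Some("Mary Jane"), 25).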

BR,
G


On Wed, Feb 27, 2019 at 7:43 AM Hien Luu <hienluu@gmail.com> wrote:

> Hi,
>
> I ran into a pretty weird issue with to_avro and from_avro, where
> from_avro was not able to parse the data in a struct correctly.  Please
> see the simple and self-contained example below. I am using Spark 2.4.
> I am not sure if I missed something.
>
> This is how I start the spark-shell on my Mac:
>
> ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
>
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.avro._
> import org.apache.spark.sql.functions._
>
>
> spark.version
>
> val df = Seq((1, "John Doe",  30), (2, "Mary Jane", 25)).toDF("id", "name",
> "age")
>
> val dfStruct = df.withColumn("value", struct("name","age"))
>
> dfStruct.show
> dfStruct.printSchema
>
> val dfKV = dfStruct.select(to_avro('id).as("key"),
> to_avro('value).as("value"))
>
> val expectedSchema = StructType(Seq(StructField("name", StringType,
> false),StructField("age", IntegerType, false)))
>
> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>
> val avroTypeStr = s"""
>       |{
>       |  "type": "int",
>       |  "name": "key"
>       |}
>     """.stripMargin
>
>
> dfKV.select(from_avro('key, avroTypeStr)).show
>
> // output
> +-------------------+
> |from_avro(key, int)|
> +-------------------+
> |                  1|
> |                  2|
> +-------------------+
>
> dfKV.select(from_avro('value, avroTypeStruct)).show
>
> // output
> +---------------------------------------------+
> |from_avro(value, struct<name:string,age:int>)|
> +---------------------------------------------+
> |                                        [, 9]|
> |                                        [, 9]|
> +---------------------------------------------+
>
> Please help and thanks in advance.
