spark-user mailing list archives

From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: to_avro and from_avro not working with struct type in spark 2.4
Date Thu, 28 Feb 2019 12:01:55 GMT
No. The schema passed to from_avro just has to match what to_avro wrote.
Take a look at the schema of dfStruct, since its value column is the one
you converted with to_avro; name is nullable there:

scala> dfStruct.printSchema
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = false)
 |-- value: struct (nullable = false)
 |    |-- name: string (nullable = true)
 |    |-- age: integer (nullable = false)
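
A minimal sketch of that idea, assuming the spark-shell session and the
spark-avro 2.4.0 package from the original message: instead of hand-writing
expectedSchema, take the reader schema from the column that to_avro actually
encoded, so the nullability of name cannot drift from what was written. The
name valueField and the .as("value") alias are introduced here only for
illustration. This addresses the schema mismatch only; it does not explain
the duplicated rows in the quoted output further down.

```scala
// Run in spark-shell started with:
//   ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
val dfStruct = df.withColumn("value", struct("name", "age"))
val dfKV = dfStruct.select(to_avro(col("id")).as("key"),
                           to_avro(col("value")).as("value"))

// Derive the Avro schema from the struct column itself, so that
// name (nullable = true) and age (nullable = false) are carried over as-is.
val valueField = dfStruct.schema("value")
val avroTypeStruct =
  SchemaConverters.toAvroType(valueField.dataType, valueField.nullable).toString

println(avroTypeStruct)
dfKV.select(from_avro(col("value"), avroTypeStruct).as("value")).printSchema
```

Deriving the schema this way keeps writer and reader in sync if the struct's
fields or their nullability change later.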

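For what it is worth, the odd [, 9] in the original output is consistent with
how Avro lays out bytes. The sketch below is plain Scala, not Spark or Avro
library code: readLong and readString are helpers written for this example,
and the byte layout assumes to_avro wrote name as the union ["string","null"]
(string branch at index 0), as in the toAvroType output quoted below. Avro
writes a union as a branch index followed by the value, a string as a length
followed by UTF-8 bytes, and ints as zigzag varints.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Illustrative zigzag-varint reader (hypothetical helper for this example).
// Returns the decoded value and the position of the next unread byte.
def readLong(bytes: Array[Byte], start: Int): (Long, Int) = {
  var pos = start
  var shift = 0
  var raw = 0L
  var more = true
  while (more) {
    val b = bytes(pos) & 0xff
    raw |= (b & 0x7f).toLong << shift   // low 7 bits carry data
    shift += 7
    pos += 1
    more = (b & 0x80) != 0              // high bit set means another byte follows
  }
  ((raw >>> 1) ^ -(raw & 1), pos)       // undo zigzag encoding
}

// Reads an Avro string: a length followed by that many UTF-8 bytes.
def readString(bytes: Array[Byte], start: Int): (String, Int) = {
  val (len, pos) = readLong(bytes, start)
  (new String(bytes, pos, len.toInt, UTF_8), pos + len.toInt)
}

// ("Mary Jane", 25) written with name: ["string","null"], age: int:
// union index 0, string length 9 (zigzag 0x12), the bytes, then 25 (zigzag 0x32).
val encoded: Array[Byte] =
  Array[Byte](0x00, 0x12) ++ "Mary Jane".getBytes(UTF_8) ++ Array[Byte](0x32)

// Reader with a non-nullable name: the union index is misread as a string length.
val (wrongName, p1) = readString(encoded, 0)
val (wrongAge, _) = readLong(encoded, p1)

// Reader with a nullable name: consume the union index first.
val (branch, p2) = readLong(encoded, 0)
val (name, p3) = readString(encoded, p2)
val (age, _) = readLong(encoded, p3)

println(s"non-nullable reader: [$wrongName, $wrongAge]") // non-nullable reader: [, 9]
println(s"nullable reader:     [$name, $age]")           // nullable reader:     [Mary Jane, 25]
```

So the empty name is the union index read as a zero length, and the 9 is the
length of "Mary Jane" read as the age.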

On Wed, Feb 27, 2019 at 6:51 PM Hien Luu <hienluu@gmail.com> wrote:

> Thanks for looking into this.  Does this mean string fields should always
> be nullable?
>
> You are right that the result is not yet correct and further digging is
> needed :(
>
> On Wed, Feb 27, 2019 at 1:19 AM Gabor Somogyi <gabor.g.somogyi@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have been dealing with Avro lately, and most of the time the problem
>> has something to do with the schema.
>> One thing I pinpointed quickly (and struggled with myself) is that the
>> name field should be nullable. The result is still not correct, though,
>> so further digging is needed...
>>
>> scala> val expectedSchema = StructType(Seq(StructField("name",
>> StringType,true),StructField("age", IntegerType, false)))
>> expectedSchema: org.apache.spark.sql.types.StructType =
>> StructType(StructField(name,StringType,true),
>> StructField(age,IntegerType,false))
>>
>> scala> val avroTypeStruct =
>> SchemaConverters.toAvroType(expectedSchema).toString
>> avroTypeStruct: String =
>> {"type":"record","name":"topLevelRecord","fields":[{"name":"name","type":["string","null"]},{"name":"age","type":"int"}]}
>>
>> scala> dfKV.select(from_avro('value, avroTypeStruct)).show
>> +---------------------------------------------+
>> |from_avro(value, struct<name:string,age:int>)|
>> +---------------------------------------------+
>> |                              [Mary Jane, 25]|
>> |                              [Mary Jane, 25]|
>> +---------------------------------------------+
>>
>> BR,
>> G
>>
>>
>> On Wed, Feb 27, 2019 at 7:43 AM Hien Luu <hienluu@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I ran into a pretty weird issue with to_avro and from_avro where it was not
>>> able to parse the data in a struct correctly.  Please see the simple,
>>> self-contained example below. I am using Spark 2.4.  I am not sure if I
>>> missed something.
>>>
>>> This is how I start the spark-shell on my Mac:
>>>
>>> ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
>>>
>>> import org.apache.spark.sql.types._
>>> import org.apache.spark.sql.avro._
>>> import org.apache.spark.sql.functions._
>>>
>>>
>>> spark.version
>>>
>>> val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
>>>
>>> val dfStruct = df.withColumn("value", struct("name","age"))
>>>
>>> dfStruct.show
>>> dfStruct.printSchema
>>>
>>> val dfKV = dfStruct.select(to_avro('id).as("key"),
>>> to_avro('value).as("value"))
>>>
>>> val expectedSchema = StructType(Seq(StructField("name", StringType,
>>> false),StructField("age", IntegerType, false)))
>>>
>>> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>>>
>>> val avroTypeStr = s"""
>>>       |{
>>>       |  "type": "int",
>>>       |  "name": "key"
>>>       |}
>>>     """.stripMargin
>>>
>>>
>>> dfKV.select(from_avro('key, avroTypeStr)).show
>>>
>>> // output
>>> +-------------------+
>>> |from_avro(key, int)|
>>> +-------------------+
>>> |                  1|
>>> |                  2|
>>> +-------------------+
>>>
>>> dfKV.select(from_avro('value, avroTypeStruct)).show
>>>
>>> // output
>>> +---------------------------------------------+
>>> |from_avro(value, struct<name:string,age:int>)|
>>> +---------------------------------------------+
>>> |                                        [, 9]|
>>> |                                        [, 9]|
>>> +---------------------------------------------+
>>>
>>> Please help and thanks in advance.
>
> --
> Regards,
>
