spark-user mailing list archives

From Hien Luu <>
Subject RE: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?
Date Sat, 06 Jan 2018 19:42:57 GMT
Hi Kant,

I am not sure whether you have come up with a solution yet, but the following
works for me (in Scala):

val emp_info = """
    [{"name": "foo", "address": {"state": "CA", "country": "USA"},
      "docs":[{"subject": "english", "year": 2016}]},
     {"name": "bar", "address": {"state": "OH", "country": "USA"},
      "docs":[{"subject": "math", "year": 2017}]}]
"""

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val addressSchema = new StructType().add("state", StringType).add("country", StringType)
val docsSchema = ArrayType(new StructType().add("subject", StringType).add("year", IntegerType))
val employeeSchema = new StructType().add("name", StringType).add("address", addressSchema).add("docs", docsSchema)

val empInfoSchema = ArrayType(employeeSchema)


val empInfoStrDF = Seq(emp_info).toDF("emp_info_str")

// The 'emp_info_str symbol syntax and toDF need spark.implicits._,
// which is already in scope in spark-shell.
val empInfoDF = empInfoStrDF.select(from_json('emp_info_str, empInfoSchema).as("emp_info"))
empInfoDF.printSchema

empInfoDF.select(explode('emp_info).as("emp")).select("emp.*").show(false)
empInfoDF.select("emp_info.name", "emp_info.address").show(false)
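If the end goal is a typed Dataset rather than an untyped DataFrame, the same pipeline can finish with case classes and `.as[...]`. A sketch, assuming `spark.implicits._` is in scope (it is by default in spark-shell); the `Employee`, `Address`, and `Doc` case class names are mine, not from the original question:

```scala
import org.apache.spark.sql.functions.{explode, from_json}

// Case classes mirroring employeeSchema above (names are illustrative).
case class Address(state: String, country: String)
case class Doc(subject: String, year: Int)
case class Employee(name: String, address: Address, docs: Seq[Doc])

// Parse the JSON array, explode it into one row per employee,
// then map the struct columns onto the case class.
val empDS = empInfoStrDF
  .select(from_json('emp_info_str, empInfoSchema).as("emp_info"))
  .select(explode('emp_info).as("emp"))
  .select("emp.*")
  .as[Employee]

empDS.show(false)
```

The advantage over the DataFrame version is that downstream code gets compile-time field names and types (`empDS.map(_.address.state)`) instead of string column lookups.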


