spark-user mailing list archives

From JG Perrin <jper...@lumeris.com>
Subject RE: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?
Date Tue, 10 Oct 2017 18:24:22 GMT
Something along the lines of:

Dataset<Row> df = spark.read().json(jsonDf); ?
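
To make that concrete, here is a minimal, self-contained sketch of the same idea. The class name, the local master, and the inlined sample strings are stand-ins for illustration only; ds plays the role of the Dataset<String> of JSON rows described in the question below. Since Spark 2.2.0, DataFrameReader.json() accepts a Dataset<String> directly:

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToColumns {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("json-to-columns")
        .master("local[*]")
        .getOrCreate();

    // Stand-in for the Dataset<String> of JSON rows from the question
    Dataset<String> ds = spark.createDataset(Arrays.asList(
        "{\"name\":\"foo\",\"address\":{\"state\":\"CA\",\"country\":\"USA\"},\"docs\":[{\"subject\":\"english\",\"year\":2016}]}",
        "{\"name\":\"bar\",\"address\":{\"state\":\"OH\",\"country\":\"USA\"},\"docs\":[{\"subject\":\"math\",\"year\":2017}]}"),
        Encoders.STRING());

    // Spark 2.2.0+: json(Dataset<String>) parses the strings and infers the
    // nested schema (address as a struct, docs as an array of structs)
    Dataset<Row> df = spark.read().json(ds);
    df.printSchema();

    // The inferred columns come back in alphabetical order; select() just
    // puts them in the order asked for in the question
    df.select("name", "address", "docs").show(false);

    spark.stop();
  }
}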


From: kant kodali [mailto:kanth909@gmail.com]
Sent: Saturday, October 07, 2017 2:31 AM
To: user @spark <user@spark.apache.org>
Subject: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?


I have a Dataset<String> ds which consists of json rows.

Sample JSON rows (this is just an example of what the dataset contains):

[
    {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs": [{"subject": "english", "year": 2016}]},
    {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs": [{"subject": "math", "year": 2017}]}
]

ds.printSchema()

root
 |-- value: string (nullable = true)

Now I want to convert it into the following Dataset using Spark 2.2.0:

name  | address                           | docs
------|-----------------------------------|---------------------------------------
"foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
"bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]

Preferably in Java, but Scala is also fine as long as the functions are available in the Java API.