spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?
Date Sat, 07 Oct 2017 07:30:41 GMT

I have a Dataset<String> ds in which each row is a JSON array of objects.

*Sample JSON row (one example row from the dataset):*

[
    {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs": [{"subject": "english", "year": 2016}]},
    {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs": [{"subject": "math", "year": 2017}]}
]

ds.printSchema()

root
 |-- value: string (nullable = true)

Now I want to convert it into the following Dataset using Spark 2.2.0:

name  | address                           | docs
--------------------------------------------------------------------------------------
"foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
"bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]

Java is preferred, but Scala is also fine as long as the functions used are
also available in the Java API.
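
For illustration, here is a minimal sketch of one way this conversion might be done. It is a sketch under assumptions, not a confirmed solution. It assumes every string in ds is a valid JSON array (with commas between the objects) and that the element schema can be written out by hand. The class name JsonArrayToColumns, the method toColumns and the constant ROW_SCHEMA are made up for this example. It uses the from_json overload that takes a DataType, which to my knowledge is available from Spark 2.2.0, to parse the array, and then explode to produce one row per array element.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class JsonArrayToColumns {

    // Hand-written schema of ONE element of the JSON array,
    // matching the sample row (assumption: all rows share this shape).
    static final StructType ROW_SCHEMA = new StructType()
        .add("name", DataTypes.StringType)
        .add("address", new StructType()
            .add("state", DataTypes.StringType)
            .add("country", DataTypes.StringType))
        .add("docs", DataTypes.createArrayType(new StructType()
            .add("subject", DataTypes.StringType)
            .add("year", DataTypes.IntegerType)));

    // Hypothetical helper: parses the "value" column as an array of structs,
    // explodes it into one row per element, then flattens the struct so that
    // name, address and docs become top-level columns.
    static Dataset<Row> toColumns(Dataset<String> ds) {
        return ds
            .select(from_json(col("value"), DataTypes.createArrayType(ROW_SCHEMA)).as("rows"))
            .select(explode(col("rows")).as("row"))
            .select("row.*");
    }
}
```

With this sketch, JsonArrayToColumns.toColumns(ds) would return a Dataset<Row> in which address is a struct column and docs is an array of structs, rather than raw JSON strings. A string that fails to parse becomes null and contributes no rows after explode, so malformed input is dropped silently.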
