spark-user mailing list archives

From Jules Damji <dmat...@comcast.net>
Subject Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?
Date Sat, 07 Oct 2017 17:00:40 GMT
You might find these blog posts helpful for parsing and extracting data from complex structures:

https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html

https://databricks.com/blog/2017/06/13/five-spark-sql-utility-functions-extract-explore-complex-data-types.html
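
For the quoted question below, here is a minimal sketch in Java of one way this could be done in Spark 2.2.0. It is not taken from the posts above. It assumes every row of ds is a valid JSON array (objects separated by commas) and that the field names and types match the sample. It relies on from_json accepting an array-of-struct schema, which was added in 2.2.0, followed by explode. The class name JsonArrayRows and the helpers elementSchema and toColumns are made up for illustration.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.from_json;

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class JsonArrayRows {

    // Schema of ONE object inside the JSON array (assumed from the sample row).
    static StructType elementSchema() {
        StructType address = new StructType()
            .add("state", DataTypes.StringType)
            .add("country", DataTypes.StringType);
        StructType doc = new StructType()
            .add("subject", DataTypes.StringType)
            .add("year", DataTypes.IntegerType);
        return new StructType()
            .add("name", DataTypes.StringType)
            .add("address", address)
            .add("docs", DataTypes.createArrayType(doc));
    }

    // Parse each string as an array of structs, emit one row per array
    // element, then promote the struct fields to top-level columns.
    static Dataset<Row> toColumns(Dataset<String> ds) {
        ArrayType rowSchema = DataTypes.createArrayType(elementSchema());
        return ds
            .select(from_json(col("value"), rowSchema).as("items"))
            .select(explode(col("items")).as("item"))
            .select("item.*");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("json-array-rows")
            .getOrCreate();

        // One dataset row holding a JSON array of two objects.
        String row = "["
            + "{\"name\": \"foo\", \"address\": {\"state\": \"CA\", \"country\": \"USA\"},"
            + " \"docs\": [{\"subject\": \"english\", \"year\": 2016}]},"
            + "{\"name\": \"bar\", \"address\": {\"state\": \"OH\", \"country\": \"USA\"},"
            + " \"docs\": [{\"subject\": \"math\", \"year\": 2017}]}"
            + "]";
        Dataset<String> ds = spark.createDataset(Arrays.asList(row), Encoders.STRING());

        Dataset<Row> result = toColumns(ds);
        result.printSchema();
        result.show(false);

        spark.stop();
    }
}
```

Note that address comes back as a struct column and docs as an array of structs, not as JSON strings. If the string form shown in the question is wanted, to_json could be applied to the struct column. A row that is not valid JSON parses to null under from_json, and explode then produces no output rows for it.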

Cheers 
Jules


Sent from my iPhone
Pardon the dumb thumb typos :)

> On Oct 7, 2017, at 12:30 AM, kant kodali <kanth909@gmail.com> wrote:
> 
> I have a Dataset<String> ds which consists of json rows.
> 
> Sample Json Row (This is just an example of one row in the dataset)
> 
> [
>     {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]},
>     {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}
> ]
> ds.printSchema()
> 
> root
>  |-- value: string (nullable = true)
> Now I want to convert into the following dataset using Spark 2.2.0
> 
> name  |             address               |  docs 
> ----------------------------------------------------------------------------------
> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
> Preferably Java but Scala is also fine as long as there are functions available in Java API
