spark-user mailing list archives

From Matteo Cossu <elco...@gmail.com>
Subject Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?
Date Sat, 07 Oct 2017 15:28:18 GMT
Hello,
I think you should use *from_json* from org.apache.spark.sql.functions
<https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#from_json-org.apache.spark.sql.Column-org.apache.spark.sql.types.DataType->
to parse the JSON string into a struct column, using a schema you supply
as a StructType. Afterwards, you can create a new Dataset by selecting
the columns you want.
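
A rough, untested sketch of what that could look like in Java, in case it
helps. It assumes every value in ds is a well-formed JSON array of objects
(comma-separated) shaped like your sample, so the schema is wrapped in an
ArrayType and the parsed array is exploded into one row per object. The
class name JsonRowsToColumns, the method name toColumns and the
intermediate column names "people" and "person" are made up for
illustration; the field names and types are guessed from your example.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public final class JsonRowsToColumns {

    // Hypothetical helper: turns a Dataset<String> of JSON arrays into
    // a Dataset<Row> with the columns name, address and docs.
    static Dataset<Row> toColumns(Dataset<String> ds) {
        // Schema of the nested "address" object (assumed from the sample).
        StructType address = new StructType()
            .add("state", DataTypes.StringType)
            .add("country", DataTypes.StringType);

        // Schema of one element of the "docs" array (assumed from the sample).
        StructType doc = new StructType()
            .add("subject", DataTypes.StringType)
            .add("year", DataTypes.IntegerType);

        // Schema of one object inside the top-level JSON array.
        StructType person = new StructType()
            .add("name", DataTypes.StringType)
            .add("address", address)
            .add("docs", DataTypes.createArrayType(doc));

        // Each input string holds an array of such objects.
        ArrayType rowSchema = DataTypes.createArrayType(person);

        return ds
            // Parse the single "value" column into an array of structs.
            .select(from_json(col("value"), rowSchema).as("people"))
            // Produce one output row per object in the array.
            .select(explode(col("people")).as("person"))
            // Pull the nested fields up into top-level columns.
            .select("person.name", "person.address", "person.docs");
    }
}
```

With the default parsing behaviour, a string that does not match the schema
or is not valid JSON should come back as null from from_json, so it
contributes no rows after explode. If each value were a single JSON object
instead of an array, the person StructType could be passed to from_json
directly and the explode step dropped.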

On 7 October 2017 at 09:30, kant kodali <kanth909@gmail.com> wrote:

> I have a Dataset<String> ds which consists of json rows.
>
> *Sample Json Row (This is just an example of one row in the dataset)*
>
> [
>     {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]},
>     {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}
>
> ]
>
> ds.printSchema()
>
> root
>  |-- value: string (nullable = true)
>
> Now I want to convert into the following dataset using Spark 2.2.0
>
> name  |             address               |  docs
> ----------------------------------------------------------------------------------
> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
>
> Preferably Java but Scala is also fine as long as there are functions
> available in Java API
>
