spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?
Date Mon, 09 Oct 2017 19:14:21 GMT
https://issues.apache.org/jira/browse/SPARK-22228
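For anyone landing here from the archive, this is a minimal sketch of the approach discussed downthread, assuming a Spark version in which `from_json` accepts an `ArrayType` schema (that is what SPARK-22228 above tracks). The nested schema for `address` and `docs` is my own illustration of the sample rows, not something confirmed in the original mail:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode_outer;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class JsonArrayToColumns {

    // Schema for one element of the JSON array. Here address and docs are
    // typed as nested structures; they could instead be declared as
    // DataTypes.StringType if the raw JSON text is wanted for those columns.
    static StructType elementSchema() {
        return new StructType()
                .add("name", DataTypes.StringType)
                .add("address", new StructType()
                        .add("state", DataTypes.StringType)
                        .add("country", DataTypes.StringType))
                .add("docs", DataTypes.createArrayType(new StructType()
                        .add("subject", DataTypes.StringType)
                        .add("year", DataTypes.IntegerType)));
    }

    // Parse the string column as an array of structs, emit one row per
    // array element with explode_outer, then lift the struct fields into
    // top-level name/address/docs columns.
    static Dataset<Row> flatten(Dataset<Row> ds) {
        return ds
                .select(explode_outer(
                        from_json(col("value"),
                                DataTypes.createArrayType(elementSchema())))
                        .as("result"))
                .selectExpr("result.*");
    }
}
```

`explode_outer` (rather than `explode`) keeps a row with nulls when the parsed array is null or empty, which matters if some input strings fail to parse.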

On Sun, Oct 8, 2017 at 11:58 AM, kant kodali <kanth909@gmail.com> wrote:

> I have the following so far
>
> import static org.apache.spark.sql.functions.*;
> import static org.apache.spark.sql.types.DataTypes.StringType;
>
> // name, address, and docs are all declared as plain strings here
> private StructType getSchema() {
>     return new StructType()
>             .add("name", StringType)
>             .add("address", StringType)
>             .add("docs", StringType);
> }
>
> // parse value as an array, explode one row per element, lift the fields
> ds.select(explode_outer(from_json(ds.col("value"), ArrayType.apply(getSchema())))
>         .as("result"))
>   .selectExpr("result.*");
>
> This didn't quite work for me, so to clarify: my input string is a JSON
> array of documents, and I am trying to keep the values of the name,
> address, and docs columns as strings as well, except that the input
> array should be flattened out by the explode function.
> Any suggestions would be great.
> Thanks!
>
>
> On Sat, Oct 7, 2017 at 10:00 AM, Jules Damji <dmatrix@comcast.net> wrote:
>
>> You might find these blogs helpful to parse & extract data from complex
>> structures:
>>
>> https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html
>>
>> https://databricks.com/blog/2017/06/13/five-spark-sql-utility-functions-extract-explore-complex-data-types.html
>>
>> Cheers
>> Jules
>>
>>
>> Sent from my iPhone
>> Pardon the dumb thumb typos :)
>>
>> On Oct 7, 2017, at 12:30 AM, kant kodali <kanth909@gmail.com> wrote:
>>
>> I have a Dataset<String> ds which consists of json rows.
>>
>> *Sample Json Row (This is just an example of one row in the dataset)*
>>
>> [
>>     {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs": [{"subject": "english", "year": 2016}]},
>>     {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs": [{"subject": "math", "year": 2017}]}
>> ]
>>
>> ds.printSchema()
>>
>> root
>>  |-- value: string (nullable = true)
>>
>> Now I want to convert into the following dataset using Spark 2.2.0
>>
>> name  |             address               |  docs
>> ----------------------------------------------------------------------------------
>> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
>> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
>>
>> Preferably Java, but Scala is also fine as long as the functions are
>> available in the Java API.
>>
>>
>
