spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: flattening a JSON data structure
Date Mon, 19 Oct 2015 20:55:10 GMT
A quick fix is probably to use Seq[Row] instead of Array[Row] (the types
that are returned are documented here:
http://spark.apache.org/docs/latest/sql-programming-guide.html#data-types)
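
As a rough sketch of that quick fix (the helper name contractIds and the
choice to pull out only contractId are assumptions made for illustration,
and the field names are taken from the schema quoted below), reading the
array as Seq[Row] might look like this:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper: collects every provider.contract[*].details.contractId.
def contractIds(dataFrame: DataFrame): RDD[String] =
  dataFrame.rdd.flatMap { row =>
    for {
      // The schema marks provider and contract as nullable, so wrap them in Option.
      provider <- Option(row.getAs[Row]("provider")).toSeq
      // Array columns come back as a Seq (a WrappedArray at runtime),
      // so ask for Seq[Row] rather than Array[Row].
      contract <- Option(provider.getAs[Seq[Row]]("contract")).getOrElse(Seq.empty)
      if contract != null // the array elements may be null (containsNull = true)
      details <- Option(contract.getAs[Row]("details")).toSeq
    } yield details.getAs[String]("contractId") // may itself be null
  }
```

Going through dataFrame.rdd keeps the result an RDD regardless of how
map/flatMap on a DataFrame behave in a given Spark version.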

Really though you probably want to be using explode.  Perhaps something
like this would help?

import org.apache.spark.sql.functions._
dataFrame.select(explode($"provider.contract").as("contract"))
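
A slightly longer sketch of the explode approach, in case it is useful.
The function name flattenContracts and the particular output columns are
assumptions chosen for illustration; the field paths come from the schema
quoted below. It uses col(...) instead of $"..." so that it does not depend
on importing the SQLContext implicits:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper: one output row per element of provider.contract.
def flattenContracts(dataFrame: DataFrame): DataFrame =
  dataFrame
    // explode emits one row per array element; accountId is repeated on each row.
    .select(
      col("provider.accountId").as("accountId"),
      explode(col("provider.contract")).as("contract"))
    // Nested struct fields can then be pulled up with dotted paths.
    .select(
      col("accountId"),
      col("contract.details.contractId").as("contractId"),
      col("contract.details.countryCode").as("countryCode"),
      col("contract.startDate").as("startDate"),
      col("contract.endDate").as("endDate"))
```

Note that explode produces no rows for a provider whose contract array is
null or empty, so those records drop out of the result.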

On Mon, Oct 19, 2015 at 8:08 AM, nunomrc <nuno.carvalho@rightster.com>
wrote:

> Hi I am fairly new to Spark and I am trying to flatten the following
> structure:
>
>  |-- provider: struct (nullable = true)
>  |    |-- accountId: string (nullable = true)
>  |    |-- contract: array (nullable = true)
>
> And then provider is:
> root
>  |-- accountId: string (nullable = true)
>  |-- contract: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- details: struct (nullable = true)
>  |    |    |    |-- contractId: string (nullable = true)
>  |    |    |    |-- countryCode: string (nullable = true)
>  |    |    |    |-- endDate: string (nullable = true)
>  |    |    |    |-- noticePeriod: long (nullable = true)
>  |    |    |    |-- startDate: string (nullable = true)
>  |    |    |-- endDate: string (nullable = true)
>  |    |    |-- startDate: string (nullable = true)
>  |    |    |-- other: struct (nullable = true)
>  |    |    |    |-- type: string (nullable = true)
>  |    |    |    |-- values: array (nullable = true)
>  |    |    |    |    |-- element: struct (containsNull = true)
>  |    |    |    |    |    |-- key: string (nullable = true)
>  |    |    |    |    |    |-- value: long (nullable = true)
>
>
> I am trying the following:
>
> dataFrame.map { case Row(....., provider: Row, .....) =>
>    val list = provider.getAs[Array[Row]]("contract")
>
> At this point, I get the following exception:
> [info]   org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in
> stage 4.0 (TID 9, localhost): java.lang.ClassCastException:
> scala.collection.mutable.WrappedArray$ofRef cannot be cast to
> [Lorg.apache.spark.sql.Row;
> [info]  at com.mycode.Deal$$anonfun$flattenDeals$1.apply(Deal.scala:62)
>
> I tried many different variations of this and tried to get the actual data
> type of the elements of the array, without any success.
> This kind of method to flatten JSON data structures was working for me
> with previous versions of Spark, but I am now trying to upgrade from
> 1.4.1 to 1.5.1 and started getting this error.
>
> What am I doing wrong?
> Any help would be appreciated.
>
> Thanks,
> Nuno