spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Reading nested JSON data with Spark SQL
Date Wed, 19 Nov 2014 21:40:17 GMT
You can extract the nested fields in SQL: SELECT field.nestedField ...

If you don't do that then nested fields are represented as rows within rows
and can be retrieved as follows:

t.getAs[Row](0).getInt(0)

Also, I would write t.getAs[Buffer[CharSequence]](12) as
t.getAs[Seq[String]](12) since we don't guarantee the return type will be a
buffer.
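
A minimal sketch of the row-within-row access described above, suitable for pasting into spark-shell. The column layout is an assumption made up for illustration (column 0 holds a nested object whose first field is an integer, column 1 holds an array of strings), as are the names readNested and sample; the real positions depend on the schema of your query result.

```scala
import org.apache.spark.sql.Row

// Hypothetical layout: column 0 is a nested JSON object, column 1 is an array of strings.
def readNested(t: Row): (Int, Seq[String]) = {
  // A nested JSON object comes back as a Row inside the outer Row, not as a Map.
  val nested = t.getAs[Row](0)
  val firstField = nested.getInt(0)
  // Ask for Seq[String] rather than Buffer, since the concrete collection type is not guaranteed.
  val tags = t.getAs[Seq[String]](1)
  (firstField, tags)
}

// Hand-built row standing in for one result of sqc.sql(...).
val sample = Row(Row(42), Seq("a", "b"))
val (firstField, tags) = readNested(sample)
```

Note that getInt only works if the nested field was actually inferred as an integer; if the schema reports a different numeric type, the matching getter would be needed instead.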


On Wed, Nov 19, 2014 at 1:33 PM, Simone Franzini <captainfranz@gmail.com>
wrote:

> I have been using Spark SQL to read in JSON data, like so:
> val myJsonFile = sqc.jsonFile(args("myLocation"))
> myJsonFile.registerTempTable("myTable")
> sqc.sql("mySQLQuery").map { row =>
> myFunction(row)
> }
>
> And then in myFunction(row) I can read the various columns with the
> Row.getX methods. However, these methods only work for basic types (string,
> int, ...).
> I was having some trouble reading columns that are arrays or maps (i.e.
> other JSON objects).
>
> I am now using Spark 1.2 from the Cloudera snapshot and I noticed that
> there is a new method getAs. I was able to use it to read for example an
> array of strings like so:
> t.getAs[Buffer[CharSequence]](12)
>
> However, if I try to read a column with a nested JSON object like this:
> t.getAs[Map[String, Any]](11)
>
> I get the following error:
> java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
> scala.collection.immutable.Map
>
> How can I read such a field? Am I just missing something small or should I
> be looking for a completely different alternative to reading JSON?
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>
