spark-user mailing list archives

From Michael Armbrust <>
Subject Re: Reading nested JSON data with Spark SQL
Date Wed, 19 Nov 2014 21:40:17 GMT
You can extract the nested fields in sql: SELECT field.nestedField ...
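For example, with a dot path in the query (the table and field names below are made up for illustration):

```scala
// Hypothetical data: {"person": {"name": "...", "age": ...}}
// registered as the temp table "people". The dot path reaches
// into the nested struct directly in SQL.
val names = sqc.sql("SELECT person.name, person.age FROM people")
```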

If you don't do that then nested fields are represented as rows within rows
and can be retrieved as follows:


Also, I would write t.getAs[Buffer[CharSequence]](12) as
t.getAs[Seq[String]](12), since we don't guarantee the return type will be a
Buffer.
On Wed, Nov 19, 2014 at 1:33 PM, Simone Franzini <> wrote:

> I have been using Spark SQL to read in JSON data, like so:
> val myJsonFile = sqc.jsonFile(args("myLocation"))
> myJsonFile.registerTempTable("myTable")
> sqc.sql("mySQLQuery").map { row =>
> myFunction(row)
> }
> And then in myFunction(row) I can read the various columns with the
> Row.getX methods. However, these methods only work for basic types (string,
> int, ...).
> I was having some trouble reading columns that are arrays or maps (i.e.
> other JSON objects).
> I am now using Spark 1.2 from the Cloudera snapshot and I noticed that
> there is a new method getAs. I was able to use it to read for example an
> array of strings like so:
> t.getAs[Buffer[CharSequence]](12)
> However, if I try to read a column with a nested JSON object like this:
> t.getAs[Map[String, Any]](11)
> I get the following error:
> java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
> scala.collection.immutable.Map
> How can I read such a field? Am I just missing something small or should I
> be looking for a completely different alternative to reading JSON?
> Simone Franzini, PhD
