spark-user mailing list archives

From Simone Franzini <captainfr...@gmail.com>
Subject Re: Reading nested JSON data with Spark SQL
Date Wed, 19 Nov 2014 22:36:56 GMT
This works great, thank you!

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini

On Wed, Nov 19, 2014 at 3:40 PM, Michael Armbrust <michael@databricks.com>
wrote:

> You can extract the nested fields in SQL: SELECT field.nestedField ...
>
> If you don't do that then nested fields are represented as rows within
> rows and can be retrieved as follows:
>
> t.getAs[Row](0).getInt(0)
>
> Also, I would write t.getAs[Buffer[CharSequence]](12) as
> t.getAs[Seq[String]](12) since we don't guarantee the return type will be
> a buffer.
>
>
> On Wed, Nov 19, 2014 at 1:33 PM, Simone Franzini <captainfranz@gmail.com>
> wrote:
>
>> I have been using Spark SQL to read in JSON data, like so:
>> val myJsonFile = sqc.jsonFile(args("myLocation"))
>> myJsonFile.registerTempTable("myTable")
>> sqc.sql("mySQLQuery").map { row =>
>> myFunction(row)
>> }
>>
>> And then in myFunction(row) I can read the various columns with the
>> Row.getX methods. However, these methods only work for basic types (string,
>> int, ...).
>> I was having some trouble reading columns that are arrays or maps (i.e.
>> other JSON objects).
>>
>> I am now using Spark 1.2 from the Cloudera snapshot and I noticed that
>> there is a new method getAs. I was able to use it to read for example an
>> array of strings like so:
>> t.getAs[Buffer[CharSequence]](12)
>>
>> However, if I try to read a column with a nested JSON object like this:
>> t.getAs[Map[String, Any]](11)
>>
>> I get the following error:
>> java.lang.ClassCastException:
>> org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
>> scala.collection.immutable.Map
>>
>> How can I read such a field? Am I just missing something small or should
>> I be looking for a completely different alternative to reading JSON?
>>
>> Simone Franzini, PhD
>>
>> http://www.linkedin.com/in/simonefranzini
>>
>
>
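
For reference, here is a minimal sketch of the row-within-row access described in the reply above. The row contents, the column positions, and the object name NestedRowSketch are made up for illustration; a real row would come from the result of sqc.sql(...) rather than being built by hand, and this assumes spark-sql is on the classpath.

```scala
import org.apache.spark.sql.Row

object NestedRowSketch {
  // Hypothetical row: column 0 holds a nested JSON object, column 1 an array of strings.
  // Built by hand here so the access pattern can be shown without a SparkContext.
  val t: Row = Row(Row(42, "inner"), Seq("a", "b"))

  // A nested JSON object is represented as a Row inside the Row, not as a Map,
  // so ask getAs for a Row and then read the nested field by position.
  val nestedInt: Int = t.getAs[Row](0).getInt(0)

  // For arrays, request the general Seq type rather than a concrete Buffer,
  // since the concrete return type is not guaranteed.
  val tags: Seq[String] = t.getAs[Seq[String]](1)

  def main(args: Array[String]): Unit = {
    println(nestedInt)
    println(tags.mkString(","))
  }
}
```

Asking for Map[String, Any] on column 0 is what triggers the ClassCastException quoted above, because the value stored there is a Row.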
