spark-user mailing list archives

From Justin Pihony <justin.pih...@gmail.com>
Subject SparkSQL JSON array support
Date Fri, 06 Mar 2015 02:11:10 GMT
Are there any plans to support JSON arrays more fully? Take for example:

    // jsonRDD expects an RDD[String], so the list is parallelized first.
    val myJson =
      sqlContext.jsonRDD(sc.parallelize(List("""{"foo":[{"bar":1},{"baz":2}]}""")))
    myJson.registerTempTable("JsonTest")

I would like a way to pull out parts of the array data based on a key:

    // Projects only the object with bar; the rest would be null.
    sql("""SELECT foo["bar"] FROM JsonTest""")
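
To make the requested semantics concrete, here is a minimal plain-Scala
sketch of that projection. It does not use any Spark API; the names
ProjectByKey, Obj and project are hypothetical and exist only for
illustration, and each JSON object is modelled as a Map.

```scala
// Plain-Scala model of the requested projection; no Spark API is used here.
object ProjectByKey {
  // Assumption: each JSON object in the array is modelled as a Map.
  type Obj = Map[String, Any]

  // For each array element, keep the value stored under `key`,
  // or None (standing in for SQL null) when the key is absent.
  def project(arr: Seq[Obj], key: String): Seq[Option[Any]] =
    arr.map(_.get(key))

  def main(args: Array[String]): Unit = {
    val foo: Seq[Obj] = Seq(Map("bar" -> 1), Map("baz" -> 2))
    println(project(foo, "bar")) // prints List(Some(1), None)
  }
}
```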
 
I could even work around this if there were some way to access the key name
from the SchemaRDD:

    // Same as above, but also filters out rows without a bar key.
    myJson.filter(x => x(0).asInstanceOf[Seq[Row]].exists(y => y.key == "bar"))
        .map(x => x(0).asInstanceOf[Seq[Row]].filter(y => y.key == "bar"))
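
The intended behaviour of that workaround can be sketched in plain Scala,
again without Spark. FilterByKey, Obj and withKey are hypothetical names,
and Map.contains stands in for the key lookup that Row does not offer.

```scala
// Plain-Scala model of the filter-then-narrow workaround; no Spark API is used.
object FilterByKey {
  // Assumption: each JSON object in the array is modelled as a Map.
  type Obj = Map[String, Any]

  // Keep only rows whose array has at least one object with `key`,
  // then narrow each surviving array to the objects that carry `key`.
  def withKey(rows: Seq[Seq[Obj]], key: String): Seq[Seq[Obj]] =
    rows
      .filter(_.exists(_.contains(key)))
      .map(_.filter(_.contains(key)))

  def main(args: Array[String]): Unit = {
    val rows: Seq[Seq[Obj]] = Seq(
      Seq(Map("bar" -> 1), Map("baz" -> 2)), // has a bar key, so it is kept
      Seq(Map("baz" -> 3))                   // no bar key, so it is dropped
    )
    println(withKey(rows, "bar")) // prints List(List(Map(bar -> 1)))
  }
}
```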

The closest suggestion I have found so far is Hive's LATERAL VIEW,
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView>
which still does not solve the problem of pulling out the keys.

I also tried a UDF, but could not get that to work either.

If there isn't anything in the works, then would it be appropriate to create
a ticket for this?

Thanks,
Justin


