spark-user mailing list archives

From Eran Witkon <eranwit...@gmail.com>
Subject Re: Extract compressed JSON withing JSON
Date Thu, 24 Dec 2015 10:54:18 GMT
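I found the answer on StackOverflow. For anyone looking for the solution,
the trick is to build one JSON string per row, splicing the decompressed
JSON in unquoted, and let Spark parse the result again: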


val jsonNested = sqlContext.read.json(jsonUnGzip.map {
  case Row(cty: String, json: String, nm: String, yrs: String) =>
    s"""{"cty": "$cty", "extractedJson": $json, "nm": "$nm", "yrs": "$yrs"}"""
})

See this link for source
http://stackoverflow.com/questions/34069282/how-to-query-json-data-column-using-spark-dataframes
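
To show the string-building step on its own, here is a minimal sketch in plain Scala without Spark. The names `Rec`, `SpliceJson`, `quote` and `nest` are made up for illustration and are not from the StackOverflow answer. Unlike the one-liner above, it also escapes quotes and backslashes in the plain string fields, on the assumption that such characters could appear in the data.

```scala
// Hypothetical stand-in for one decompressed row; field names follow the thread.
final case class Rec(cty: String, json: String, nm: String, yrs: String)

object SpliceJson {
  // Escape backslashes and double quotes so a plain value stays a valid JSON string.
  // Control characters are not handled in this sketch.
  def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Quote the plain fields, but splice the already-serialized JSON in unquoted,
  // so that a JSON parser sees it as a nested object rather than a string.
  def nest(r: Rec): String =
    s"""{"cty": ${quote(r.cty)}, "extractedJson": ${r.json}, "nm": ${quote(r.nm)}, "yrs": ${quote(r.yrs)}}"""
}
```

In the Spark code above, the same kind of string is produced for every row, and the resulting RDD of strings is what gets passed to sqlContext.read.json.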

Eran


On Thu, Dec 24, 2015 at 11:42 AM Eran Witkon <eranwitkon@gmail.com> wrote:

> Hi,
>
> I have a JSON file with the following row format:
> {"cty":"United Kingdom","gzip":"H4sIAAAAAAAAAKtWystVslJQcs4rLVHSUUouqQTxQvMyS1JTFLwz89JT8nOB4hnFqSBxj/zS4lSF/DQFl9S83MSibKBMZVExSMbQwNBM19DA2FSpFgDvJUGVUwAAAA==","nm":"Edmund lronside","yrs":"1016"}
>
> The gzip field is itself a compressed JSON document.
>
> I want to read the file and build the full nested JSON as a row:
>
> {"cty":"United Kingdom","hse":{"nm": "Cnut","cty": "United Kingdom","hse": "House of Denmark","yrs": "1016-1035"},"nm":"Edmund lronside","yrs":"1016"}
>
> I already have a function that extracts the compressed field to a string.
>
> Questions:
>
> *If I use the following code to build the RDD:*
>
> val jsonData = sqlContext.read.json(sourceFilesPath)
> // Loop through the DataFrame and decompress the gzip field
>
> val jsonUnGzip = jsonData.map(r => Row(
>   r.getString(0),
>   GZipHelper.unCompress(r.getString(1)).get,
>   r.getString(2),
>   r.getString(3)))
>
> *I get a row with 4 columns (String,String,String,String)*
>
>  org.apache.spark.sql.Row = [United Kingdom,{"nm": "Cnut","cty": "United Kingdom","hse": "House of Denmark","yrs": "1016-1035"},Edmund lronside,1016]
>
> *Now, I can't tell Spark to "re-parse" Col(1) as JSON, right?*
>
> I have seen some posts about using case classes or explode, but I don't understand how
> they can help here.
>
> Eran
>
>
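
The quoted message mentions a function that extracts the compressed field to a string, but GZipHelper itself is not shown in the thread. The following is only a guess at what such a helper might look like, assuming the field is Base64-encoded gzip (the "H4sI" prefix is what gzip data looks like after Base64 encoding) and that Java 8 or later is available for java.util.Base64. The object name `GzipSketch` and the `compress` method are made up for this sketch, and the Try return type is chosen only because the original code calls .get on the result.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.Source
import scala.util.Try

object GzipSketch {
  // Base64-decode the field, gunzip the bytes, and read them as UTF-8 text.
  // Any decoding or decompression error ends up as a Failure.
  def unCompress(b64: String): Try[String] = Try {
    val in = new GZIPInputStream(new ByteArrayInputStream(Base64.getDecoder.decode(b64)))
    try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
  }

  // Inverse operation, included only so the round trip can be checked.
  def compress(text: String): String = {
    val buf = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buf)
    try out.write(text.getBytes(UTF_8)) finally out.close()
    Base64.getEncoder.encodeToString(buf.toByteArray)
  }
}
```

Calling .get on the result, as the quoted code does, throws on the first malformed row; handling the Failure case explicitly would make it possible to skip such rows instead.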