spark-user mailing list archives

From Hechem El Jed <hechem.el...@gmail.com>
Subject Re: confusing about Spark SQL json format
Date Thu, 31 Mar 2016 08:52:49 GMT
Hello,

Actually, I ran into the same problem when I was implementing a decision
tree algorithm with Spark and parsing the output into a comprehensible
JSON format.

So, as you said, the file has to be a single JSON value to be valid JSON,
for example an array of records:
[{
    "name": "Yin",
    "address": {
        "city": "Columbus",
        "state": "Ohio"
    }
}, {
    "name": "Michael",
    "address": {
        "city": null,
        "state": "California"
    }
}]

However, I had to treat it as a list and index into it, e.g. data[0], to get:

{
    "name": "Yin",
    "address": {
        "city": "Columbus",
        "state": "Ohio"
    }
}

and then use it for my visualizations.
Spark is still a bit tricky when dealing with input/output formats, so I guess
the solution for now is to write your own parser.
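
To make the "own parser" idea concrete, here is a minimal sketch in plain
Python (standard library only). It assumes the input is a JSON array of
records like the one above and rewrites it with one record per line, which
is the layout of the people.json example quoted below. The helper name
array_to_json_lines is my own, not part of Spark.

```python
import json


def array_to_json_lines(array_text):
    """Turn a JSON array of records into text with one record per line.

    Hypothetical helper for illustration; not a Spark API.
    """
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without indent never emits newlines, so each record
    # stays on a single line.
    return "\n".join(json.dumps(record) for record in records)


array_text = """[
  {"name": "Yin", "address": {"city": "Columbus", "state": "Ohio"}},
  {"name": "Michael", "address": {"city": null, "state": "California"}}
]"""

# Indexing the parsed array gives a single record, as with data[0] above.
first = json.loads(array_text)[0]

# One JSON object per line, ready to be written out to a file.
json_lines = array_to_json_lines(array_text)
print(json_lines)
```

The resulting text can be saved to a file and should then be loadable with
the sqlContext.read.json(file_path) call from the original message. For very
large inputs this sketch reads everything into memory, so a streaming parser
would be a better fit.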


Cheers,

*Hechem El Jed*
Software Engineer & Business Analyst
MY +601131094294
TN +216 24 937 021
<https://www.linkedin.com/in/hechemeljed>

Our environment is fragile, please do not print this email unless necessary.

On Thu, Mar 31, 2016 at 4:23 PM, charles li <charles.upboy@gmail.com> wrote:

> As this post says, in Spark we can load a json file in the way shown
> below:
>
> *post* :
> https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
>
>
>
> -----------------------------------------------------------------------------------------------
> sqlContext.jsonFile(file_path)
> or
> sqlContext.read.json(file_path)
>
> -----------------------------------------------------------------------------------------------
>
>
> and the *json file format* looks like the one below, say *people.json*:
>
>
> -----------------------------------------------------------------------------------------------
> {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
> {"name":"Michael", "address":{"city":null, "state":"California"}}
>
> -----------------------------------------------------------------------------------------------
>
>
> and here are my *problems*:
>
> Is that the *standard json format*? According to http://www.json.org/ , I
> don't think so. It's just a *collection of records* [ a dict ], not a
> valid json document. Following the official json doc, the standard json
> format of people.json should be:
>
>
> -----------------------------------------------------------------------------------------------
> {"name": ["Yin", "Michael"],
> "address":[ {"city":"Columbus","state":"Ohio"},
> {"city":null, "state":"California"} ]
> }
>
> -----------------------------------------------------------------------------------------------
>
> So why do we define the json format as a collection of records in Spark?
> I mean, it leads to some inconvenience: if we have a large standard
> json file, we first need to reformat it to make it readable in
> Spark, which is inefficient, time-consuming, incompatible and
> space-consuming.
>
>
> great thanks,
>
>
>
>
>
>
> --
> *--------------------------------------*
> a spark lover, a quant, a developer and a good man.
>
> http://github.com/litaotao
>
