spark-user mailing list archives

From Selvam Raman <sel...@gmail.com>
Subject Re: how to read object field within json file
Date Sat, 25 Mar 2017 22:29:57 GMT
Thank you, Armbrust.

On Fri, Mar 24, 2017 at 7:02 PM, Michael Armbrust <michael@databricks.com>
wrote:

> I'm not sure you can parse this as an Array, but you can hint to the
> parser that you would like to treat source as a map instead of as a
> struct.  This is a good strategy when you have dynamic columns in your data.
>
> Here is an example of the schema you can use to parse this JSON and also
> how to use explode to turn it into separate rows
> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/679071429109042/2840265927289860/latest.html>.
> This blog post has more on working with semi-structured data in Spark
> <https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html>.
>
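
The map-plus-explode approach described above could be sketched roughly as follows in PySpark. This is not the content of the linked notebook; the function names, the sample records, and the assumption that every entry under "source" carries only the string fields id, eId and description are illustrative. The plain-Python function is included only to show the rows the Spark version is expected to produce.

```python
import json

# Two sample records in JSON Lines form (one object per line), modelled on the
# record in the original question. The second record has an empty F2 object.
SAMPLE_LINES = [
    '{"id": "test1", "source": {"F1": {"id": "4970", "eId": "F1", "description": "test1"}, '
    '"F2": {"id": "5070", "eId": "F2", "description": "test2"}}}',
    '{"id": "test2", "source": {"F1": {"id": "5170", "eId": "F1", "description": "test3"}, "F2": {}}}',
]


def descriptions_plain(lines):
    """Reference behaviour without Spark: one (id, key, description) tuple per map entry."""
    rows = []
    for line in lines:
        record = json.loads(line)
        for key, value in record.get("source", {}).items():
            rows.append((record["id"], key, value.get("description")))
    return rows


def descriptions_spark(spark, path):
    """Same idea in Spark: read `source` as a map, then explode it into rows."""
    # Imported lazily so the plain-Python part runs without pyspark installed.
    from pyspark.sql.functions import col, explode
    from pyspark.sql.types import MapType, StringType, StructField, StructType

    # Assumed shape of each entry under "source" (hypothetical, from the sample).
    entry = StructType([
        StructField("id", StringType()),
        StructField("eId", StringType()),
        StructField("description", StringType()),
    ])
    schema = StructType([
        StructField("id", StringType()),
        # A map keeps the dynamic keys (F1, F2, ...) as data, not as columns.
        StructField("source", MapType(StringType(), entry)),
    ])
    df = spark.read.schema(schema).json(path)
    # explode on a map column yields one row per entry, in columns `key` and `value`.
    return df.select(col("id"), explode(col("source"))).select(
        "id", "key", col("value.description").alias("description")
    )
```

With a running session, something like `descriptions_spark(spark, "data/*.json.bz2").show()` should list one description per row; the path here is made up. Supplying the schema explicitly also avoids inferring a struct with one column per F-key across all files.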
> On Thu, Mar 23, 2017 at 2:49 PM, Yong Zhang <java8964@hotmail.com> wrote:
>
>> That's why your "source" should be defined as an Array[Struct] type
>> (which makes sense in this case, since it has an undetermined length), so
>> that you can explode it and get the description easily.
>>
>> As it stands, you will need to write your own UDF; that may do what you want.
>>
>> Yong
>>
>> ------------------------------
>> *From:* Selvam Raman <selmna@gmail.com>
>> *Sent:* Thursday, March 23, 2017 5:03 PM
>> *To:* user
>> *Subject:* how to read object field within json file
>>
>> Hi,
>>
>> {
>> "id": "test1",
>> "source": {
>>     "F1": {
>>       "id": "4970",
>>       "eId": "F1",
>>       "description": "test1"
>>     },
>>     "F2": {
>>       "id": "5070",
>>       "eId": "F2",
>>       "description": "test2"
>>     },
>>     "F3": {
>>       "id": "5170",
>>       "eId": "F3",
>>       "description": "test3"
>>     },
>>     "F4":{},
>>       etc..
>>     "F999":{}
>>   }
>> }
>>
>> I have bzip-compressed JSON files in the format above.
>> Some JSON rows contain two objects within source (like F1 and F2),
>> sometimes five (F1, F2, F3, F4, F5), etc. So the final schema will contain
>> the combination of all objects for the source field.
>>
>> Every row will therefore contain n objects, but only some of them contain
>> valid records.
>> How can I retrieve the value of "description" in the "source" field?
>>
>> source.F1.description returns the result, but how can I get all the
>> description results for every row (something like
>> "source.*.description")?
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>
>


-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
