spark-user mailing list archives

From Eran Witkon <eranwit...@gmail.com>
Subject Re: How to Parse & flatten JSON object in a text file using Spark &Scala into Dataframe
Date Thu, 24 Dec 2015 08:26:14 GMT
I don't have the exact answer for you, but I would look for something that
uses the explode method on DataFrame.
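
As a rough sketch of that idea, and not something I have run against your
file: it assumes Spark 2.1 or later (SparkSession and from_json), that each
record sits on a single line, and that the JSON part is valid (straight
quotes, commas between the array elements). It uses the explode function
from org.apache.spark.sql.functions rather than the explode method on
DataFrame. The object name FlattenProjects, the helper flatten, and the
column names "line", "json" and "p" are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json, regexp_extract}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object FlattenProjects {
  // Schema of one project entry; a field missing from the JSON becomes null.
  val projectSchema: StructType = StructType(Seq(
    StructField("ProjectName", StringType),
    StructField("Description", StringType),
    StructField("Duration", StringType),
    StructField("Role", StringType)))

  // Schema of the whole JSON object: {"ProjectDetails": [ ... ]}
  val jsonSchema: StructType =
    StructType(Seq(StructField("ProjectDetails", ArrayType(projectSchema))))

  // Capture groups: 1 = employeeID, 2 = Name, 3 = the JSON object.
  val recordPattern: String =
    """^\(\s*(\d+)\s*,\s*([^,]+?)\s*,\s*(\{.*\})\s*\)$"""

  // Expects a DataFrame with one string column named "line" (assumed name).
  def flatten(lines: DataFrame): DataFrame = {
    val parsed = lines.select(
      regexp_extract(col("line"), recordPattern, 1).cast("int").as("employeeID"),
      regexp_extract(col("line"), recordPattern, 2).as("Name"),
      from_json(regexp_extract(col("line"), recordPattern, 3), jsonSchema).as("json"))

    parsed
      // explode emits one output row per element of the ProjectDetails array
      .select(col("employeeID"), col("Name"), explode(col("json.ProjectDetails")).as("p"))
      .select(col("employeeID"), col("Name"),
        col("p.ProjectName"), col("p.Description"), col("p.Duration"), col("p.Role"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FlattenProjects").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical sample record, shortened to two projects.
    val lines = Seq(
      """(123456, Employee1, {"ProjectDetails":[{"ProjectName":"Web Development","Description":"Online Sales website","Duration":"6 Months","Role":"Developer"},{"ProjectName":"Scala Training","Description":"Training","Duration":"1 Month"}]})"""
    ).toDF("line")

    flatten(lines).show(false)
    spark.stop()
  }
}
```

If a record spans several lines in the file, it would have to be read as one
string first (for example with SparkContext.wholeTextFiles) before the
pattern above can match it.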

On Thu, Dec 24, 2015 at 7:34 AM Bharathi Raja <rajakbv@yahoo.com> wrote:

> Thanks Gokul, but the file I have uses the format I described.
> The first two columns are not in JSON format.
>
> Thanks,
> Raja
> ------------------------------
> From: Gokula Krishnan D <email2dgk@gmail.com>
> Sent: 12/24/2015 2:44 AM
> To: Eran Witkon <eranwitkon@gmail.com>
> Cc: raja kbv <rajakbv@yahoo.com>; user@spark.apache.org
>
> Subject: Re: How to Parse & flatten JSON object in a text file using
> Spark &Scala into Dataframe
>
> You can try this. I slightly modified the input structure, since the first
> two columns were not in JSON format.
>
> [image: Inline image 1]
>
> Thanks & Regards,
> Gokula Krishnan (Gokul)
>
> On Wed, Dec 23, 2015 at 9:46 AM, Eran Witkon <eranwitkon@gmail.com> wrote:
>
>> Did you get a solution for this?
>>
>> On Tue, 22 Dec 2015 at 20:24 raja kbv <rajakbv@yahoo.com.invalid> wrote:
>>
>>> Hi,
>>>
>>> I am new to Spark.
>>>
>>> I have a text file with the structure below.
>>>
>>>
>>> (employeeID: Int, Name: String, ProjectDetails:
>>> JsonObject{[{ProjectName, Description, Duration, Role}]})
>>> E.g.:
>>> (123456, Employee1, {"ProjectDetails":[
>>>   {"ProjectName": "Web Development", "Description": "Online Sales website",
>>>    "Duration": "6 Months", "Role": "Developer"},
>>>   {"ProjectName": "Spark Development", "Description": "Online Sales Analysis",
>>>    "Duration": "6 Months", "Role": "Data Engineer"},
>>>   {"ProjectName": "Scala Training", "Description": "Training",
>>>    "Duration": "1 Month"}
>>> ]})
>>>
>>>
>>> Could someone help me parse and flatten the record into the DataFrame
>>> below using Scala?
>>>
>>> employeeID, Name, ProjectName, Description, Duration, Role
>>> 123456, Employee1, Web Development, Online Sales website, 6 Months, Developer
>>> 123456, Employee1, Spark Development, Online Sales Analysis, 6 Months, Data Engineer
>>> 123456, Employee1, Scala Training, Training, 1 Month, null
>>>
>>>
>>> Thank you in advance.
>>>
>>> Regards,
>>> Raja
>>>
>>
>
