spark-user mailing list archives

From Eran Witkon <eranwit...@gmail.com>
Subject Re: How to Parse & flatten JSON object in a text file using Spark &Scala into Dataframe
Date Thu, 24 Dec 2015 10:36:49 GMT
Raja, I found the answer to your question.
Look at
http://stackoverflow.com/questions/34069282/how-to-query-json-data-column-using-spark-dataframes
This is what you (and I) were looking for.
The general idea: read the file as text, treating ProjectDetails as a plain
string field, then build a JSON string representation of the whole line.
The result is a nested JSON document whose schema Spark SQL can read.
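
A minimal sketch of the "build the JSON string representation of the whole line" step, in plain Scala with no Spark dependency. The names `LineToJson` and `toJson` are hypothetical, and the field names `employeeID` and `Name` are taken from the original question. It assumes each record sits on a single line in the form `(id, name, {json})`, that the name contains no commas or quotes, and that the trailing payload is already a valid JSON object.

```scala
object LineToJson {
  // Hypothetical helper: turns "(id, name, {json})" into one JSON object string.
  def toJson(line: String): String = {
    // Drop the surrounding parentheses, if present.
    val body = line.trim.stripPrefix("(").stripSuffix(")")
    // Split into at most three parts so commas inside the JSON payload survive.
    val parts = body.split(",", 3).map(_.trim)
    require(parts.length == 3, s"expected 3 fields, got: $line")
    val Array(id, name, details) = parts
    // `details` is assumed to look like {"ProjectDetails":[...]}.
    // Dropping its opening brace lets its members be spliced into the outer object.
    val members = details.stripPrefix("{")
    s"""{"employeeID":$id,"Name":"$name",$members"""
  }
}
```

The resulting strings could then be handed to Spark's JSON reader (for example `sqlContext.read.json` over an `RDD[String]` in Spark 1.4 and later), after which applying the `explode` function to the `ProjectDetails` array should give one row per project.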

Eran

On Thu, Dec 24, 2015 at 10:26 AM Eran Witkon <eranwitkon@gmail.com> wrote:

> I don't have the exact answer for you, but I would look for something using
> the explode method on DataFrame.
>
> On Thu, Dec 24, 2015 at 7:34 AM Bharathi Raja <rajakbv@yahoo.com> wrote:
>
>> Thanks Gokul, but the file I have is in the format I described: the
>> first two columns are not in JSON format.
>>
>> Thanks,
>> Raja
>> ------------------------------
>> From: Gokula Krishnan D <email2dgk@gmail.com>
>> Sent: 12/24/2015 2:44 AM
>> To: Eran Witkon <eranwitkon@gmail.com>
>> Cc: raja kbv <rajakbv@yahoo.com>; user@spark.apache.org
>>
>> Subject: Re: How to Parse & flatten JSON object in a text file using
>> Spark &Scala into Dataframe
>>
>> You can try this. I slightly modified the input structure, since the
>> first two columns were not in JSON format.
>>
>> [image: Inline image 1]
>>
>> Thanks & Regards,
>> Gokula Krishnan (Gokul)
>>
>> On Wed, Dec 23, 2015 at 9:46 AM, Eran Witkon <eranwitkon@gmail.com>
>> wrote:
>>
>>> Did you get a solution for this?
>>>
>>> On Tue, 22 Dec 2015 at 20:24 raja kbv <rajakbv@yahoo.com.invalid> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am new to Spark.
>>>>
>>>> I have a text file with the structure below.
>>>>
>>>>
>>>> (employeeID: Int, Name: String, ProjectDetails:
>>>> JsonObject{[{ProjectName, Description, Duration, Role}]})
>>>> Eg:
>>>> (123456, Employee1, {"ProjectDetails": [
>>>>   {"ProjectName": "Web Develoement", "Description": "Online Sales website",
>>>>    "Duration": "6 Months", "Role": "Developer"},
>>>>   {"ProjectName": "Spark Develoement", "Description": "Online Sales Analysis",
>>>>    "Duration": "6 Months", "Role": "Data Engineer"},
>>>>   {"ProjectName": "Scala Training", "Description": "Training",
>>>>    "Duration": "1 Month"}
>>>> ]})
>>>>
>>>>
>>>> Could someone help me parse and flatten the record into the DataFrame
>>>> below using Scala?
>>>>
>>>> employeeID, Name, ProjectName, Description, Duration, Role
>>>> 123456, Employee1, Web Develoement, Online Sales website, 6 Months, Developer
>>>> 123456, Employee1, Spark Develoement, Online Sales Analysis, 6 Months, Data Engineer
>>>> 123456, Employee1, Scala Training, Training, 1 Month, null
>>>>
>>>>
>>>> Thank you in advance.
>>>>
>>>> Regards,
>>>> Raja
>>>>
>>>
>>
