From: raja kbv
To: "user@spark.apache.org"
Date: Tue, 22 Dec 2015 18:07:04 +0000 (UTC)
Subject: How to Parse & flatten JSON object in a text file using Spark & Scala into Dataframe

Hi,

I am new to Spark.

I have a text file with the structure below.


 
(employeeID: Int, Name: String, ProjectDetails: JsonObject{[{ProjectName, Description, Duration, Role}]})
Example:
(123456, Employee1, {“ProjectDetails”:[
    { “ProjectName”: “Web Develoement”, “Description” : “Online Sales website”, “Duration” : “6 Months”, “Role” : “Developer”}
    { “ProjectName”: “Spark Develoement”, “Description” : “Online Sales Analysis”, “Duration” : “6 Months”, “Role” : “Data Engineer”}
    { “ProjectName”: “Scala Training”, “Description” : “Training”, “Duration” : “1 Month” }
  ]
}
 
 
Could someone help me parse and flatten each record into a DataFrame like the one below, using Scala?
 
employeeID, Name, ProjectName, Description, Duration, Role
123456, Employee1, Web Develoement, Online Sales website, 6 Months, Developer
123456, Employee1, Spark Develoement, Online Sales Analysis, 6 Months, Data Engineer
123456, Employee1, Scala Training, Training, 1 Month, null
 

Thank you in advance.
Regards,
Raja
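
One possible way to get the flattened output described above, as a minimal sketch rather than a definitive answer. It makes several assumptions that the message does not confirm: Spark 2.2 or later (for SparkSession and from_json), one record per line closed by a parenthesis, plain ASCII double quotes in the file, and commas between the array elements so that the third field is valid JSON. The names FlattenProjects, sample and flatten are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json, regexp_extract}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object FlattenProjects {
  // Hypothetical sample line: same data as in the question, but written
  // with straight quotes and commas between the array elements (assumption).
  val sample: String =
    """(123456, Employee1, {"ProjectDetails":[""" +
    """{"ProjectName":"Web Develoement","Description":"Online Sales website","Duration":"6 Months","Role":"Developer"},""" +
    """{"ProjectName":"Spark Develoement","Description":"Online Sales Analysis","Duration":"6 Months","Role":"Data Engineer"},""" +
    """{"ProjectName":"Scala Training","Description":"Training","Duration":"1 Month"}""" +
    """]})"""

  // Splits "(id, name, {json})" into three capture groups.
  private val record = """^\(\s*(\d+)\s*,\s*([^,]+?)\s*,\s*(\{.*\})\s*\)\s*$"""

  // Schema of the JSON field; every leaf is read as a string.
  private val project = StructType(Seq(
    StructField("ProjectName", StringType),
    StructField("Description", StringType),
    StructField("Duration", StringType),
    StructField("Role", StringType)))
  private val details = StructType(Seq(
    StructField("ProjectDetails", ArrayType(project))))

  /** Expects a DataFrame with one string column named "value", one record per row. */
  def flatten(lines: DataFrame): DataFrame = {
    val parsed = lines.select(
      regexp_extract(col("value"), record, 1).cast("int").as("employeeID"),
      regexp_extract(col("value"), record, 2).as("Name"),
      regexp_extract(col("value"), record, 3).as("json"))

    parsed
      // from_json parses the JSON text; explode emits one row per project.
      .withColumn("project",
        explode(from_json(col("json"), details).getField("ProjectDetails")))
      .select(
        col("employeeID"), col("Name"),
        col("project.ProjectName"), col("project.Description"),
        col("project.Duration"), col("project.Role"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FlattenProjects")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // For a real file, spark.read.text(path) also yields a "value" column.
    val lines = Seq(sample).toDF("value")
    flatten(lines).show(false)

    spark.stop()
  }
}
```

A key that is absent from a project object, such as Role in the third project, comes back as null because from_json fills fields missing from the JSON with null. Lines that do not match the pattern produce no output rows, since explode emits nothing for a null array.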