spark-user mailing list archives

From "Sreekanth Jella" <srikanth.je...@gmail.com>
Subject Flattening XML in a DataFrame
Date Sat, 13 Aug 2016 00:33:48 GMT
Hi Folks,

I am trying to flatten a variety of XML documents using DataFrames. I'm using
the spark-xml package, which automatically infers the schema and creates a
DataFrame.

I do not want to hard-code any column names in the DataFrame, because I have
many varieties of XML documents, and each may have child nodes nested to a
different depth. I simply want to flatten any type of XML and then write the
output data to a Hive table. Could you please give some expert advice on how
to do this?

 

An example XML document and the expected output are given below.

Sample XML:

<emplist>
  <emp>
    <manager>
      <id>1</id>
      <name>foo</name>
      <subordinates>
        <clerk>
          <cid>1</cid>
          <cname>foo</cname>
        </clerk>
        <clerk>
          <cid>1</cid>
          <cname>foo</cname>
        </clerk>
      </subordinates>
    </manager>
  </emp>
</emplist>

 

Expected output:

id, name, clerk.cid, clerk.cname
1, foo, 2, cname2
1, foo, 3, cname3
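
To make the intended flattening concrete, here is a minimal sketch in plain Python (standard library only, not the spark-xml API). It assumes the two clerks carry the values 2/cname2 and 3/cname3 from the expected output, and that each <manager> element is the row unit. The helper name flatten is hypothetical. The recursion treats a nested element like a struct (its tag extends a dotted column name) and a repeated element like an array (each occurrence yields its own rows). In Spark the same idea would presumably walk df.schema, selecting struct fields by dotted path and applying explode to array columns.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the sample document, with the clerk values changed to
# 2/cname2 and 3/cname3 so that they match the expected output (an assumption).
SAMPLE = """
<emplist>
  <emp>
    <manager>
      <id>1</id>
      <name>foo</name>
      <subordinates>
        <clerk><cid>2</cid><cname>cname2</cname></clerk>
        <clerk><cid>3</cid><cname>cname3</cname></clerk>
      </subordinates>
    </manager>
  </emp>
</emplist>
"""


def flatten(elem, prefix=""):
    """Return the subtree under elem as a list of flat rows (dicts).

    Column names are dotted paths built from tag names, so nothing is
    hard-coded. Attributes and mixed content are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        # Leaf element: one row holding a single column.
        return [{prefix: (elem.text or "").strip()}]

    rows = [{}]
    # Visit the distinct child tags in document order.
    for tag in dict.fromkeys(child.tag for child in children):
        path = f"{prefix}.{tag}" if prefix else tag
        # A repeated tag behaves like an array: each occurrence contributes
        # its own rows (the "explode" step).
        child_rows = []
        for child in children:
            if child.tag == tag:
                child_rows.extend(flatten(child, path))
        # Combine with the columns collected so far (cross product).
        rows = [{**row, **extra} for row in rows for extra in child_rows]
    return rows


root = ET.fromstring(SAMPLE)
rows = [row for manager in root.iter("manager") for row in flatten(manager)]

print(", ".join(rows[0]))          # header built from the discovered columns
for row in rows:
    print(", ".join(row.values()))
```

Under these assumptions the sketch yields two rows, one per clerk. The column names are full paths from the row element (subordinates.clerk.cid rather than clerk.cid); shortening them would be a separate renaming step and could cause name collisions across documents. Two sibling repeated elements would be combined as a cross product, which may or may not be the desired behaviour.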

 

Thanks,

Sreekanth Jella

 

