spark-user mailing list archives

From "Sreekanth Jella"
Subject Flattening XML in a DataFrame
Date Sat, 13 Aug 2016 00:33:48 GMT
Hi Folks,


I am trying to flatten a variety of XML documents using DataFrames. I'm using
the spark-xml package, which automatically infers my schema and creates a
DataFrame.


I do not want to hard-code any column names in the DataFrame, because I have
many varieties of XML documents and each may have deeply nested child nodes.
I simply want to flatten any type of XML and then write the output data to a
Hive table. Could you please give some expert advice on how to do this?


An example XML document and the expected output are given below.


Sample XML:




















Expected output:

id, name, clerk.cid, clerk.cname

1, foo, 2, cname2

1, foo, 3, cname3
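
To make the requested behaviour concrete, here is a minimal sketch of the flattening rule in plain Python. It is not Spark code and does not use spark-xml; it only models the logic on nested dicts and lists. The nested shape of `sample` is an assumption inferred from the expected output columns above, and the names `flatten` and `sample` are hypothetical. The rule: a nested record (like a struct) contributes dotted column names, and a repeated element (like an array) produces one output row per element.

```python
from itertools import product


def flatten(record, prefix=""):
    """Flatten one nested record into a list of flat rows (dicts).

    dict  -> child keys are joined to the parent name with a dot
    list  -> each element yields its own output row(s)
    other -> a leaf value stored under its dotted path
    """
    if isinstance(record, dict):
        # Flatten every child, then combine the children's rows
        # with a cross product so repeated elements multiply rows.
        per_child = [
            flatten(value, f"{prefix}.{key}" if prefix else key)
            for key, value in record.items()
        ]
        rows = []
        for combo in product(*per_child):
            row = {}
            for part in combo:
                row.update(part)
            rows.append(row)
        return rows
    if isinstance(record, list):
        # One set of rows per element; an empty list yields no rows,
        # so the parent record is dropped in that case.
        rows = []
        for item in record:
            rows.extend(flatten(item, prefix))
        return rows
    return [{prefix: record}]


# Hypothetical record shaped after the expected output columns.
sample = {
    "id": 1,
    "name": "foo",
    "clerk": [
        {"cid": 2, "cname": "cname2"},
        {"cid": 3, "cname": "cname3"},
    ],
}

for row in flatten(sample):
    print(row)
```

In a DataFrame the same recursion would presumably walk the inferred schema instead of the data, selecting struct fields under dotted aliases and exploding array columns, so that no column name has to be hard-coded.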



Sreekanth Jella

