Hi Folks,


I am trying to flatten a variety of XML documents using DataFrames. I'm using the spark-xml package, which automatically infers the schema and creates a DataFrame.


I do not want to hard-code any column names in the DataFrame, because I have many varieties of XML documents and each may have more deeply nested child nodes. I simply want to flatten any type of XML and then write the output data to a Hive table. Could you please give some expert advice on how to do this?


An example XML document and the expected output are given below.


Sample XML:




















Expected output:

id, name, clerk.cid, clerk.cname

1, foo, 2, cname2

1, foo, 3, cname3
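
To make the expected behaviour concrete, here is a minimal sketch of the flattening rule in plain Python, applied to one already-parsed record rather than to a Spark DataFrame. The `flatten` helper and the nested shape of `record` are my own assumptions, reconstructed from the expected output above; this is not spark-xml code. Nested fields get dotted names, and repeated child nodes produce one output row each.

```python
from itertools import product


def flatten(value, prefix=""):
    """Return a list of flat dicts (rows) for one nested value.

    Hypothetical helper for illustration only; not part of spark-xml.
    """
    if isinstance(value, dict):
        # Flatten every field, then take the cross product of the
        # per-field row lists so each combination becomes one row.
        per_field = [
            flatten(child, f"{prefix}.{key}" if prefix else key)
            for key, child in value.items()
        ]
        rows = []
        for combo in product(*per_field):
            row = {}
            for part in combo:
                row.update(part)
            rows.append(row)
        return rows
    if isinstance(value, list):
        # Repeated child nodes: one output row per element.
        # An empty list yields no rows, so the parent row is dropped.
        rows = []
        for item in value:
            rows.extend(flatten(item, prefix))
        return rows
    # Leaf value: a single row with a single dotted column name.
    return [{prefix: value}]


# Assumed shape of one parsed record, inferred from the expected output.
record = {
    "id": 1,
    "name": "foo",
    "clerk": [
        {"cid": 2, "cname": "cname2"},
        {"cid": 3, "cname": "cname3"},
    ],
}

rows = flatten(record)
for row in rows:
    print(row)
```

No column name is hard-coded in the helper; the names come entirely from the structure of the record, which is the behaviour I am hoping to get from the DataFrame schema as well.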



Sreekanth Jella