Ah. Have you tried Jackson?
https://github.com/FasterXML/jackson-dataformat-xml/blob/master/README.md
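
Whichever parser you pick, a lot of the cost can come from setting it up again for every message. Below is a rough sketch of the idea using only the StAX API that ships with the JDK (jackson-dataformat-xml is also built on StAX). The class name XmlFieldExtractor, the method names, and the assumption that you only need the text of one tag per message are all made up for illustration. I have not measured this against your data, so treat it as a starting point rather than a fix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: the name and shape are illustrative, not from any library.
class XmlFieldExtractor {

    // Processes one batch of XML strings (for example, one partition's worth).
    // For each message, returns the text of the first <tagName> element,
    // or null when the message has no such element.
    static List<String> extractAll(Iterator<String> xmlMessages, String tagName)
            throws XMLStreamException {
        // Create the factory once per batch instead of once per message.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);

        List<String> values = new ArrayList<>();
        while (xmlMessages.hasNext()) {
            values.add(firstText(factory, xmlMessages.next(), tagName));
        }
        return values;
    }

    // Streams through a single document and stops at the first matching tag,
    // so the rest of the message is never materialized as a tree.
    // Assumes the matching element contains text only (no child elements);
    // getElementText() throws XMLStreamException otherwise.
    private static String firstText(XMLInputFactory factory, String xml, String tagName)
            throws XMLStreamException {
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && tagName.equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

In Spark the body of extractAll would go inside mapPartitions rather than map, so the factory is built once per partition on each executor. Malformed messages will throw XMLStreamException here, so you would probably want to catch that per message in a real job.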


_____________________________
From: Diwakar Dhanuskodi <diwakar.dhanuskodi@gmail.com>
Sent: Friday, August 19, 2016 9:41 PM
Subject: Re: Best way to read XML data from RDD
To: Felix Cheung <felixcheung_m@hotmail.com>, user <user@spark.apache.org>


Yes. It accepts an XML file as a source, but not an RDD. The XML data is embedded inside JSON that is streamed from a Kafka cluster, so I get it as an RDD.
Right now I am using the scala.xml XML.loadString method inside an RDD map function, but I am not happy with the performance: it takes 4 minutes to parse the XML from 2 million messages in a 3-node environment with 100G and 4 CPUs each.




-------- Original message --------
From: Felix Cheung <felixcheung_m@hotmail.com>
Date:20/08/2016 09:49 (GMT+05:30)
To: Diwakar Dhanuskodi <diwakar.dhanuskodi@gmail.com>, user <user@spark.apache.org>
Cc:
Subject: Re: Best way to read XML data from RDD

Have you tried spark-xml?

https://github.com/databricks/spark-xml




On Fri, Aug 19, 2016 at 1:07 PM -0700, "Diwakar Dhanuskodi"<diwakar.dhanuskodi@gmail.com> wrote:

Hi, 

There is an RDD with JSON data. I can read the JSON data using rdd.read.json. The JSON data has XML data in a couple of key-value pairs.

What is the best way to read and parse XML from an RDD? Are there any XML libraries specifically for Spark? Could anyone help with this?

Thanks.