Ah. Have you tried Jackson?

From: Diwakar Dhanuskodi <diwakar.dhanuskodi@gmail.com>
Sent: Friday, August 19, 2016 9:41 PM
Subject: Re: Best way to read XML data from RDD
To: Felix Cheung <felixcheung_m@hotmail.com>, user <user@spark.apache.org>

Yes. It accepts an XML file as a source, but not an RDD. The XML data is embedded inside JSON that is streamed from a Kafka cluster, so I only have it as an RDD.
Right now I am using the scala.xml XML.loadString method inside an RDD map function, but I am not happy with the performance: it takes 4 minutes to parse the XML from 2 million messages in a 3-node environment with 100G and 4 CPUs per node.
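A rough back-of-the-envelope reading of the figures quoted above. It assumes that all 12 cores (3 nodes x 4 CPUs) are busy parsing for the full 4 minutes, which is an assumption and not something stated in the thread:

```python
# Back-of-the-envelope throughput for the figures quoted above.
# Assumption: all 3 * 4 = 12 cores parse in parallel for the whole 4 minutes.
messages = 2_000_000
wall_clock_s = 4 * 60
cores = 3 * 4

overall_rate = messages / wall_clock_s                     # messages/second, whole cluster
per_core_rate = overall_rate / cores                       # messages/second, per core
ms_per_message = 1000 * wall_clock_s * cores / messages    # core-milliseconds per message

print(f"{overall_rate:.0f} msg/s overall, {per_core_rate:.0f} msg/s per core, "
      f"{ms_per_message:.2f} ms per message per core")
# → 8333 msg/s overall, 694 msg/s per core, 1.44 ms per message per core
```

Whether about 1.44 ms of CPU time per message is reasonable depends on the size of each embedded XML document, which the thread does not state.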


-------- Original message --------
From: Felix Cheung <felixcheung_m@hotmail.com>
Date: 20/08/2016 09:49 (GMT+05:30)
To: Diwakar Dhanuskodi <diwakar.dhanuskodi@gmail.com>, user <user@spark.apache.org>
Subject: Re: Best way to read XML data from RDD

Have you tried


On Fri, Aug 19, 2016 at 1:07 PM -0700, "Diwakar Dhanuskodi" <diwakar.dhanuskodi@gmail.com> wrote:


There is an RDD with JSON data. I can read the JSON data using sqlContext.read.json(rdd). The JSON data has XML data in a couple of its key-value pairs.

What is the best way to read and parse XML from an RDD? Are there any XML libraries specific to Spark? Could anyone help with this?
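One common pattern, sketched here in PySpark-compatible Python rather than Scala, is to decode each JSON record and parse only the XML-valued field inside a function passed to rdd.mapPartitions. The field names `id` and `payload` and the XML element `<amount>` are hypothetical placeholders. The sketch uses only the standard-library `json` and `xml.etree.ElementTree` modules and is not the API of the spark-xml package. Writing it as a function over a whole partition's iterator also leaves room for one-time setup per partition, although this Python version needs none.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical key holding the embedded XML string; adjust to the real field name.
XML_FIELD = "payload"

def parse_partition(records):
    """Yield (id, amount) pairs from an iterator of JSON strings.

    Intended for rdd.mapPartitions(parse_partition), but it works on any
    iterator, so it can be tried without Spark.
    """
    for line in records:
        doc = json.loads(line)                  # decode the outer JSON record
        xml_text = doc.get(XML_FIELD)
        if not xml_text:
            continue                            # skip records without embedded XML
        try:
            root = ET.fromstring(xml_text)      # parse the embedded XML string
        except ET.ParseError:
            continue                            # skip malformed XML instead of failing the task
        # findtext returns None if the hypothetical <amount> element is missing
        yield doc.get("id"), root.findtext("amount")

sample = [
    json.dumps({"id": 1, "payload": "<order><amount>42</amount></order>"}),
    json.dumps({"id": 2, "payload": "<order><amount>7</amount></order>"}),
    json.dumps({"id": 3}),
]
print(list(parse_partition(sample)))  # → [(1, '42'), (2, '7')]

# On a cluster, assuming an RDD of JSON strings named rdd:
# parsed = rdd.mapPartitions(parse_partition)
```

Silently skipping malformed records is a design choice for this sketch; a real job might instead count or log them.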