spark-user mailing list archives

From Felix Cheung
Subject Re: Best way to read XML data from RDD
Date Sat, 20 Aug 2016 05:05:42 GMT
Ah. Have you tried Jackson?

From: Diwakar Dhanuskodi
Sent: Friday, August 19, 2016 9:41 PM
Subject: Re: Best way to read XML data from RDD
To: Felix Cheung, user

Yes. It accepts an XML file as a source but not an RDD. The XML data, embedded inside JSON, is
streamed from a Kafka cluster, so I can get it as an RDD.
Right now I am using scala.xml's XML.loadString method inside an RDD map function, but I am not
happy with the performance: it takes 4 minutes to parse the XML from 2 million messages in a
3-node environment (100G, 4 CPUs each).
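
For illustration, the pattern described above (JSON messages that carry XML in some of their fields, parsed inside an RDD transformation) can be sketched per partition rather than per record. This is a minimal Python sketch that uses only the standard library, not the Scala code from this thread. The key name "payload", the helper name parse_partition and the sample messages are hypothetical, and whether a per-partition function is faster than a plain map would need to be measured on the actual cluster.

```python
import json
import xml.etree.ElementTree as ET


def parse_partition(records, xml_keys=("payload",)):
    """Parse the XML strings embedded in an iterable of JSON messages.

    `xml_keys` names the JSON keys that hold XML; "payload" is a
    hypothetical key chosen for this sketch.
    """
    for raw in records:
        message = json.loads(raw)
        for key in xml_keys:
            xml_text = message.get(key)
            if xml_text is None:
                continue
            try:
                root = ET.fromstring(xml_text)
            except ET.ParseError:
                # Skip malformed XML instead of failing the whole partition.
                continue
            # Flatten the root's direct children into a tag -> text dict.
            yield key, {child.tag: child.text for child in root}


# Hypothetical sample messages; the second one has unclosed XML.
sample = [
    '{"id": 1, "payload": "<order><item>book</item><qty>2</qty></order>"}',
    '{"id": 2, "payload": "<order><item>pen</item>"}',
]
results = list(parse_partition(sample))
```

With PySpark, the same function could be passed as rdd.mapPartitions(parse_partition), since mapPartitions hands the function an iterator over one partition's records. Skipping malformed XML is a choice made for this sketch; a real job might count or log such records instead.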


-------- Original message --------
From: Felix Cheung
Date: 20/08/2016 09:49 (GMT+05:30)
To: Diwakar Dhanuskodi, user
Subject: Re: Best way to read XML data from RDD

Have you tried

On Fri, Aug 19, 2016 at 1:07 PM -0700, "Diwakar Dhanuskodi" wrote:


There is an RDD with JSON data. I could read json data using . The JSON data
has XML data in a couple of key-value pairs.

What is the best method to read and parse XML from an RDD? Are there any specific XML libraries
for Spark? Could anyone help with this?

