spark-user mailing list archives

From Tobias Pfeiffer <>
Subject Re: Parsing a large XML file using Spark
Date Wed, 19 Nov 2014 01:51:32 GMT

see for one

One issue with those XML files is that they cannot be processed line by
line in parallel. On top of that, I think you inherently need shared/global
state to parse XML or to check it for well-formedness. (Multi-line JSON has
the same issue, by the way.)
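
To make the line-by-line point concrete, here is a minimal sketch in plain Python (no Spark involved). The sample document and the helper name `parses_alone` are made up for illustration. It treats each line as an independent record, the way a line-oriented split would, and compares that with parsing the document as a whole.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; each <record> spans several lines.
DOC = """<records>
  <record id="1">
    <name>alpha</name>
  </record>
  <record id="2">
    <name>beta</name>
  </record>
</records>"""


def parses_alone(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML on its own."""
    try:
        ET.fromstring(fragment.strip())
        return True
    except ET.ParseError:
        return False


# Treat every line as an independent input, as a line-based split would.
lines = DOC.splitlines()
standalone = [line.strip() for line in lines if parses_alone(line)]

# Parsing the whole document keeps the open/close tag state across lines,
# so the enclosing <record> elements and their attributes are visible.
whole = [record.get("id") for record in ET.fromstring(DOC).iter("record")]

print(standalone)
print(whole)
```

Only the lines that happen to hold a complete element parse on their own, and even those have lost the enclosing record they belong to. Whether a given line is valid depends on the tags opened on earlier lines, which is the state a per-line worker does not have.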

