spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tobias Pfeiffer <...@preferred.jp>
Subject Re: Parsing a large XML file using Spark
Date Wed, 19 Nov 2014 01:51:32 GMT
Hi,

see https://www.mail-archive.com/dev@spark.apache.org/msg03520.html for one
solution.

One issue with those XML files is that they cannot be processed line by
line in parallel; plus you inherently need shared/global state to parse XML
or check for well-formedness, I think. (Same issue with multi-line JSON, by
the way.)

Tobias

Mime
View raw message