spark-user mailing list archives

From Michael Segel <msegel_had...@hotmail.com>
Subject Re: Quirk in how Spark DF handles JSON input records?
Date Wed, 02 Nov 2016 19:39:01 GMT

On Nov 2, 2016, at 2:22 PM, Daniel Siegmann <dsiegmann@securityscorecard.io> wrote:

Yes, it needs to be on a single line. Spark (or Hadoop really) treats newlines as a record
separator by default. While it is possible to use a different string as a record separator,
what would you use in the case of JSON?
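To illustrate the point about newlines, here is a minimal sketch in plain Python (no Spark involved) of what happens when every line is treated as its own record. The sample document and the helper name parse_per_line are made up for illustration; this only mimics the line-per-record behaviour described above and is not how Spark itself is implemented.

```python
import json

# Hypothetical metadata document, once pretty-printed and once on a single line.
doc = {"table": "events", "columns": ["id", "ts"]}
pretty = json.dumps(doc, indent=2)   # spans several lines
compact = json.dumps(doc)            # one line

def parse_per_line(text):
    """Treat each line as a separate JSON record, as a newline-delimited reader would.

    Returns (records, bad_lines), where a line counts as bad if it is not
    a complete JSON object on its own.
    """
    records, bad_lines = [], 0
    for line in text.splitlines():
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            bad_lines += 1
            continue
        if isinstance(value, dict):
            records.append(value)
        else:
            # A bare fragment such as "ts" parses, but it is not a record.
            bad_lines += 1
    return records, bad_lines

pretty_records, pretty_bad = parse_per_line(pretty)
compact_records, compact_bad = parse_per_line(compact)
```

With the pretty-printed input no line is a complete object, so nothing usable comes out; the single-line form yields exactly one record.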

If you do some Googling I suspect you'll find some possible solutions. Personally, I would
just use a separate JSON library (e.g. json4s) to parse this metadata into an object, rather
than trying to read it in through Spark.
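A rough sketch of that approach, using Python's standard json module as a stand-in for json4s (a swap made only so the example is self-contained). The field names "table", "columns", "name" and "type", and the Metadata and Column classes, are assumptions for illustration; the real metadata layout is not shown in this thread.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class Column:
    name: str
    type: str

@dataclass
class Metadata:
    table: str
    columns: List[Column]

def parse_metadata(text: str) -> Metadata:
    """Parse the whole text as one JSON document, however many lines it spans."""
    raw = json.loads(text)
    return Metadata(
        table=raw["table"],
        columns=[Column(**c) for c in raw["columns"]],
    )

# Hypothetical multi-line metadata, e.g. the contents of a small local file.
SAMPLE = """
{
  "table": "events",
  "columns": [
    {"name": "id", "type": "long"},
    {"name": "ts", "type": "timestamp"}
  ]
}
"""

meta = parse_metadata(SAMPLE)
```

Because the document is read as a whole rather than line by line, its formatting no longer matters.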


Yeah, that’s the basic idea.

This JSON is metadata to help drive the process, not row records… although the column descriptors
are row records, so in the short term I could cheat and just store those in a file.
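One way that short-term workaround might look, sketched in plain Python: write each column descriptor as a single-line JSON object, so the file has one record per line, which is the layout discussed earlier in this thread. The descriptor fields and the helper name to_json_lines are hypothetical.

```python
import json

def to_json_lines(rows):
    """Serialize each row as one compact JSON object per line."""
    # json.dumps without indent never emits a raw newline, so one row == one line.
    return "\n".join(json.dumps(row) for row in rows)

# Hypothetical column descriptors pulled out of the metadata.
columns = [
    {"name": "id", "type": "long"},
    {"name": "ts", "type": "timestamp"},
]

lines = to_json_lines(columns)
# Each line can be parsed independently of the others.
round_trip = [json.loads(line) for line in lines.splitlines()]
```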

:-(

--
Daniel Siegmann
Senior Software Engineer
SecurityScorecard Inc.
214 W 29th Street, 5th Floor
New York, NY 10001
