spark-user mailing list archives

From Jacek Laskowski <>
Subject Re: structured streaming handling validation and json flattening
Date Mon, 11 Feb 2019 14:59:38 GMT
Hi Lian,

"What have you tried?" would be a good starting point. Could you share
what you have so far?

How do you read the JSON records? With readStream.json? You could use
readStream.text instead, followed by a filter that separates the valid
JSON records from the invalid ones.
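
A minimal PySpark sketch of that idea, in case it helps. It assumes
newline-delimited JSON files and that a valid record is a JSON object.
The names is_valid_json, split_good_bad and input_path are made up for
illustration and are not part of Spark.

```python
import json


def is_valid_json(line):
    """Return True if `line` parses as a JSON object (assumed record shape)."""
    if line is None:
        return False
    try:
        return isinstance(json.loads(line), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def split_good_bad(spark, input_path):
    """Return (good, bad) streaming DataFrames of raw text lines.

    `spark` is an existing SparkSession; `input_path` is a hypothetical
    directory of newline-delimited JSON files.
    """
    # Imported lazily so the validator above can be used without Spark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    valid = udf(is_valid_json, BooleanType())
    lines = spark.readStream.text(input_path)  # single string column "value"
    flagged = lines.withColumn("is_valid", valid(col("value")))
    good = flagged.filter(col("is_valid")).drop("is_valid")
    bad = flagged.filter(~col("is_valid")).drop("is_valid")
    return good, bad
```

The good stream could then be parsed with from_json against your schema
and written out with writeStream, while the bad stream goes to a
separate sink for logging. Each writeStream starts its own streaming
query.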

Jacek Laskowski
Mastering Spark SQL
Spark Structured Streaming
Mastering Kafka Streams
Follow me at

On Sat, Feb 9, 2019 at 8:25 PM Lian Jiang <> wrote:

> Hi,
> We have a structured streaming job that converts JSON into Parquet files.
> We want to validate the JSON records. If a record is not valid, we want
> to log a message and refuse to write it to Parquet. The JSON also contains
> nested JSON objects, and we want to flatten those into other Parquet files
> using the same streaming job. My questions are:
> 1. How do we validate the JSON records in a structured streaming job?
> 2. How do we flatten the nested JSON in a structured streaming job?
> 3. Is it possible to use one structured streaming job to validate the JSON,
> convert it into Parquet, and convert the nested JSON into other Parquet files?
> I think unstructured streaming can achieve these goals, but structured
> streaming is recommended by the Spark community.
> Appreciate your feedback!
