spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacek Laskowski <ja...@japila.pl>
Subject Re: structured streaming handling validation and json flattening
Date Mon, 11 Feb 2019 14:59:38 GMT
Hi Lian,

"What have you tried?" would be a good starting point. Any help on this?

How do you read the JSONs? readStream.json? You could use readStream.text
followed by filter to include/exclude good/bad JSONs.

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Sat, Feb 9, 2019 at 8:25 PM Lian Jiang <jiangok2006@gmail.com> wrote:

> Hi,
>
> We have a structured streaming job that converting json into parquets. We
> want to validate the json records. If a json record is not valid, we want
> to log a message and refuse to write it into the parquet. Also the json has
> nesting jsons and we want to flatten the nesting jsons into other parquets
> by using the same streaming job. My questions are:
>
> 1. how to validate the json records in a structured streaming job?
> 2. how to flattening the nesting jsons in a structured streaming job?
> 3. is it possible to use one structured streaming job to validate json,
> convert json into a parquet and convert nesting jsons into other parquets?
>
> I think unstructured streaming can achieve these goals but structured
> streaming is recommended by spark community.
>
> Appreciate your feedback!
>

Mime
View raw message