spark-user mailing list archives

From Lian Jiang <jiangok2...@gmail.com>
Subject Re: schema change for structured spark streaming using jsonl files
Date Tue, 24 Apr 2018 16:24:48 GMT
Thanks for any help!

On Mon, Apr 23, 2018 at 11:46 AM, Lian Jiang <jiangok2006@gmail.com> wrote:

> Hi,
>
> I am using structured spark streaming, which reads jsonl files and writes
> them into parquet files. I am wondering what the process is when the jsonl
> files' schema changes.
>
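> A minimal sketch of the current job (Scala), assuming hypothetical paths
> /jsonl for input, /parquet for output and /checkpoint for the checkpoint
> location:
>
>   import org.apache.spark.sql.SparkSession
>   import org.apache.spark.sql.types._
>
>   val spark = SparkSession.builder.appName("jsonl-to-parquet").getOrCreate()
>
>   // old schema: { "field1": String }
>   val oldSchema = new StructType().add("field1", StringType)
>
>   spark.readStream
>     .schema(oldSchema)            // streaming file sources need an explicit schema
>     .json("/jsonl")
>     .writeStream
>     .format("parquet")
>     .option("path", "/parquet")
>     .option("checkpointLocation", "/checkpoint")
>     .start()
>     .awaitTermination()
>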
> Suppose the jsonl files are generated in the \jsonl folder and the old schema is
> { "field1": String }. My proposal is:
>
> 1. write the jsonl files with the new schema (e.g. {"field1": String,
> "field2": Int}) into another folder, \jsonl2.
> 2. let the spark job finish processing all data in \jsonl, then stop the spark
> streaming job.
> 3. use a spark script to convert the parquet files from the old schema to the
> new schema (e.g. add a new column with a default value for "field2"), as in
> the sketch after this list.
> 4. upgrade and restart the spark streaming job to handle the new-schema jsonl
> files and parquet files.
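>
> A rough sketch of step 3 and the restart in step 4 (Scala), reusing the
> spark session from the sketch above and assuming the old parquet output
> lives in /parquet while the rewritten data goes to /parquet_v2 (all paths
> and the default value 0 are placeholders):
>
>   import org.apache.spark.sql.functions.lit
>   import org.apache.spark.sql.types._
>
>   // step 3: add "field2" with a default value to the already-written data
>   spark.read.parquet("/parquet")
>     .withColumn("field2", lit(0))
>     .write.parquet("/parquet_v2")
>
>   // step 4: the upgraded streaming job reads /jsonl2 with the new schema
>   val newSchema = new StructType()
>     .add("field1", StringType)
>     .add("field2", IntegerType)
>
>   spark.readStream
>     .schema(newSchema)
>     .json("/jsonl2")
>     .writeStream
>     .format("parquet")
>     .option("path", "/parquet_v2")
>     .option("checkpointLocation", "/checkpoint_v2")
>     .start()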
>
> Is this the correct (or best) process? Thanks for any clue.
>
