spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lian Jiang <jiangok2...@gmail.com>
Subject schema change for structured spark streaming using jsonl files
Date Mon, 23 Apr 2018 18:46:26 GMT
Hi,

I am using structured spark streaming which reads jsonl files and writes
into parquet files. I am wondering what's the process if jsonl files schema
change.

Suppose jsonl files are generated in \jsonl folder and the old schema is {
"field1": String}. My proposal is:

1. write the jsonl files with new schema (e.g. {"field1":String,
"field2":Int}) into another folder \jsonl2
2. let spark job complete handling all data in \jsonl, then stop the spark
streaming job.
3. use a spark script to convert the parquet files from old schema to new
schema (e.g. add a new column with some default value for "field2").
4. upgrade and start the spark streaming job for handling the new schema
jsonl files and parquet files.

Is this process correct (best)? Thanks for any clue.

Mime
View raw message