spark-user mailing list archives

From unk1102 <umesh.ka...@gmail.com>
Subject Best practices to keep multiple versions of a schema in Spark
Date Mon, 30 Apr 2018 18:48:12 GMT
Hi, I have a couple of datasets whose schemas keep changing over time, and I store them as Parquet files. I currently use the mergeSchema option when loading these Parquet files with differing schemas into a DataFrame, and that works fine. Now I have a requirement to track the differences between schemas over time, essentially maintaining a list of which columns are the latest. Please advise if anybody has done similar work, or share general best practices for tracking column changes over time. Thanks in advance.
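
For reference, a minimal sketch of what I mean, assuming hypothetical paths /data/events for the Parquet data and /data/schemas/events_v1.json for a schema saved on a previous run; the diff logic is just one idea, not an established practice:

// Minimal sketch: load Parquet with mergeSchema (what I do today), then diff the
// merged schema against a previously saved one. Paths are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, StructType}

object SchemaDiffSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("schema-diff-sketch").getOrCreate()

    // Merge the schemas of all Parquet files under the path (current setup).
    val df = spark.read
      .option("mergeSchema", "true")
      .parquet("/data/events")
    val currentSchema: StructType = df.schema

    // Hypothetical: schema saved from a previous run as JSON (via StructType.json).
    val previousJson = scala.io.Source.fromFile("/data/schemas/events_v1.json").mkString
    val previousSchema = DataType.fromJson(previousJson).asInstanceOf[StructType]

    // Columns that appeared or disappeared since the previous version.
    val added   = currentSchema.fieldNames.toSet -- previousSchema.fieldNames.toSet
    val removed = previousSchema.fieldNames.toSet -- currentSchema.fieldNames.toSet
    println(s"Added columns:   ${added.mkString(", ")}")
    println(s"Removed columns: ${removed.mkString(", ")}")

    // Persist the current schema so the next run can diff against it, e.g.:
    // java.nio.file.Files.write(
    //   java.nio.file.Paths.get("/data/schemas/events_v2.json"),
    //   currentSchema.json.getBytes("UTF-8"))

    spark.stop()
  }
}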



