spark-user mailing list archives

From Aditya Borde <bordec...@gmail.com>
Subject Merging Schema while reading Parquet files
Date Tue, 21 Mar 2017 14:53:12 GMT
Hello,

I'm currently blocked on the following issue:

Job "A" writes output partitioned by one of its fields, "col1", and job "B"
reads that output.

Here is the problem: partitioning by "col1" is a recent change, so all of
job "A"'s earlier output is not partitioned by "col1".
When job "B" tries to read the previous and the current data together, it
fails with: "inconsistent partition column names"

*The read path looks like "file://path1/name/sample/"* ---> which contains
directories like *"day=2017-02-15/filling=5/xyz1"*

The current output adds one more directory level to the input path --> "
*/day=2017-02-15/filling=5/col1/xyz2"*

"mergeSchema" does not help here, because my base path contains multiple
directories with different layouts under which the files reside.
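One workaround, sketched below under the assumption that the two generations
of output can be addressed by separate paths (the "old/" and "new/" subpaths
here are hypothetical placeholders, not the real layout): read each layout
with its own load, add the missing "col1" column to the old side, and union
the two DataFrames instead of relying on a single partition-discovery pass.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

object MergeLayouts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("merge-layouts").getOrCreate()

    // Hypothetical split: "old/" holds the day=/filling= layout,
    // "new/" holds the layout that also partitions by col1.
    val oldDf = spark.read.parquet("file://path1/name/sample/old")
    val newDf = spark.read.parquet("file://path1/name/sample/new")

    // The old data has no "col1" partition, which is why discovery over
    // one base path fails. Add "col1" as null on the old side, align the
    // column order to the new schema, then union the two.
    val merged = oldDf
      .withColumn("col1", lit(null).cast("string"))
      .select(newDf.columns.map(col): _*)
      .union(newDf)

    merged.write.parquet("file://path1/name/sample/merged")
  }
}
```

This sidesteps partition discovery entirely, at the cost of maintaining the
old/new split until the historical data is rewritten with the new layout.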

Can someone suggest an effective solution here?

Regards,
Aditya Borde
