You could create a one-time job that reprocesses the historical data to match the updated format.
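
Something along these lines (a minimal sketch, assuming the data is Parquet and that job "A" now writes Hive-style day/filling/col1 partitions; the glob and the "sample_backfilled" output path are placeholders you would adjust):

import org.apache.spark.sql.SparkSession

object BackfillJobAOutput {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("backfill-jobA-col1-partitioning")
      .getOrCreate()

    // Read only the historical output written before the "col1" change.
    // "basePath" keeps day/filling as partition columns; the glob below is a
    // placeholder -- point it at the old day= directories only.
    val oldData = spark.read
      .option("basePath", "file://path1/name/sample/")
      .parquet("file://path1/name/sample/day=2017-02-*")

    // Rewrite with the layout job "A" now produces, adding "col1" as the
    // extra partition level.
    oldData.write
      .partitionBy("day", "filling", "col1")
      .mode("overwrite")
      .parquet("file://path1/name/sample_backfilled/")

    spark.stop()
  }
}

Once the rewritten data is verified, you can swap it in for the old directories (or point job "B" at the new location), so historical and current data share the same partition layout and job "B" no longer hits the "inconsistent partition column names" error.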

On Tue, Mar 21, 2017 at 8:53 AM, Aditya Borde <bordecorp@gmail.com> wrote:
Hello, 

I'm currently blocked with this issue:

I have job "A" whose output is partitioned by one of the field - "col1"
Now job "B" reads the output of job "A".

Here is the problem: job "A"'s output was previously not partitioned by "col1" (this is a recent change),
so all of job "A"'s earlier data is not partitioned by "col1".
When I run job "B" over both the previous and the current data, it fails with: "inconsistent partition column names".

The read path is something like "file://path1/name/sample/", which contains further directories like "day=2017-02-15/filling=5/xyz1".

The current output adds one more directory level to the input path: "/day=2017-02-15/filling=5/col1/xyz2".

"mergeSchema" - is not working here because my base path has multiple directories under which files are residing.

Can someone suggest an effective solution here?

Regards,
Aditya Borde



--
Regards,

Matt
Data Engineer
https://www.linkedin.com/in/mdeaver
http://mattdeav.pythonanywhere.com/