I have an HDFS directory with thousands of files. Some of them (I don't know which ones) seem to have a problem with their schema, and that is causing my Spark application to fail with this error:
The problem is not only that the application fails: every time it does fail, I have to copy the offending file out of the directory and start the app again.
I thought of trying to use try-except, but I can't seem to get that to work.
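For illustration, a per-file try-except check could look something like the sketch below. It is only a sketch under assumptions: `split_good_and_bad` and `read_one` are hypothetical names, and `read_one` stands in for whatever Spark read call applies to these files. Because Spark evaluates lazily, the callable would probably need to trigger an action (such as a count) for a bad file to actually raise inside the `try`.

```python
from typing import Callable, Iterable, List, Tuple


def split_good_and_bad(
    paths: Iterable[str],
    read_one: Callable[[str], object],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Try to read each file on its own and sort the paths into good and bad.

    `read_one` is a placeholder for the real read call. With PySpark and
    Parquet files (an assumption, the file format is not stated) it might be:
        lambda p: spark.read.parquet(p).count()
    """
    good: List[str] = []
    bad: List[Tuple[str, str]] = []
    for path in paths:
        try:
            # Reading one file at a time means a failure points at one path.
            read_one(path)
        except Exception as exc:
            # Record the path and the error instead of stopping the whole run.
            bad.append((path, f"{type(exc).__name__}: {exc}"))
        else:
            good.append(path)
    return good, bad
```

The list of bad paths could then be logged or moved aside in one pass, and the application run on the good paths only. Checking thousands of files one by one is slow, so this is more of a one-off diagnostic than something to run on every job.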
Does anyone have any advice? I really can't see myself going through thousands of files one by one to figure out which ones are broken.
Thanks in advance,