spark-user mailing list archives

From Hamish Whittal <>
Subject [No Subject]
Date Sun, 01 Mar 2020 21:56:56 GMT
Hi there,

I have an HDFS directory with thousands of Parquet files. It seems that some
of them - and I don't know which ones - have a problem with their schema, and
it's causing my Spark application to fail with this error:

Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet
column cannot be converted in file hdfs://
Column: [price], Expected: double, Found: FIXED_LEN_BYTE_ARRAY

The problem is not only that it causes the application to fail, but that
every time it fails, I have to copy the offending file out of the directory
and start the app again.

I thought of using try-except, but I can't seem to get that to work.
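For what it's worth, the try-except idea can work if each file is probed
individually rather than wrapping the whole directory read. Below is a minimal
plain-Python sketch of that per-file scan; `load` is a stand-in for whatever
reader raises on a bad file (in Spark it might be something like
`spark.read.parquet(path).head(1)`, which forces the read so schema errors
surface). The `fake_load` helper and file names are purely hypothetical, used
here so the sketch runs without a Spark cluster.

```python
def partition_by_loadability(paths, load):
    """Try to load each file; collect the ones that fail instead of aborting.

    Returns (good, bad) lists of paths. Catching a broad Exception is
    deliberate: Spark surfaces Parquet schema errors through several
    exception types depending on version.
    """
    good, bad = [], []
    for path in paths:
        try:
            load(path)          # force the read; schema errors surface here
            good.append(path)
        except Exception:
            bad.append(path)
    return good, bad

# Hypothetical loader standing in for spark.read.parquet(...).head(1):
# pretend any file whose name contains "broken" has a bad schema.
def fake_load(path):
    if "broken" in path:
        raise ValueError("Parquet column cannot be converted")

good, bad = partition_by_loadability(
    ["a.parquet", "broken1.parquet", "b.parquet"], fake_load)
print(good)  # ['a.parquet', 'b.parquet']
print(bad)   # ['broken1.parquet']
```

Once the bad paths are known, the good list can be passed to a single
`spark.read.parquet(*good)` call, so the main job never touches the broken
files.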

Is there any advice anyone can give me? I really can't see myself going
through thousands of files trying to figure out which ones are broken.

Thanks in advance,

