Thanks for the help thus far.
I'm getting an error when calling message.show().
Basically I'm reading in a single Parquet file (to try to narrow things down).
I'm defining the schema at the beginning and loading the Parquet file with:
message = spark\
[I've tried with and without the mergeSchema option]
[Side note: I was hoping the badRecordPath option would help with the truly bad records, but it seems to do nothing.]
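One way to check whether the schema defined up front disagrees with what is physically in the file is to diff the two. The sketch below is a hypothetical helper (schema_diff is not part of PySpark) that works on plain {column: type-name} dicts, so it runs without a Spark session. The column names other than price are made up for illustration.

```python
from typing import Dict, List

# Hypothetical shape: column name -> Spark type name, e.g. "double" or "bigint".
Schema = Dict[str, str]


def schema_diff(declared: Schema, actual: Schema) -> List[str]:
    """List differences between a hand-written schema and the file's own schema."""
    problems = []
    for name, dtype in declared.items():
        if name not in actual:
            problems.append(f"{name}: declared {dtype}, missing from file")
        elif actual[name] != dtype:
            problems.append(f"{name}: declared {dtype}, file has {actual[name]}")
    # Columns the file has but the hand-written schema never mentions.
    for name in actual:
        if name not in declared:
            problems.append(f"{name}: in file ({actual[name]}), not declared")
    return problems


if __name__ == "__main__":
    declared = {"id": "bigint", "price": "double"}
    in_file = {"id": "bigint", "price": "string"}
    print(schema_diff(declared, in_file))
    # ['price: declared double, file has string']
```

To feed it from Spark, each dict could be built with {f.name: f.dataType.simpleString() for f in schema}: once for the StructType passed to .schema(...), and once for spark.read.parquet(path).schema, where path stands in for the file and the second read deliberately omits .schema(...) so Spark reports the file's own footer schema. Comparing Spark type names on both sides avoids false alarms from pyarrow's different spellings (int64 vs. bigint).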
I've also tried casting the potentially problematic columns (i.e. the Int, Long, Double, etc. columns) with:
message_1 = message\
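A possible trap with casting: when spark.sql.ansi.enabled is off (the Spark 3.x default), cast() does not raise on a value it cannot convert; it returns null instead, so a bad value can be hidden rather than surfaced. The hypothetical helper below mirrors the "present before the cast, null after it" check in plain Python, using float as a stand-in for Spark's double cast (the two do not follow identical parsing rules, so treat it as an approximation).

```python
from typing import Any, Callable, Iterable, List, Tuple


def lost_in_cast(
    values: Iterable[Any], caster: Callable[[Any], Any]
) -> List[Tuple[int, Any]]:
    """Return (index, value) for non-null values that `caster` cannot convert.

    Mirrors the Spark-side test "not null before the cast, null after it".
    """
    lost = []
    for i, value in enumerate(values):
        if value is None:
            continue  # already null before the cast, so not a cast failure
        try:
            caster(value)
        except (TypeError, ValueError):
            lost.append((i, value))  # present, but a non-ANSI cast would give null
    return lost


if __name__ == "__main__":
    prices = ["9.99", None, "N/A", "12"]
    print(lost_in_cast(prices, float))  # [(2, 'N/A')]
```

In Spark the equivalent filter would be something like message.filter(F.col('price').isNotNull() & F.col('price').cast('double').isNull()), with from pyspark.sql import functions as F, run against the DataFrame before the casts are applied.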
Yet I get this error and I can't figure out:
(a) whether it's some record WITHIN the Parquet file that's causing it, and
(b) if it is a single record (or a few records), how do I find those particular records?
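For (b), if the failure really does come from individual rows, a binary search over row ranges can pin down one offending row in roughly log2(N) attempts instead of N. The sketch assumes a hypothetical fails(lo, hi) callback that returns True when rows [lo, hi) reproduce the error. One way to build it might be to slice a pyarrow Table with table.slice(lo, hi - lo), write the slice to a temporary file with pyarrow.parquet.write_table, read it back in Spark with the same schema, and call collect() (rather than show(), which only needs the first 20 rows) inside try/except. Rewriting slices can itself change how values are stored, so a slice that reads cleanly is not absolute proof.

```python
from typing import Callable, Optional


def first_bad_row(n_rows: int, fails: Callable[[int, int], bool]) -> Optional[int]:
    """Binary-search for the lowest row index that makes `fails` return True.

    Assumes the failure is caused by individual rows, so a range [lo, hi)
    fails exactly when it contains at least one bad row.
    """
    if n_rows == 0 or not fails(0, n_rows):
        return None  # the whole range reads fine: no single bad row to find
    lo, hi = 0, n_rows  # invariant: [lo, hi) holds a bad row, rows < lo are clean
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lo, mid):
            hi = mid  # a bad row is in the left half
        else:
            lo = mid  # left half is clean, so the bad row is in [mid, hi)
    return lo


if __name__ == "__main__":
    bad_rows = {37, 80}  # pretend these rows trigger the error

    def fails(lo: int, hi: int) -> bool:
        return any(lo <= r < hi for r in bad_rows)

    print(first_bad_row(100, fails))  # 37
```

To look for further bad rows, the search can be rerun on the rows after the one it returns.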
The previous time I encountered this, there were records that should have had doubles in them (like "price" above) but actually seemed to contain nulls.
I did this to fix that particular problem:
from pyspark.sql.functions import lit
if 'price' not in message.columns:
    message = message.withColumn('price', lit(0.0))  # add price as a double column of 0.0
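The withColumn fix above only helps when the price column is missing from the data altogether; it does nothing for rows where the column exists but holds null. A small, hypothetical check for that second case, written over plain dicts (for example rows from pyarrow.parquet.read_table(path).to_pylist() on a file small enough to load, with path standing in for the file), reports which rows are null in columns expected to be populated:

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence, Tuple


def null_violations(
    rows: Iterable[Dict[str, Any]], required: Sequence[str]
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (row_number, columns) for rows where a required column is null or absent."""
    for i, row in enumerate(rows):
        missing = [col for col in required if row.get(col) is None]
        if missing:
            yield i, missing


if __name__ == "__main__":
    rows = [
        {"id": 1, "price": 9.99},
        {"id": 2, "price": None},  # column present, value null
        {"id": 3},                 # column absent in this row
    ]
    print(list(null_violations(rows, ["id", "price"])))
    # [(1, ['price']), (2, ['price'])]
```

On the Spark side, message.filter(F.col('price').isNull()) shows the same rows, and message.fillna({'price': 0.0}) replaces nulls in an existing column, whereas lit only supplies a value when the column is being added.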
I have also tried using pyarrow to look at the Parquet schema, and it looks fine. In other words, it doesn't look like the schema in the Parquet file is the problem, but of course I'm not ruling that out just yet. Any suggestions or help would be MOST welcome.
Thanks for any suggestions,