Hi folks,

Thanks for the help thus far.

I'm trying to track down the source of this error:
java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary

when doing a message.show()

Basically I'm reading in a single Parquet file (to try to narrow things down).

I'm defining the schema in the beginning and loading the parquet with:
   message = spark.read\
             .schema(schema)\
             .option("mergeSchema", "true")\
             .option("badRecordsPath", "/tmp/badRecords/")\
             .parquet(...)

[I've tried with and without the mergeSchema option]
[sidenote: I was hoping badRecordsPath would help with the truly bad records, but it seems to do nothing]

I've also tried casting the potentially problematic columns (Int, Long, Double, etc.) with:

  message_1 = message\
    .withColumn('price', col('price').cast('double'))\
    .withColumn('price_eur', col('price_eur').cast('double'))\
    .withColumn('cost_usd', col('cost_usd').cast('double'))\
    .withColumn('adapter_status', col('adapter_status').cast('long'))

Yet I get this error and I can't figure out:
(a) whether it's some record WITHIN the parquet file that's causing it and
(b) if it is a single record (or a few records) then how do I find those particular records?

The previous time I encountered this, there were records that should have contained doubles (like "price" above) but actually seemed to contain nulls.

I did this to fix that particular problem:

from pyspark.sql.functions import lit

if 'price' not in message.columns:
    message = message.withColumn('price', lit(0.0))

Any suggestions or help would be MOST welcome. I have also used pyarrow to inspect the Parquet schema, and it looks fine. I mean, it doesn't look like the schema in the Parquet file is the problem, but of course I'm not ruling that out just yet.

Thanks for any suggestions,

Cape Town, South Africa
+27 79 614 4913