Greetings everyone,

I'm trying to read a single field of a Hive table stored as Parquet in Spark (the whole table is ~140 GB; this one field should be only a few GB) and look at the sorted output using the following:

sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC"
​But this simple line of code gives:

Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes

Same error for:
sql("SELECT " + field + " FROM MY_TABLE).sort(field)
and:
sql("SELECT " + field + " FROM MY_TABLE).orderBy(field)

I'm running this in local mode on a machine with more than 200 GB of RAM, with spark.driver.memory set to 64g.
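
For reference, a minimal sketch of the setup (the app name is a placeholder; note that spark.driver.memory has to be supplied before the driver JVM starts, e.g. via spark-shell --driver-memory 64g, rather than through the builder):

import org.apache.spark.sql.SparkSession

// Minimal sketch of the session setup; launched with:
//   spark-shell --driver-memory 64g
val spark = SparkSession.builder()
  .appName("sort-single-field")  // placeholder name
  .master("local[*]")            // local mode, all cores on one machine
  .enableHiveSupport()           // so MY_TABLE resolves against the Hive metastore
  .getOrCreate()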

I do not understand why it cannot allocate a big enough page, nor why it is trying to allocate such a large page in the first place (17179869176 bytes is exactly (2^31 - 1) * 8, i.e. just under 16 GB).

I hope someone with more knowledge of Spark can shed some light on this. Thank you!

Best regards,

Babak Alipour
University of Florida