spark-user mailing list archives

From Rishikesh Gawade <rishikeshg1...@gmail.com>
Subject Collecting large dataset
Date Thu, 05 Sep 2019 18:22:45 GMT
Hi.
I have been trying to collect a large dataset (about 2 GB in size, 30
columns, more than a million rows) onto the driver. I am aware that
collecting such a large dataset isn't recommended; however, the application
within which the Spark driver is running requires that data.
While collecting the DataFrame, the Spark job throws an error:
TaskResultLost (result lost from block manager).
I searched for solutions to this and set the following properties:
spark.blockManager.port, spark.driver.blockManager.port, and
spark.driver.maxResultSize (set to 0, i.e. unlimited).
The application within which the Spark driver is running has a maximum
heap size of 28 GB.
Yet the error still occurs.
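
As a rough sketch of the configuration described above, the three
properties could be passed to spark-submit as --conf flags. The two port
numbers are placeholder assumptions, not values from this message, and the
helper name to_conf_args is hypothetical.

```python
# Sketch: assemble spark-submit --conf flags for the three properties named above.
# The port numbers are placeholders (assumptions), not values from this message.
settings = {
    "spark.driver.maxResultSize": "0",          # 0 removes the cap on collected results
    "spark.blockManager.port": "40000",         # placeholder port
    "spark.driver.blockManager.port": "40001",  # placeholder port
}


def to_conf_args(conf):
    """Turn a {key: value} mapping into a flat list of --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf_args = to_conf_args(settings)
print(" ".join(conf_args))
```

The resulting list can be appended to a spark-submit command line; the same
keys could equally be set in spark-defaults.conf.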
There are 22 executors running in my cluster.
Is there any configuration or necessary step that I am missing before
collecting such a large dataset?
Or is there another approach that would reliably collect data of this
size without failure?

Thanks,
Rishikesh
