spark-user mailing list archives

From Adrian Bridgett <adr...@opensignal.com>
Subject Re: very high maxresults setting (no collect())
Date Thu, 22 Sep 2016 07:53:38 GMT
Hi Michael,

No Spark upgrade; we've been changing some of our data pipelines, so 
the data volumes have probably been getting a bit larger.  Just in the 
last few weeks we've seen quite a few jobs needing a larger 
maxResultSize.  Some jobs have gone from "fine with the 1GB default" to 
needing 3GB.  I'm wondering what, besides a collect, could cause this 
(there's certainly no explicit collect()).
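
For concreteness, bumping the limit looks something like this (a 
minimal sketch in Scala against the Spark 2.x API; the property has to 
be set before the session starts, and 3g is just an example value):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("pipeline")
      .config("spark.driver.maxResultSize", "3g")  // default is 1g; 0 = unlimited
      .getOrCreate()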

We're on Mesos, with parquet source data: a small table is broadcast 
and joined early on, then just a few aggregations, a select, a coalesce 
and a spark-csv write (sketched below).  The executors go along nicely 
(as does the driver), and then we start to hit memory pressure on the 
driver during the output stage and the job slows to a crawl (we 
eventually have to kill it and restart with more memory).
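
Roughly, the job looks like this (a sketch only; the paths, table and 
column names are made up):

    import org.apache.spark.sql.functions.{broadcast, sum}

    val small = spark.read.parquet("/data/small_table")  // small lookup table
    val big   = spark.read.parquet("/data/big_table")    // main source data

    val out = big
      .join(broadcast(small), Seq("key"))  // broadcast join of the small table
      .groupBy("key")
      .agg(sum("value").as("total"))       // just a few aggregations
      .select("key", "total")
      .coalesce(16)                        // cut down the output file count

    out.write
      .option("header", "true")
      .csv("/data/output")                 // spark-csv style output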

Adrian


