spark-user mailing list archives

From Mohammed Guller <moham...@glassbeam.com>
Subject RE: Stage contains task of large size
Date Thu, 03 Mar 2016 22:04:22 GMT
To elaborate on what Silvio wrote below: check whether you are referencing a class or object
member variable in a function literal/closure passed to one of the RDD methods.
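
For example (a rough sketch in Scala; the class and field names are made up for illustration), a closure that touches a member field captures the whole enclosing object, and copying the member into a local val avoids that:

import org.apache.spark.rdd.RDD

// Hypothetical example: referencing a member variable inside an RDD closure
// forces Spark to serialize the whole enclosing object into every task.
class Tokenizer(val stopWords: Set[String]) extends Serializable {

  // Problematic: the closure references `stopWords`, so it captures `this`
  // and the entire Tokenizer instance is shipped with each task.
  def removeStopWords(words: RDD[String]): RDD[String] =
    words.filter(w => !stopWords.contains(w))

  // Better: copy the member into a local val; only that small value is
  // captured by the closure.
  def removeStopWordsLocal(words: RDD[String]): RDD[String] = {
    val localStopWords = stopWords
    words.filter(w => !localStopWords.contains(w))
  }
}

The same applies to calling a method of the enclosing class or object from inside the closure; that also drags the whole instance along.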

Mohammed
Author: Big Data Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Silvio Fiorito [mailto:silvio.fiorito@granturing.com]
Sent: Wednesday, March 2, 2016 8:43 PM
To: Bijuna; user
Subject: RE: Stage contains task of large size

One source of this could be that more than you intended (or realized) is getting serialized as part
of your operations. What transformations are you using? Are you referencing local instance
variables in your driver app as part of your transformations? For instance, you may have a large
collection that you're using in your transformation, which will get serialized and sent to
each executor. If you do have something like that, look at using broadcast variables instead.
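
As a rough sketch of the difference (the lookup map and input path below are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-example"))

// Imagine this map is large and built in the driver.
val lookup: Map[String, Int] = Map("a" -> 1, "b" -> 2)

// Problematic: the closure captures `lookup`, so the whole map is serialized
// into every task. This is what inflates the reported task size.
val bad = sc.textFile("hdfs:///data/words.txt").map(w => lookup.getOrElse(w, 0))

// Better: broadcast it once. Executors cache a single read-only copy and the
// task closure only carries the small broadcast handle.
val lookupBc = sc.broadcast(lookup)
val good = sc.textFile("hdfs:///data/words.txt").map(w => lookupBc.value.getOrElse(w, 0))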

From: Bijuna<mailto:bijuna@gmail.com>
Sent: Wednesday, March 2, 2016 11:20 PM
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Stage contains task of large size

Spark users,

We are running a Spark application in standalone mode. We see warning messages in the logs which
say:

Stage 46 contains a task of very large size (983 KB). The maximum recommended task size is
100 KB.

What is the recommended approach to fix this warning? Please let me know.

Thank you
Bijuna

Sent from my iPad