spark-user mailing list archives

From Yong Zhang <java8964@hotmail.com>
Subject Re: Dataset : Issue with Save
Date Fri, 17 Mar 2017 02:47:14 GMT
Did you read the JIRA ticket? Are you confirming that it is fixed in Spark 2.0, or are you saying
that it still exists in Spark 2.0?


First, you didn't tell us which version of Spark you are using. The JIRA clearly says that this
is a bug in Spark 1.x that should be fixed in Spark 2.0. So help yourself and the community by
confirming whether that is the case.


If you are looking for a workaround, the JIRA ticket clearly shows you how to increase your driver
heap. 1 GB really is kind of small in today's world.
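
For example, a minimal sketch of raising that limit when building the session (Spark 2.x Java
API; the app name and the "2g" value are illustrative, not from the ticket):

    import org.apache.spark.sql.SparkSession;

    public class SaveExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("SaveExample") // hypothetical app name
                // Raise the cap on the total size of serialized task results
                // returned to the driver; the default is 1g, "0" means unlimited.
                .config("spark.driver.maxResultSize", "2g")
                .getOrCreate();
            // ... build and save the dataset here ...
        }
    }

Note that the driver heap itself (spark.driver.memory) must be set before the driver JVM starts,
e.g. via spark-submit --driver-memory 4g; setting it in code like the above has no effect.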


Yong


________________________________
From: Bahubali Jain <bahubali@gmail.com>
Sent: Thursday, March 16, 2017 10:34 PM
To: Yong Zhang
Cc: user@spark.apache.org
Subject: Re: Dataset : Issue with Save

Hi,
Was this not resolved yet?
Saving a dataframe is a very common requirement; is there a better way to save a dataframe
that avoids data being sent to the driver?

"Total size of serialized results of 3722 tasks (1024.0 MB) is bigger than spark.driver.maxResultSize
(1024.0 MB) "

Thanks,
Baahu

On Fri, Mar 17, 2017 at 1:19 AM, Yong Zhang <java8964@hotmail.com> wrote:

You can take a look at https://issues.apache.org/jira/browse/SPARK-12837


Yong

SPARK-12837: Spark driver requires large memory space for serialized results
"Executing a sql statement with a large number of partitions requires a high memory space for
the driver even when there are no requests to collect data back to the driver."
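
Since the total serialized-result size grows with the task count, another possible workaround
(my suggestion, not from the ticket) is to reduce the number of partitions before writing. A
minimal sketch, again with the Java API, using an arbitrary target of 200 partitions and a
hypothetical input path:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class CoalesceWrite {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("CoalesceWrite").getOrCreate();
            Dataset<Row> mydataset = spark.read().csv("inputlocation"); // hypothetical input
            // Fewer partitions means fewer tasks, so less serialized result
            // metadata accumulates on the driver when the job finishes.
            mydataset.coalesce(200).write().csv("outputlocation");
        }
    }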




________________________________
From: Bahubali Jain <bahubali@gmail.com>
Sent: Thursday, March 16, 2017 1:39 PM
To: user@spark.apache.org
Subject: Dataset : Issue with Save

Hi,
While saving a dataset using mydataset.write().csv("outputlocation"), I am running into an
exception:

"Total size of serialized results of 3722 tasks (1024.0 MB) is bigger than spark.driver.maxResultSize
(1024.0 MB)"

Does this mean that saving a dataset sends the whole of the dataset's contents to the driver,
similar to a collect() action?

Thanks,
Baahu



--
Twitter: http://twitter.com/Baahu

