spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: StandardScaler failing with OOM errors in PySpark
Date Mon, 18 May 2015 06:49:23 GMT
AFAIK, there are two places where you can specify the driver memory.
One is via spark-submit --driver-memory and the other is via
spark.driver.memory in spark-defaults.conf. Please try these
approaches and see whether they work or not. You can find detailed
instructions at http://spark.apache.org/docs/latest/configuration.html
and http://spark.apache.org/docs/latest/submitting-applications.html.
-Xiangrui
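
To make the two routes above concrete, here is a minimal sketch in Python. It only builds the spark-submit argument list and the matching spark-defaults.conf entry; it does not launch Spark. The script name my_job.py and the 5g value are placeholders for illustration (5g echoes the figure Rok tried for the AM memory), and the helper names are hypothetical, not part of any Spark API.

```python
import shlex


def spark_submit_argv(app_path, driver_memory):
    """Build an argv list that passes driver memory on the spark-submit command line."""
    return ["spark-submit", "--driver-memory", driver_memory, app_path]


def spark_defaults_line(driver_memory):
    """Render the equivalent entry for conf/spark-defaults.conf (key, whitespace, value)."""
    return "spark.driver.memory {}".format(driver_memory)


if __name__ == "__main__":
    # Placeholder values: my_job.py and 5g are assumptions for illustration only.
    argv = spark_submit_argv("my_job.py", "5g")
    print(" ".join(shlex.quote(arg) for arg in argv))
    print(spark_defaults_line("5g"))
```

Either form is read before the driver JVM starts, which is presumably why these two routes are suggested here rather than setting the property from inside the running application.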

On Tue, Apr 28, 2015 at 4:06 AM, Rok Roskar <rokroskar@gmail.com> wrote:
> That's exactly what I'm saying -- I specify the memory options using spark
> options, but this is not reflected in how the JVM is created. No matter
> which memory settings I specify, the JVM for the driver is always made with
> 512Mb of memory. So I'm not sure if this is a feature or a bug?
>
> rok
>
> On Mon, Apr 27, 2015 at 6:54 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>
>> You might need to specify driver memory in spark-submit instead of
>> passing JVM options. spark-submit is designed to handle different
>> deployments correctly. -Xiangrui
>>
>> On Thu, Apr 23, 2015 at 4:58 AM, Rok Roskar <rokroskar@gmail.com> wrote:
>> > ok yes, I think I have narrowed it down to being a problem with driver
>> > memory settings. It looks like the application master/driver is not
>> > being launched with the settings specified:
>> >
>> > For the driver process on the main node I see
>> > "-XX:MaxPermSize=128m -Xms512m -Xmx512m" as options used to start the
>> > JVM, even though I specified
>> >
>> > 'spark.yarn.am.memory', '5g'
>> > 'spark.yarn.am.memoryOverhead', '2000'
>> >
>> > The info shows that these options were read:
>> >
>> > 15/04/23 13:47:47 INFO yarn.Client: Will allocate AM container, with 7120 MB memory including 2000 MB overhead
>> >
>> > Is there some reason why these options are being ignored and instead
>> > starting the driver with just 512Mb of heap?
>> >
>> > On Thu, Apr 23, 2015 at 8:06 AM, Rok Roskar <rokroskar@gmail.com> wrote:
>> >>
>> >> the feature dimension is 800k.
>> >>
>> >> yes, I believe the driver memory is likely the problem since it doesn't
>> >> crash until the very last part of the tree aggregation.
>> >>
>> >> I'm running it via pyspark through YARN -- I have to run in client mode
>> >> so I can't set spark.driver.memory -- I've tried setting the
>> >> spark.yarn.am.memory and overhead parameters but it doesn't seem to
>> >> have an effect.
>> >>
>> >> Thanks,
>> >>
>> >> Rok
>> >>
>> >> On Apr 23, 2015, at 7:47 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
>> >>
>> >> > What is the feature dimension? Did you set the driver memory?
>> >> > -Xiangrui
>> >> >
>> >> > On Tue, Apr 21, 2015 at 6:59 AM, rok <rokroskar@gmail.com> wrote:
>> >> >> I'm trying to use the StandardScaler in pyspark on a relatively small
>> >> >> (a few hundred Mb) dataset of sparse vectors with 800k features. The
>> >> >> fit method of StandardScaler crashes with Java heap space or Direct
>> >> >> buffer memory errors. There should be plenty of memory around -- 10
>> >> >> executors with 2 cores each and 8 Gb per core. I'm giving the
>> >> >> executors 9g of memory and have also tried lots of overhead (3g),
>> >> >> thinking it might be the array creation in the aggregators that's
>> >> >> causing issues.
>> >> >>
>> >> >> The bizarre thing is that this isn't always reproducible -- sometimes
>> >> >> it actually works without problems. Should I be setting up executors
>> >> >> differently?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Rok
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> View this message in context:
>> >> >>
>> >> >> http://apache-spark-user-list.1001560.n3.nabble.com/StandardScaler-failing-with-OOM-errors-in-PySpark-tp22593.html
>> >> >> Sent from the Apache Spark User List mailing list archive at
>> >> >> Nabble.com.
>> >> >>
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> >> For additional commands, e-mail: user-help@spark.apache.org
>> >> >>
>> >>
>> >
>
>
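
As a rough illustration of why the 512 MB driver heap reported earlier in the thread could matter with 800k features, the following back-of-the-envelope sketch estimates the size of dense per-feature arrays. The feature count and the heap size come from the thread. The number of dense arrays per partial aggregate (7 here) is an assumption, not something stated in the thread, and the estimate ignores JVM object overhead, serialization buffers, and everything else on the heap.

```python
BYTES_PER_DOUBLE = 8  # size of a JVM double


def dense_array_bytes(num_features):
    """Bytes needed for one dense array of doubles, ignoring object headers."""
    return num_features * BYTES_PER_DOUBLE


def partial_aggregate_bytes(num_features, arrays_per_aggregate):
    """Bytes for one partial aggregate that holds several dense arrays."""
    return arrays_per_aggregate * dense_array_bytes(num_features)


if __name__ == "__main__":
    num_features = 800000                  # from the thread
    arrays_per_aggregate = 7               # ASSUMPTION: not stated in the thread
    driver_heap_bytes = 512 * 1024 * 1024  # -Xmx512m, from the thread

    one_array = dense_array_bytes(num_features)
    one_aggregate = partial_aggregate_bytes(num_features, arrays_per_aggregate)

    print("one dense array: %.1f MB" % (one_array / 1e6))
    print("one partial aggregate: %.1f MB" % (one_aggregate / 1e6))
    print("aggregates that fit in the heap: %d" % (driver_heap_bytes // one_aggregate))
```

Under these assumptions only a handful of partial aggregates fit in the driver heap at once, which would be consistent with a failure that appears only in the last stage of the tree aggregation and not on every run.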


