Subject: Re: LBGFS optimizer performace
From: Gustavo Enrique Salazar Torres <gsalazar@ime.usp.br>
To: Akhil Das <akhil@sigmoidanalytics.com>
Cc: user@spark.apache.org
Date: Tue, 3 Mar 2015 13:55:17 -0300

Just did, with the same error.
I think the problem is the "data.count()" call in LBFGS, because for huge datasets that is a naive thing to do.
I was thinking of writing my own version of LBFGS, but instead of doing data.count() I will pass that count in as a parameter, which I will calculate from a Spark SQL query.

I will let you know.

Thanks
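A minimal sketch of that idea, for the record (Scala, Spark 1.2 era; sc is an existing SparkContext, "training_table" is only a placeholder for whatever table the training data comes from, and the MyLBFGS call in the comment is a hypothetical signature, not Spark API):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Get the example count from the underlying table instead of letting the
// optimizer recompute and count the whole training RDD.
val numExamples: Long =
  sqlContext.sql("SELECT COUNT(*) FROM training_table").first().getLong(0)

// A local copy of MLlib's LBFGS could then accept the count as a parameter
// instead of calling data.count() internally, e.g. (hypothetical):
// MyLBFGS.runLBFGS(data, gradient, updater, numCorrections, convergenceTol,
//                  maxNumIterations, regParam, initialWeights, numExamples)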
On Tue, Mar 3, 2015 at 3:25 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:

> Can you try increasing your driver memory, reducing the number of executors,
> and increasing the executor memory?
>
> Thanks
> Best Regards
>
> On Tue, Mar 3, 2015 at 10:09 AM, Gustavo Enrique Salazar Torres <
> gsalazar@ime.usp.br> wrote:
>
>> Hi there:
>>
>> I'm using the LBFGS optimizer to train a logistic regression model. The code
>> I implemented follows the pattern shown in
>> https://spark.apache.org/docs/1.2.0/mllib-linear-methods.html, but the
>> training data is obtained from a Spark SQL RDD.
>> The problem I'm having is that LBFGS tries to count the elements in my
>> RDD, and that results in an OOM exception since my dataset is huge.
>> I'm running on an AWS EMR cluster with 16 c3.2xlarge instances on Hadoop
>> YARN. My dataset is about 150 GB, but I sample it (taking only 1% of the data)
>> in order to scale logistic regression.
>> The exception I'm getting is this:
>>
>> 15/03/03 04:21:44 WARN scheduler.TaskSetManager: Lost task 108.0 in stage
>> 2.0 (TID 7600, ip-10-155-20-71.ec2.internal): java.lang.OutOfMemoryError:
>> Java heap space
>>         at java.util.Arrays.copyOfRange(Arrays.java:2694)
>>         at java.lang.String.<init>(String.java:203)
>>         at com.esotericsoftware.kryo.io.Input.readString(Input.java:448)
>>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
>>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
>>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
>>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>>         at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
>>         at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
>>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>>         at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
>>         at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
>>         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>>         at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>>         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>         at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>         at org.apache.spark.sql.execution.joins.HashOuterJoin.org$apache$spark$sql$execution$joins$HashOuterJoin$$buildHashTable(HashOuterJoin.scala:179)
>>         at org.apache.spark.sql.execution.joins.HashOuterJoin$$anonfun$execute$1.apply(HashOuterJoin.scala:199)
>>         at org.apache.spark.sql.execution.joins.HashOuterJoin$$anonfun$execute$1.apply(HashOuterJoin.scala:196)
>>         at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>
>> I'm using these parameters at runtime:
>> --num-executors 128 --executor-memory 1G --driver-memory 4G
>> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
>> --conf spark.storage.memoryFraction=0.2
>>
>> I also persist my dataset using MEMORY_AND_DISK_SER but get the same
>> error.
>> I will appreciate any help with this problem. I have been trying to solve
>> it for days and I'm running out of time and hair.
>>
>> Thanks
>> Gustavo
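For reference, a minimal sketch of the pattern described above (the Spark 1.2 mllib-linear-methods call style, fed from a Spark SQL query, with the 1% sample and MEMORY_AND_DISK_SER persistence). The query, column layout, and hyperparameters are assumptions for illustration, not the original code:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

val sqlContext = new SQLContext(sc)  // assumes an existing SparkContext sc

// Placeholder query: label in column 0, three numeric feature columns after it.
val rows = sqlContext.sql("SELECT label, f1, f2, f3 FROM training_table")

val training = rows
  .map(r => (r.getDouble(0), Vectors.dense(r.getDouble(1), r.getDouble(2), r.getDouble(3))))
  .sample(false, 0.01)  // withReplacement = false, the 1% sample
  .persist(StorageLevel.MEMORY_AND_DISK_SER)

val numFeatures = 3
val initialWeights = Vectors.dense(new Array[Double](numFeatures))

// Same call pattern as the linked docs; note that runLBFGS still calls
// data.count() on `training` internally, which is where the OOM appears.
val (weights, lossHistory) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  10,     // numCorrections
  1e-4,   // convergenceTol
  100,    // maxNumIterations
  0.1,    // regParam
  initialWeights)

Akhil's suggestion above would translate into fewer but larger executors at submit time, for example --num-executors 32 --executor-memory 4G --driver-memory 8G (numbers illustrative only).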