spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: fail to run LBFGS on 5G KDD data in spark 1.0.1?
Date Wed, 06 Aug 2014 15:11:56 GMT
Do you mind testing 1.1-SNAPSHOT and allocating more memory to the driver?
I think the problem is with the feature dimension. The KDD data has more than
20M features, and in v1.0.1 the driver collects the partial gradients one by
one, sums them up, performs the update, and then sends the new weights back
to the executors one by one. In 1.1-SNAPSHOT we switched to multi-level tree
aggregation and torrent broadcasting.
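
If it helps to picture the change, here is a minimal sketch of the multi-level
tree aggregation idea using RDD.treeAggregate, which is available in
1.1-SNAPSHOT. It only illustrates the pattern; the app name, data, and depth
below are placeholders, not MLlib's internal code:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only: partial results are combined on the executors in a
  // multi-level tree before the final value reaches the driver, instead of
  // shipping every partition's partial result to the driver directly.
  object TreeAggregateSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("TreeAggregateSketch"))
      val data = sc.parallelize(1L to 1000000L, 100)

      // arguments: zero value, per-partition seqOp, cross-partition combOp, tree depth
      val sum = data.treeAggregate(0L)((acc, x) => acc + x, (a, b) => a + b, 2)

      println(s"sum = $sum")
      sc.stop()
    }
  }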

For the driver memory, you can set it with spark-submit using
`--driver-memory 30g`. You can confirm it by visiting the storage tab in
the web UI.
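
For example (the class and jar names here are placeholders):

  spark-submit \
    --class your.app.MainClass \
    --master spark://123d101suse11sp3:7077 \
    --driver-memory 30g \
    --executor-memory 30g \
    your-app.jar

Note that driver memory has to be fixed before the driver JVM starts, so
setting it through SparkConf inside the application does not take effect; if
you keep launching the program directly instead of going through spark-submit,
give its JVM a larger heap (e.g. -Xmx30g) when you start it.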

-Xiangrui


On Wed, Aug 6, 2014 at 1:58 AM, Lizhengbing (bing, BIPA) <zhengbing.li@huawei.com> wrote:

> 1 I don't use spark-submit to run my program; I use SparkContext directly:
>
> val conf = new SparkConf()
>              .setMaster("spark://123d101suse11sp3:7077")
>              .setAppName("LBFGS")
>              .set("spark.executor.memory", "30g")
>              .set("spark.akka.frameSize","20")
> val sc = new SparkContext(conf)
>
> 2 I use the KDD data set; its size is about 5 GB.
>
> 3 After I execute LBFGS.runLBFGS, the problem occurs at stage 7:
>
> 14/08/06 16:44:45 INFO DAGScheduler: Failed to run aggregate at LBFGS.scala:201
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 7.0:12 failed 4 times, most recent failure: TID 304 on host 123d103suse11sp3 failed for unknown reason
>
> Driver stacktrace:
>
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
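
For reference, a call to LBFGS.runLBFGS like the one described in step 3 above
typically looks like the sketch below, which follows the MLlib optimization
guide. The input path, the LogisticGradient/SquaredL2Updater choice, and the
parameter values are assumptions for illustration, not details taken from the
original message:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
  import org.apache.spark.mllib.util.MLUtils

  object RunLBFGSSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("LBFGS"))

      // runLBFGS expects (label, features) pairs.
      val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/kdd.libsvm")
      val numFeatures = data.first().features.size
      val training = data.map(lp => (lp.label, lp.features)).cache()

      val numCorrections = 10
      val convergenceTol = 1e-4
      val maxNumIterations = 20
      val regParam = 0.1
      val initialWeights = Vectors.dense(new Array[Double](numFeatures))

      // Returns the final weights and the loss at each iteration.
      val (weights, lossHistory) = LBFGS.runLBFGS(
        training,
        new LogisticGradient(),
        new SquaredL2Updater(),
        numCorrections,
        convergenceTol,
        maxNumIterations,
        regParam,
        initialWeights)

      println(s"Final loss: ${lossHistory.last}")
      sc.stop()
    }
  }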
