systemml-dev mailing list archives

From Janardhan Pulivarthi <janardhan.pulivar...@gmail.com>
Subject Re: Error while Executing code in SystemML
Date Mon, 17 Jul 2017 06:37:16 GMT
Hi Arijit,

BTW, since you cannot share the code, you may find these issues helpful for
avoiding this problem.

We have hit this problem too, and it is not yet resolved:
https://issues.apache.org/jira/browse/SYSTEMML-831
A possible root cause: https://issues.apache.org/jira/browse/SPARK-6235

Have a closer look at this comment
<https://issues.apache.org/jira/browse/SYSTEMML-831?focusedCommentId=15525147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15525147>.
Maybe you can try adjusting the driver configuration and/or batch sizes.
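Since the log shows the default 120-second RPC timeout expiring during broadcast cleanup, raising that timeout and making cleanup non-blocking are common first steps. A minimal sketch of building the corresponding spark-submit flags — the config keys are standard Spark properties, but the values are illustrative assumptions for a large single-node setup, not tested recommendations:

```python
# Sketch: assemble spark-submit --conf flags for the timeouts seen in the log.
# The values are illustrative guesses; tune them to your workload.
conf = {
    "spark.rpc.askTimeout": "600s",             # log shows the default 120 s expiring
    "spark.network.timeout": "600s",            # umbrella timeout for network interactions
    "spark.cleaner.referenceTracking.blocking": "false",  # don't block on broadcast cleanup
    "spark.driver.maxResultSize": "8g",         # headroom for large results at the driver
}
flags = " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))
print(flags)
```

The same keys can equally be set via `SparkSession.builder.config(...)` in the notebook before the session is created.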

Cheers, Janardhan

On Sun, Jul 16, 2017 at 8:51 PM, arijit chakraborty <akc14@hotmail.com>
wrote:

> Hi Janardhan,
>
>
> Thanks for your reply. For the time being, I can't share the actual code;
> it's still a work in progress. But our data size is 28 MB, and it has 100
> continuous variables and 1 column with a numeric label variable.
>
>
> But thanks for guiding us that it's a Spark issue rather than a SystemML
> issue.
>
>
> Thank you! Regards,
>
> Arijit
>
> ________________________________
> From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
> Sent: Sunday, July 16, 2017 10:15:12 AM
> To: dev@systemml.apache.org
> Subject: Re: Error while Executing code in SystemML
>
> Hi Arijit,
>
> Can you please send the exact code (the .dml file) you used, along with the
> dataset details and sizes? This problem has something to do with
> Apache Spark.
>
> Thanks, Janardhan
>
> On Sat, Jul 15, 2017 at 3:46 PM, arijit chakraborty <akc14@hotmail.com>
> wrote:
>
> > Hi,
> >
> >
> > I'm suddenly getting this error while running the code in SystemML. For
> > a smaller number of data points it runs fine, but when I increase the
> > data points it throws this error. I'm using a system with 244 GB RAM,
> > 32 cores, and 100 GB of disk space, and I set the pyspark configuration
> > in the notebook only.
> >
> >
> > 17/07/14 21:49:25 WARN TaskSetManager: Stage 394647 contains a task of very large size (558 KB). The maximum recommended task size is 100 KB.
> > 17/07/14 21:54:18 ERROR ContextCleaner: Error cleaning broadcast 431882
> > org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
> >         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
> >         at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:151)
> >         at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:299)
> >         at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> >         at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:60)
> >         at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:232)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:188)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:179)
> >         at scala.Option.foreach(Option.scala:257)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:179)
> >         at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
> >         at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:172)
> >         at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:67)
> > Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
> >         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> >         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> >         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> >         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> >         at scala.concurrent.Await$.result(package.scala:190)
> >         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
> >         ... 12 more
> > 17/07/14 21:54:18 WARN BlockManagerMaster: Failed to remove broadcast 431882 with removeFromMaster = true - Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> > org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> >         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> >         at scala.util.Try$.apply(Try.scala:192)
> >         at scala.util.Failure.recover(Try.scala:216)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> >         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >         at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
> >         ... 8 more
> > 17/07/14 21:54:18 WARN BlockManagerMaster: Failed to remove broadcast 432310 with removeFromMaster = true - Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> > org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> >         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> >         at scala.util.Try$.apply(Try.scala:192)
> >         at scala.util.Failure.recover(Try.scala:216)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> >         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >         at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
> >         ... 8 more
> > 17/07/14 21:54:18 WARN NettyRpcEnv: Ignored message: 0
> > 17/07/14 21:54:18 WARN NettyRpcEnv: Ignored message: 0
> >
> >
> >
> > Thank you!
> >
> > Arijit
> >
>
