Hi Arijit,
BTW, since you cannot share the code, you may find these issues helpful for
working around this problem. We are hitting this problem as well, and it is
yet to be resolved:
https://issues.apache.org/jira/browse/SYSTEMML-831
A possible root cause: https://issues.apache.org/jira/browse/SPARK-6235
Have a closer look at this comment:
<https://issues.apache.org/jira/browse/SYSTEMML-831?focusedCommentId=15525147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15525147>.
Maybe you can try adjusting the driver configuration and/or the batch
sizes.
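For example, here is a rough sketch of what you could try in the notebook
before creating the SparkContext (the setting values below are just
placeholders, tune them for your setup):

    from pyspark import SparkConf, SparkContext

    # Placeholder values -- adjust for your machine.
    conf = (SparkConf()
            .setAppName("systemml-job")
            # the timeout from your stack trace; default is 120s
            .set("spark.rpc.askTimeout", "600s")
            # general network timeout, often raised together with askTimeout
            .set("spark.network.timeout", "600s")
            # allow larger results to be collected back to the driver
            .set("spark.driver.maxResultSize", "8g"))

    # Note: spark.driver.memory usually has to be set before the driver JVM
    # starts (e.g. via spark-submit or PYSPARK_SUBMIT_ARGS), so setting it
    # from a running notebook may have no effect.
    sc = SparkContext(conf=conf)

Reducing the batch size on your side should also shrink the task/broadcast
sizes that trigger the "very large task" warning.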
Cheers, Janardhan
On Sun, Jul 16, 2017 at 8:51 PM, arijit chakraborty <akc14@hotmail.com>
wrote:
> Hi Janardhan,
>
>
> Thanks for your reply. For the time being, I can't share the actual code;
> it's still a work in progress. But our data size is 28 MB, with 100
> continuous variables and one numeric label column.
>
>
> But thanks for pointing out that it's a Spark issue rather than a SystemML
> issue.
>
>
> Thank you! Regards,
>
> Arijit
>
> ________________________________
> From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
> Sent: Sunday, July 16, 2017 10:15:12 AM
> To: dev@systemml.apache.org
> Subject: Re: Error while Executing code in SystemML
>
> Hi Arijit,
>
> Can you please send the exact code (the .dml file) you used, along with
> the dataset details and sizes? This problem seems to have something to do
> with Apache Spark.
>
> Thanks, Janardhan
>
> On Sat, Jul 15, 2017 at 3:46 PM, arijit chakraborty <akc14@hotmail.com>
> wrote:
>
> > Hi,
> >
> >
> > I'm suddenly getting this error while running the code in SystemML. For
> > a smaller number of data points it runs fine, but when I increase the
> > number of data points, it throws the error below. I'm using a system with
> > 244 GB RAM, 32 cores, and 100 GB of hard disk space, and I'm setting the
> > PySpark configuration in the notebook only.
> >
> >
> > 17/07/14 21:49:25 WARN TaskSetManager: Stage 394647 contains a task of very large size (558 KB). The maximum recommended task size is 100 KB.
> > 17/07/14 21:54:18 ERROR ContextCleaner: Error cleaning broadcast 431882
> > org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
> >         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
> >         at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:151)
> >         at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:299)
> >         at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> >         at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:60)
> >         at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:232)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:188)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:179)
> >         at scala.Option.foreach(Option.scala:257)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:179)
> >         at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
> >         at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:172)
> >         at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:67)
> > Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
> >         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> >         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> >         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> >         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> >         at scala.concurrent.Await$.result(package.scala:190)
> >         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
> >         ... 12 more
> > 17/07/14 21:54:18 WARN BlockManagerMaster: Failed to remove broadcast 431882 with removeFromMaster = true - Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> > org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> >         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> >         at scala.util.Try$.apply(Try.scala:192)
> >         at scala.util.Failure.recover(Try.scala:216)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> >         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >         at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
> >         ... 8 more
> > 17/07/14 21:54:18 WARN BlockManagerMaster: Failed to remove broadcast 432310 with removeFromMaster = true - Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> > org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> >         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> >         at scala.util.Try$.apply(Try.scala:192)
> >         at scala.util.Failure.recover(Try.scala:216)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> >         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >         at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
> >         ... 8 more
> > 17/07/14 21:54:18 WARN NettyRpcEnv: Ignored message: 0
> > 17/07/14 21:54:18 WARN NettyRpcEnv: Ignored message: 0
> >
> >
> >
> > Thank you!
> >
> > Arijit
> >
>