systemml-dev mailing list archives

From arijit chakraborty <ak...@hotmail.com>
Subject Re: Error while Executing code in SystemML
Date Mon, 17 Jul 2017 18:10:33 GMT
Thanks a lot, Janardhan! You guys rock!!


Without your help, I don't think I could have made much headway in my project.


Thanks again!

Arijit

________________________________
From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
Sent: Monday, July 17, 2017 12:07:16 PM
To: dev@systemml.apache.org
Subject: Re: Error while Executing code in SystemML

Hi Arijit,

BTW, since you cannot share the code, you may find these issues helpful for
avoiding this problem.

We have this problem too, and it is yet to be resolved:
https://issues.apache.org/jira/browse/SYSTEMML-831
A possible cause: https://issues.apache.org/jira/browse/SPARK-6235

Have a closer look at this comment
<https://issues.apache.org/jira/browse/SYSTEMML-831?focusedCommentId=15525147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15525147>.
Maybe you can try adjusting the driver configuration and/or batch
sizes.
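
As a rough sketch of the configuration side (not taken from either JIRA issue): the log says the 120-second limit is controlled by spark.rpc.askTimeout, so one thing to try is raising it before the SparkContext is created. The 600s values and the helper names below are placeholders only, and spark.network.timeout is included on the assumption that the general network timeout should be raised along with it. A longer timeout may only hide the symptom if the driver itself is overloaded.

```python
# Placeholder values, not recommendations: tune them for your own cluster.
TIMEOUT_SETTINGS = {
    # The property named in the log message.
    "spark.rpc.askTimeout": "600s",
    # Assumption: raise the general network timeout alongside it.
    "spark.network.timeout": "600s",
}


def as_conf_pairs(settings):
    """Return (key, value) pairs sorted by key."""
    return sorted(settings.items())


def as_submit_flags(settings):
    """Render the settings as --conf key=value command-line arguments."""
    flags = []
    for key, value in as_conf_pairs(settings):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    print(" ".join(as_submit_flags(TIMEOUT_SETTINGS)))
```

In a notebook, the pairs from as_conf_pairs could be handed to pyspark's SparkConf().setAll(...) before the context is created; settings changed after that point generally do not take effect.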

Cheers, Janardhan

On Sun, Jul 16, 2017 at 8:51 PM, arijit chakraborty <akc14@hotmail.com>
wrote:

> Hi Janardhan,
>
>
> Thanks for your reply. For the time being, I can't share the actual code;
> it's still a work in progress. But our data size is 28 MB, and it has 100
> continuous variables and 1 column with a numeric label variable.
>
>
> But thanks for pointing out that it's a Spark issue rather than a SystemML
> issue.
>
>
> Thank you! Regards,
>
> Arijit
>
> ________________________________
> From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
> Sent: Sunday, July 16, 2017 10:15:12 AM
> To: dev@systemml.apache.org
> Subject: Re: Error while Executing code in SystemML
>
> Hi Arijit,
>
> Can you please send the exact code (the .dml file) you have used, along with
> the dataset details and sizes? This problem has something to do with
> Apache Spark.
>
> Thanks, Janardhan
>
> On Sat, Jul 15, 2017 at 3:46 PM, arijit chakraborty <akc14@hotmail.com>
> wrote:
>
> > Hi,
> >
> >
> > I'm suddenly getting this error while running the code in SystemML. For a
> > smaller number of data points it runs fine, but when I increase the number
> > of data points, it throws this error. I'm using a system with 244 GB RAM, 32
> > cores and 100 GB of hard disk space, and I'm putting the PySpark
> > configurations in the notebook only.
> >
> >
> > 17/07/14 21:49:25 WARN TaskSetManager: Stage 394647 contains a task of very large size (558 KB). The maximum recommended task size is 100 KB.
> > 17/07/14 21:54:18 ERROR ContextCleaner: Error cleaning broadcast 431882
> > org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
> >         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
> >         at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:151)
> >         at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:299)
> >         at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> >         at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:60)
> >         at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:232)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:188)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:179)
> >         at scala.Option.foreach(Option.scala:257)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:179)
> >         at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
> >         at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:172)
> >         at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:67)
> > Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
> >         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> >         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> >         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> >         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> >         at scala.concurrent.Await$.result(package.scala:190)
> >         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
> >         ... 12 more
> > 17/07/14 21:54:18 WARN BlockManagerMaster: Failed to remove broadcast 431882 with removeFromMaster = true - Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> > org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> >         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> >         at scala.util.Try$.apply(Try.scala:192)
> >         at scala.util.Failure.recover(Try.scala:216)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> >         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >         at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
> >         ... 8 more
> > 17/07/14 21:54:18 WARN BlockManagerMaster: Failed to remove broadcast 432310 with removeFromMaster = true - Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> > org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> >         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> >         at scala.util.Try$.apply(Try.scala:192)
> >         at scala.util.Failure.recover(Try.scala:216)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> >         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >         at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
> >         ... 8 more
> > 17/07/14 21:54:18 WARN NettyRpcEnv: Ignored message: 0
> > 17/07/14 21:54:18 WARN NettyRpcEnv: Ignored message: 0
> >
> >
> >
> > Thank you!
> >
> > Arijit
> >
>
