systemml-dev mailing list archives

From Matthias Boehm <>
Subject Re: Implementation of Parallelized process in Standalone Spark Cluster using SystemML
Date Thu, 27 Jul 2017 05:11:53 GMT
Please add log=DEBUG to the header of your parfor loop. This will show
you exactly what is going on. For example,

parfor(i in 1:N, log=DEBUG) {

will give you (1) the parfor plan before recompilation, (2) the parfor plan
after recompilation, (3) the applied rewrites and decisions, as well as (4)
the parfor plan after optimization. Such a parfor plan looks as follows,
where k=24 shows the parfor degree of parallelism, and the k=1 on the
individual operators shows that they are set to single-threaded execution.

--PARFOR (lines 4-6), exec=CP, k=24, dp=NONE, tp=FACTORING, rm=LOCAL_MEM
----GENERIC (lines 5-5), exec=CP, k=1
------u(print), exec=CP, k=1
------ua(+RC), exec=CP, k=1
------rix, exec=CP, k=1

You might be able to force it temporarily with parfor(i in 1:N,
opt=CONSTRAINED, par=16). There are scenarios where unknown sizes or overly
conservative memory estimates leave the parfor optimizer no choice. If you
indeed have such a scenario (and I can imagine that, because trees are
usually update-heavy), you might want to share the root cause once you have
found it, so we can address this in the future.
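For reference, a minimal sketch of the constrained variant (the loop body
and the value 16 are placeholders, not taken from your script):

```
# opt=CONSTRAINED tells the parfor optimizer to respect the user-specified
# parameters instead of overriding them; par=16 requests 16 parallel workers
parfor(i in 1:N, opt=CONSTRAINED, par=16, log=DEBUG) {
  # placeholder body: grow tree i here
  print("processing iteration " + i);
}
```

Comparing the logged plan of this constrained run with the plan chosen by
the optimizer should point you to the rewrite or memory estimate that
prevents parallel execution in your case.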


On Wed, Jul 26, 2017 at 4:32 AM, Rajarshi Bhadra <> wrote:

> Hi,
> I have been using SystemML for some time and am finding it extremely
> useful for scaling up my algorithm with Spark. However, there are a few
> aspects I do not fully understand and would like some clarification on.
> My system configuration: 244 GB RAM, 32 cores.
> My Spark configuration:
>   spark.executor.cores = 4
>   spark.driver.memory = 80g
>   spark.executor.memory = 20g
>   spark.memory.fraction = 0.75
>   spark.worker.cleanup.enabled = true
>   spark.default.parallelism = 1
> I have a process in R which I am trying to implement. The process is
> similar to randomForest and involves growing trees. In R I parallelize it
> using parLapply, so that n trees are grown in n parallel processes. I have
> implemented the algorithm in SystemML in an identical way and tried
> running it with a parfor loop. There are two main issues I am facing:
> 1. In R, using ncore = 16, I get 30 trees in 10 minutes, but in Spark via
> SystemML the process takes 1 hour.
> 2. I have also noticed that if one tree takes 2 minutes to run, 5 trees
> take 7-8 minutes. It seems I am unable to parallelize the process over
> trees in SystemML.
> It would be great if someone could help me out with this.
> Thank you
> Rajarshi
