systemml-dev mailing list archives

From Matthias Boehm <mboe...@googlemail.com>
Subject Re: Spark Core
Date Wed, 12 Jul 2017 20:07:29 GMT
Well, we explicitly clean up all intermediates that are no longer used.
You can use -explain to output the runtime plan, which includes rmvar
(remove variable), cpvar (copy variable), and mvvar (move variable)
instructions that internally clean up intermediates. This cleanup removes
the data from memory, any evicted or exported copies of the variables, and
any broadcasts and rdds created for them. However, we also keep lineage to
guard against eager broadcast/rdd cleanup while they are still used by
other lazily evaluated rdds; as soon as an rdd is no longer referenced, we
clean up its inputs.

Regarding the comparison to R, please ensure you are running in
hybrid_spark and not in forced spark execution mode. Otherwise, the latency
of distributed jobs might dominate the execution time for operations over
small data. Also, note that the spark write to csv currently requires a
sort (and hence a shuffle) to create the correct order of rows in the
output files. If you want to read the data into SystemML again later, you
would be better off writing to text or binary.
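
As a rough sketch of that last suggestion, the loop from the question could
write in binary format instead of csv. This is an illustration only, not
tested against your script; the per-iteration file name (the index i in the
path) and the ".bin" suffix are assumptions added here so that each
iteration writes its own output.

```dml
# Same computation as in the question: a 10x1 column vector plus a constant.
a = matrix(seq(1, 10), rows=10, cols=1)

for (i in 1:100) {
  b = a + 10
  # format="binary" instead of format="csv": per the note above, the spark
  # csv write needs a sort (and shuffle) to order the rows in the output.
  # The file name pattern is hypothetical.
  write(b, "path" + i + ".bin", format="binary")
}
```

Reading the result back into SystemML later would then use the same binary
format on the read side.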

Regards,
Matthias

On Wed, Jul 12, 2017 at 11:44 AM, arijit chakraborty <akc14@hotmail.com>
wrote:

> Hi,
>
>
> Suppose I have the following code:
>
>
> a = matrix(seq(1,10), 10,1)
>
>
> for(i in 1:100){
>
>   b = a + 10
>
>   write (b, "path" + ".csv", format="csv")
>
> }
>
>
> So what I'm doing is, for 100 iterations, adding a constant to a matrix
> and then writing it out. This operation runs in Spark using multiple
> cores of the system.
>
>
> My question is: after the operation, does the value (here b) remain in
> the memory of that core, so that it piles up in memory? Will this affect
> the performance of the process? If so, how can I clean up the memory
> after each iteration of the loop?
>
>
> The reason for asking is that when I test the code in R, the performance
> is much better than in SystemML. Since the R to SystemML translation is
> almost a one-to-one mapping, I'm not sure where I'm making a mistake.
> Unfortunately, at this stage of progress I can't share the exact code.
>
>
> Thank you!
>
> Arijit
>
>
>
