Thanks, Matthias, for your answers! I'll look into allocating the memory properly; as you
rightly pointed out, I may be making a mistake there. I'm more of a data scientist than a
programmer, so I generally don't tinker with the existing setup and know less about these
things. I'll read up on Spark configuration and try to fix this part.

Regarding the recompile time, I didn't make any changes there. I created the Spark folder,
installed SystemML via pip, and put the SystemML jar file in the Spark jars folder, so I'm
not sure why recompilation is taking so long; that is the only tinkering I did with the
SystemML setup. I'm also setting all Spark configuration through the Jupyter interface, as
I don't want to accidentally make a mistake and it's also much easier. If you could point
out where I might be going wrong, it would be of great help.
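
For example, would setting the driver memory like this from the notebook, before creating
the SparkContext, be the right approach? (Just a rough sketch; the 8g value is only a guess
for my machine.)

import os

# must be set before the SparkContext (and its JVM) is created,
# otherwise the driver keeps the default heap
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-memory 8g pyspark-shell'

from pyspark import SparkContext
sc = SparkContext("local[*]", "test")
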
Thank you again for all your help!
Regards,
Arijit
________________________________
From: Matthias Boehm <mboehm7@googlemail.com>
Sent: Monday, July 17, 2017 2:09:47 AM
To: dev@systemml.apache.org
Subject: Re: Decaying performance of SystemML
thanks for sharing the skeleton of the script. Here are a couple of
suggestions:
1) Impact of fine-grained stats: The provided script executes mostly scalar
instructions. In those kinds of scenarios, the time measurement per
instruction can be a major performance bottleneck. I just executed this
script with and without -stats and got end-to-end execution times of 132s
and 54s respectively, which confirms this.
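If you run through the Python MLContext, you can do the same comparison directly from the
notebook, roughly like this (assuming your SystemML version exposes setStatistics on the
Python MLContext, and with "script.dml" as a placeholder for your script):

import time
from systemml import MLContext, dml

ml = MLContext(sc)
script = dml(open("script.dml").read())  # placeholder file containing your script

for stats in [True, False]:
    ml.setStatistics(stats)  # toggle fine-grained per-instruction timing
    t0 = time.time()
    ml.execute(script)
    print("stats=%s: %.1f s" % (stats, time.time() - t0))
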
2) Memory budget: You allocate a vector of 100M, i.e., 800MB - the fact
that the stats output shows spark instructions means that you're running
the driver with very small memory (maybe the default of 1GB?). When
comparing with R please ensure that both have the same memory budget. On
large data, we would compile distributed operations but of course you only
benefit from that if you have a cluster - right now you're running in Spark
local mode only.
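A quick way to verify what the driver actually got is to look at the JVM behind the
SparkContext; sc._jvm is an internal handle, so treat this only as a rough debugging aid:

# rough sanity check of the driver-side JVM heap
max_heap_gb = sc._jvm.java.lang.Runtime.getRuntime().maxMemory() / (1024.0 ** 3)
print("driver max heap: %.1f GB" % max_heap_gb)
print("spark.driver.memory = %s" % sc.getConf().get("spark.driver.memory", "not set"))
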
3) Recompile time: Another thing that looks suspicious to me is the
recompilation time of 15.529s for 4 recompilations. Typically, we see <1ms
recompilation times per average DAG of 50-100 operators - could it be that
there are some setup issues which lazily load classes and libraries?
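One way to check that is to run a tiny dummy script once before timing the actual one, so
that one-time class loading and setup are not attributed to recompilation (just a sketch via
the Python MLContext):

from systemml import MLContext, dml

ml = MLContext(sc)
ml.execute(dml('print("warm-up")'))  # absorbs one-time class loading / setup cost
# then execute and time the actual script in a second call
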
Regards,
Matthias
On Sun, Jul 16, 2017 at 8:31 AM, arijit chakraborty <akc14@hotmail.com>
wrote:
> Hi Matthias,
>
>
> I was trying the following code in both R and SystemML. The difference in
> speed is huge in computational terms.
>
> R time: 1.837146 mins
> SystemML Time: Wall time: 4min 33s
>
> The code I'm working on is very similar to this one; the only difference
> is that I'm doing a lot more computation within these two while loops.
>
> Can you help me understand why I'm getting this difference? My
> understanding was that with a larger data size, SystemML's performance
> should be far better than R's; for smaller data sizes their performance
> is almost the same.
>
> The code was tested on the same system. The Spark configuration is the
> following.
>
>
> import os
> import sys
> import pandas as pd
> import numpy as np
>
> spark_path = r"C:\spark"
> os.environ['SPARK_HOME'] = spark_path
> os.environ['HADOOP_HOME'] = spark_path
>
> sys.path.append(spark_path + "/bin")
> sys.path.append(spark_path + "/python")
> sys.path.append(spark_path + "/python/pyspark/")
> sys.path.append(spark_path + "/python/lib")
> sys.path.append(spark_path + "/python/lib/pyspark.zip")
> sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")
>
> from pyspark import SparkContext
> from pyspark import SparkConf
>
> sc = SparkContext("local[*]", "test")
>
>
> # SystemML Specifications:
>
>
> from pyspark.sql import SQLContext
> import systemml as sml
> sqlCtx = SQLContext(sc)
> ml = sml.MLContext(sc)
>
>
>
> The code we tested:
>
>
> a = matrix(seq(1, 100000000, 1), 1, 100000000)
>
> b = 2
>
> break_cond_1 = 0
> while(break_cond_1 == 0) {
>   break_cond_2 = 0
>   while(break_cond_2 == 0) {
>
>     # checking if at least 10 of the data points are even
>     c = 0
>     for(i in 1:ncol(a)) {
>       if(i %% 2 == 0) {
>         c = c + 1
>       }
>     }
>     #c = c + 2
>     if(c > 1000) {
>       break_cond_2 = 1
>     } else {
>       c = c + 2
>     }
>   }
>
>   if(break_cond_2 == 1) {
>     break_cond_1 = 1
>   } else {
>     c = c + 2
>   }
> }
>
> Please find some more SystemML information below:
>
> SystemML Statistics:
> Total elapsed time: 0.000 sec.
> Total compilation time: 0.000 sec.
> Total execution time: 0.000 sec.
> Number of compiled Spark inst: 5.
> Number of executed Spark inst: 5.
> Cache hits (Mem, WB, FS, HDFS): 3/0/0/0.
> Cache writes (WB, FS, HDFS): 6/0/0.
> Cache times (ACQr/m, RLS, EXP): 0.000/0.001/0.004/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/4.
> HOP DAGs recompile time: 15.529 sec.
> Spark ctx create time (lazy): 0.091 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
> Total JIT compile time: 0.232 sec.
> Total JVM GC count: 5467.
> Total JVM GC time: 8.237 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) %% 33.235 sec 100300000
> -- 2) rmvar 27.762 sec 250750035
> -- 3) == 26.179 sec 100300017
> -- 4) + 15.555 sec 50150000
> -- 5) assignvar 6.611 sec 50150018
> -- 6) sp_seq 0.675 sec 1
> -- 7) sp_rshape 0.070 sec 1
> -- 8) sp_chkpoint 0.017 sec 3
> -- 9) seq 0.014 sec 3
> -- 10) rshape 0.003 sec 3
>
>
>
>
>
>
> Thank you!
>
> Arijit
>
>
> ________________________________
> From: arijit chakraborty <akc14@hotmail.com>
> Sent: Wednesday, July 12, 2017 12:21:43 AM
> To: dev@systemml.apache.org
> Subject: Re: Decaying performance of SystemML
>
> Thank you, Matthias! I'll follow your suggestions. Regarding the TB value, I was under
> the mistaken impression that "g" implies 512 MB; that's why I ended up with around 2 TB
> of memory.
>
>
> Thanks again!
>
> Arijit
>
> ________________________________
> From: Matthias Boehm <mboehm7@googlemail.com>
> Sent: Tuesday, July 11, 2017 10:42:58 PM
> To: dev@systemml.apache.org
> Subject: Re: Decaying performance of SystemML
>
> without any specifics of scripts or datasets, it's unfortunately hard,
> if not impossible, to help you here. However, note that the memory
> configuration seems wrong: why would you configure the driver and
> executors with 2TB if you only have 256GB per node? Maybe you are
> observing swapping. Also note that maxResultSize is irrelevant in
> case SystemML creates the Spark context, because we would set it to
> unlimited anyway.
>
> Regarding generally recommended configurations, it's usually a good idea
> to use one executor per worker node with the number of cores set to the
> number of virtual cores. This allows maximum sharing of broadcasts
> across tasks and hence reduces memory pressure.
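>
> For example, on a single 32-core worker this could look roughly like the
> sketch below; the memory values are placeholders, and the exact property
> for the number of executors depends on your cluster manager:
>
> from pyspark import SparkConf, SparkContext
>
> conf = (SparkConf()
>         .setAppName("test")
>         .set("spark.executor.instances", "1")   # one executor per worker node
>         .set("spark.executor.cores", "32")      # all virtual cores of the node
>         .set("spark.executor.memory", "200g")   # placeholder, leave room for the OS
>         .set("spark.driver.memory", "20g"))     # placeholder
> sc = SparkContext(conf=conf)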
>
> Regards,
> Matthias
>
> On 7/11/2017 9:36 AM, arijit chakraborty wrote:
> > Hi,
> >
> >
> > I'm creating a process using SystemML, but after a certain period of time
> > the performance decreases.
> >
> >
> > 1) I get this warning message: WARN TaskSetManager: Stage 25254 contains
> > a task of very large size (3954 KB). The maximum recommended task size
> > is 100 KB.
> >
> >
> > 2) For Spark, we are using the following settings:
> >
> > spark.executor.memory 2048g
> >
> > spark.driver.memory 2048g
> >
> > spark.driver.maxResultSize 2048
> >
> > Is this good enough, or can we do something else to improve the
> > performance? We tried the Spark configuration suggested in the
> > documentation, but it didn't help much.
> >
> >
> > 3) We are running on a system with 244 GB RAM, 32 cores, and 100 GB of
> > hard disk space.
> >
> >
> > It would be great if anyone could guide me on how to improve the performance.
> >
> >
> > Thank you!
> >
> > Arijit
> >
>
|