Thanks a lot Niketan! This was a great help!

 

Sent from Mail for Windows 10

 

From: Niketan Pansare
Sent: Saturday, July 22, 2017 1:04 AM
To: dev@systemml.apache.org
Subject: RE: about performance statistics of PCA.dml

 

Yes, please remove the ".template" suffix, place SystemML-config.xml in the current directory and set the property systemml.stats.finegrained to true: https://github.com/apache/systemml/blob/master/conf/SystemML-config.xml.template#L73
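
A minimal sketch of that edit from the command line. It assumes the template stores the property as an XML element with the value false (worth checking against the linked line), and the function name is hypothetical:

```shell
# Sketch: enable fine-grained statistics in a local SystemML-config.xml.
# Assumption: the file contains the element
#   <systemml.stats.finegrained>false</systemml.stats.finegrained>
enable_finegrained_stats() {
  config_file="$1"
  # Flip the value from false to true; the original is kept as <file>.bak.
  sed -i.bak 's|<systemml.stats.finegrained>false</systemml.stats.finegrained>|<systemml.stats.finegrained>true</systemml.stats.finegrained>|' "$config_file"
  # Print the resulting line so the change can be checked by eye.
  grep 'systemml.stats.finegrained' "$config_file"
}

# Usage, after copying the template into the current directory:
#   cp SystemML-config.xml.template SystemML-config.xml
#   enable_finegrained_stats SystemML-config.xml
```

If the grep output still shows false, the element is probably written differently in your copy and is easier to edit by hand.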

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


From: arijit chakraborty <akc14@hotmail.com>
To: "dev@systemml.apache.org" <dev@systemml.apache.org>
Date: 07/21/2017 12:21 PM
Subject: RE: about performance statistics of PCA.dml




Hi Niketan,

Sorry to interrupt your conversation with Janardhan. I'm having trouble improving the performance of my system. You suggested how to get some statistics, and I've incorporated that. But the output after "Heavy hitter instructions:" looks very useful for debugging where the system spends too much time. Could you suggest how to print the report you are getting after "Heavy hitter instructions:"? Do we need to change any of the config files?

Thank you!
Arijit


From: Niketan Pansare
Sent: Friday, July 21, 2017 11:28 PM
To: dev@systemml.apache.org
Subject: Re: about performance statistics of PCA.dml

Hi Janardhan,

You can get instruction-level statistics with the commit
https://github.com/apache/systemml/commit/648eb21d66f9cd8727090cdf950986765a7e6ee8:
SystemML Statistics:
Total elapsed time: 18.956 sec.
Total compilation time: 1.924 sec.
Total execution time: 17.032 sec.
Number of compiled Spark inst: 3.
Number of executed Spark inst: 0.
Cache hits (Mem, WB, FS, HDFS): 29/0/0/1.
Cache writes (WB, FS, HDFS): 24/0/4.
Cache times (ACQr/m, RLS, EXP): 0.201/0.001/0.007/8.379 sec.
HOP DAGs recompiled (PRED, SB): 0/1.
HOP DAGs recompile time: 0.007 sec.
Spark ctx create time (lazy): 0.949 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time: 4.86 sec.
Total JVM GC count: 7.
Total JVM GC time: 0.192 sec.
Heavy hitter instructions:
# Instruction Time(s) Count Misc Timers
1 write [PCA.dml 110:8-110:14] 7.628 1
2 eigen [PCA.dml 85:1-85:1] 6.858 1 rlswr[0.000s,2], rlsev[0.000s,0], aqmd[0.000s,2]
3 write [92:12-92:25] 0.689 1
4 ba+* [PCA.dml 110:8-110:14] 0.500 1 rlswr[0.000s,1], aqmd[0.000s,1], aqrd[0.000s,2], rlsev[0.000s,0], rlsi[0.001s,2]
5 tsmm [PCA.dml 81:5-81:16] 0.338 1 rlswr[0.000s,1], rlsev[0.000s,0], rlsi[0.000s,1], aqrd[0.000s,1], aqmd[0.000s,1]
6 uacmean [PCA.dml 66:5-66:5] 0.320 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.200s,1]
7 uacsqk+ [PCA.dml 70:23-70:23] 0.177 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
8 ba+* [92:12-92:25] 0.175 1 rlswr[0.000s,1], aqrs[0.000s,1], aqrd[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,2]
9 / [PCA.dml 75:16-75:31] 0.088 1 rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,2], aqmd[0.000s,1], rlsi[0.000s,2]
10 - [PCA.dml 67:9-67:13] 0.048 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,2], rlsi[0.000s,2]
11 write [90:11-90:23] 0.044 1
12 uack+ [PCA.dml 80:6-80:6] 0.036 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
13 uacmean [PCA.dml 72:2-72:2] 0.028 1 rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], aqmd[0.000s,1], rlsi[0.000s,1]
14 -* [PCA.dml 81:5-81:22] 0.026 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,2], rlsi[0.000s,2]
15 / [PCA.dml 81:5-81:22] 0.019 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
16 write [102:1-102:1] 0.018 1
17 tsmm [PCA.dml 81:36-81:46] 0.008 1 rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], rlsi[0.000s,1], aqmd[0.000s,1]
18 ctableexpand [88:1-88:1] 0.007 1 rlsev[0.000s,0], rlsi[0.000s,2], aqms[0.000s,1], aqrd[0.000s,2], rlswr[0.002s,1]
19 seq [88:17-88:17] 0.004 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1]
20 ba+* [90:11-90:23] 0.003 1 rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], rlsi[0.000s,2], aqmd[0.000s,1], aqrs[0.000s,1]
21 rsort [87:1-87:1] 0.003 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
22 sqrt [PCA.dml 75:20-75:20] 0.002 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
23 != 0.001 1
24 rmvar [-1:-1--1:-1] 0.001 22
25 ^2 [PCA.dml 73:25-73:30] 0.001 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
26 / [PCA.dml 73:14-73:37] 0.001 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
27 -* [PCA.dml 73:15-73:19] 0.000 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,2], aqrd[0.000s,2]
28 sqrt [102:1-102:1] 0.000 1 rlswr[0.000s,1], rlsev[0.000s,0], rlsi[0.000s,1], aqrd[0.000s,1], aqmd[0.000s,1]
29 + [104:28-104:34] 0.000 1
30 createvar [90:11-90:23] 0.000 1


At first glance (please feel free to correct me if I am wrong):
Heavy hitter number 5 corresponds to the expression (t(A) %*% A).
Heavy hitter number 17 corresponds to the expression t(mu) %*% mu.
Heavy hitter number 15 corresponds to the expression (output of instruction 5) / scalar,
and so on ...
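
To see which operators dominate in total, the rows after "Heavy hitter instructions:" can be summed per opcode. A rough sketch, assuming the -stats output was saved to a file (stats.log is a hypothetical name) and that every row has the shape rank, opcode, optional bracketed location, time, count; the function name is hypothetical too:

```shell
# Sketch: sum the Time(s) column per opcode from SystemML -stats output on stdin.
# Assumption: rows look like "1 write [PCA.dml 110:8-110:14] 7.628 1 ...",
# where the bracketed location is optional and may span one or two fields.
summarize_heavy_hitters() {
  LC_ALL=C awk '
    /^Heavy hitter instructions:/ { in_table = 1; next }
    in_table && /^[0-9]+[ \t]/ {
      i = 3
      if ($i ~ /^\[/) {
        # Skip ahead to the field that closes the location bracket.
        while (i < NF && $i !~ /\]$/) i++
        i++
      }
      total[$2] += $i      # field i now holds the time in seconds
    }
    END { for (op in total) printf "%s\t%.3f\n", op, total[op] }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}

# Usage:
#   summarize_heavy_hitters < stats.log
```

Opcodes such as write or ba+* that appear on several rows are then reported once, with their times added up.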


As an FYI, here are the steps I followed:
wget https://raw.githubusercontent.com/apache/systemml/master/scripts/algorithms/PCA.dml
wget https://raw.githubusercontent.com/apache/systemml/master/scripts/datagen/genRandData4PCA.dml
wget https://raw.githubusercontent.com/apache/systemml/master/conf/SystemML-config.xml.template
mv SystemML-config.xml.template SystemML-config.xml
# Set systemml.stats.finegrained to true in SystemML-config.xml
# Make sure you do a git pull to get the commit https://github.com/apache/systemml/commit/648eb21d66f9cd8727090cdf950986765a7e6ee8
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --driver-memory 10g SystemML.jar -f genRandData4PCA.dml -nvargs R=10000 C=1000 F=binary OUT=pcaData.mtx
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --driver-memory 10g SystemML.jar -f PCA.dml -stats 30 -nvargs INPUT=pcaData.mtx OUTPUT=pca-1000x1000-model PROJDATA=1 CENTER=1 SCALE=1


Thanks,


Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com

http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
To: dev@systemml.apache.org
Date: 07/21/2017 08:57 AM
Subject: about performance statistics of PCA.dml





Hi Mike,

I'd like to know how expensive this critical code is

C = (t(A) %*% A)/(N-1) - (N/(N-1))*t(mu) %*% mu;

(at https://github.com/apache/systemml/blob/master/scripts/algorithms/PCA.dml#L81)
in the Spark setting, given

1. a 60Kx700 input for A
2. a data size of 28 MB, with 100 continuous variables and 1 column
with a numeric label variable

with reference to this comment:
https://issues.apache.org/jira/browse/SYSTEMML-831?focusedCommentId=15525147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15525147
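
For context, the expression above is the sample covariance matrix written without explicitly centering A. A short derivation, assuming A is N x D, a_i denotes row i of A, and mu is the 1 x D row vector of column means (an assumption about the script, not something stated in this thread):

```latex
\begin{aligned}
C &= \frac{1}{N-1}\sum_{i=1}^{N} (a_i - \mu)^{\top}(a_i - \mu) \\
  &= \frac{1}{N-1}\Bigl(\sum_{i=1}^{N} a_i^{\top}a_i \;-\; N\,\mu^{\top}\mu\Bigr) \\
  &= \frac{A^{\top}A}{N-1} \;-\; \frac{N}{N-1}\,\mu^{\top}\mu
\end{aligned}
```

The middle step expands the product and uses \sum_i a_i = N\mu, so the two cross terms and the N\mu^{\top}\mu term collapse into a single -N\mu^{\top}\mu.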

Thank you,
Janardhan