Hi Janardhan,
You can get instruction-level statistics as of this commit:
https://github.com/apache/systemml/commit/648eb21d66f9cd8727090cdf950986765a7e6ee8
Example output for PCA.dml:
SystemML Statistics:
Total elapsed time: 18.956 sec.
Total compilation time: 1.924 sec.
Total execution time: 17.032 sec.
Number of compiled Spark inst: 3.
Number of executed Spark inst: 0.
Cache hits (Mem, WB, FS, HDFS): 29/0/0/1.
Cache writes (WB, FS, HDFS): 24/0/4.
Cache times (ACQr/m, RLS, EXP): 0.201/0.001/0.007/8.379 sec.
HOP DAGs recompiled (PRED, SB): 0/1.
HOP DAGs recompile time: 0.007 sec.
Spark ctx create time (lazy): 0.949 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time: 4.86 sec.
Total JVM GC count: 7.
Total JVM GC time: 0.192 sec.
Heavy hitter instructions:
# Instruction Time(s) Count Misc Timers
1 write [PCA.dml 110:8-110:14] 7.628 1
2 eigen [PCA.dml 85:1-85:1] 6.858 1 rlswr[0.000s,2], rlsev[0.000s,0], aqmd[0.000s,2]
3 write [92:12-92:25] 0.689 1
4 ba+* [PCA.dml 110:8-110:14] 0.500 1 rlswr[0.000s,1], aqmd[0.000s,1], aqrd[0.000s,2], rlsev[0.000s,0], rlsi[0.001s,2]
5 tsmm [PCA.dml 81:5-81:16] 0.338 1 rlswr[0.000s,1], rlsev[0.000s,0], rlsi[0.000s,1], aqrd[0.000s,1], aqmd[0.000s,1]
6 uacmean [PCA.dml 66:5-66:5] 0.320 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.200s,1]
7 uacsqk+ [PCA.dml 70:23-70:23] 0.177 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
8 ba+* [92:12-92:25] 0.175 1 rlswr[0.000s,1], aqrs[0.000s,1], aqrd[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,2]
9 / [PCA.dml 75:16-75:31] 0.088 1 rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,2], aqmd[0.000s,1], rlsi[0.000s,2]
10 - [PCA.dml 67:9-67:13] 0.048 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,2], rlsi[0.000s,2]
11 write [90:11-90:23] 0.044 1
12 uack+ [PCA.dml 80:6-80:6] 0.036 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
13 uacmean [PCA.dml 72:2-72:2] 0.028 1 rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], aqmd[0.000s,1], rlsi[0.000s,1]
14 -* [PCA.dml 81:5-81:22] 0.026 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,2], rlsi[0.000s,2]
15 / [PCA.dml 81:5-81:22] 0.019 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
16 write [102:1-102:1] 0.018 1
17 tsmm [PCA.dml 81:36-81:46] 0.008 1 rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], rlsi[0.000s,1], aqmd[0.000s,1]
18 ctableexpand [88:1-88:1] 0.007 1 rlsev[0.000s,0], rlsi[0.000s,2], aqms[0.000s,1], aqrd[0.000s,2], rlswr[0.002s,1]
19 seq [88:17-88:17] 0.004 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1]
20 ba+* [90:11-90:23] 0.003 1 rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], rlsi[0.000s,2], aqmd[0.000s,1], aqrs[0.000s,1]
21 rsort [87:1-87:1] 0.003 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
22 sqrt [PCA.dml 75:20-75:20] 0.002 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
23 != 0.001 1
24 rmvar [-1:-1--1:-1] 0.001 22
25 ^2 [PCA.dml 73:25-73:30] 0.001 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
26 / [PCA.dml 73:14-73:37] 0.001 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
27 -* [PCA.dml 73:15-73:19] 0.000 1 rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,2], aqrd[0.000s,2]
28 sqrt [102:1-102:1] 0.000 1 rlswr[0.000s,1], rlsev[0.000s,0], rlsi[0.000s,1], aqrd[0.000s,1], aqmd[0.000s,1]
29 + [104:28-104:34] 0.000 1
30 createvar [90:11-90:23] 0.000 1
At first glance (please feel free to correct me if I am wrong):
Heavy hitter number 5 corresponds to the expression (t(A) %*% A).
Heavy hitter number 17 corresponds to the expression t(mu) %*% mu.
Heavy hitter number 15 corresponds to the expression (output of instruction
5) / scalar,
and so on ...
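To see every heavy hitter that belongs to one script line (for example line 81 of PCA.dml), the statistics output can be filtered on the position tag. This is only a rough sketch: it assumes the -stats output was saved to a file, and both the file name pca-stats.log and the helper name heavy_hitters_for_line are made up for illustration.

```shell
# Hypothetical helper: list heavy hitter rows for one script line.
# Assumes each row carries a position tag such as [PCA.dml 81:5-81:16].
heavy_hitters_for_line() {
  script="$1"   # script name as printed in the tag, e.g. PCA.dml
  line="$2"     # line number in that script, e.g. 81
  log="$3"      # file holding the saved -stats output (assumed)
  # -F treats the pattern as a fixed string, so "[" is matched literally
  grep -F "[$script $line:" "$log"
}

# Usage, only if the (hypothetical) log file exists:
if [ -f pca-stats.log ]; then
  heavy_hitters_for_line PCA.dml 81 pca-stats.log
fi
```

Rows without a script name in the tag (for example [92:12-92:25]) are not matched by this filter.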
As an FYI, here are the steps I followed:
wget https://raw.githubusercontent.com/apache/systemml/master/scripts/algorithms/PCA.dml
wget https://raw.githubusercontent.com/apache/systemml/master/scripts/datagen/genRandData4PCA.dml
wget https://raw.githubusercontent.com/apache/systemml/master/conf/SystemML-config.xml.template
mv SystemML-config.xml.template SystemML-config.xml
# Set systemml.stats.finegrained to true
# Make sure you do a git pull to get the commit
# https://github.com/apache/systemml/commit/648eb21d66f9cd8727090cdf950986765a7e6ee8
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --driver-memory 10g \
  SystemML.jar -f genRandData4PCA.dml -nvargs R=10000 C=1000 F=binary \
  OUT=pcaData.mtx
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --driver-memory 10g \
  SystemML.jar -f PCA.dml -stats 30 -nvargs INPUT=pcaData.mtx \
  OUTPUT=pca-1000x1000-model PROJDATA=1 CENTER=1 SCALE=1
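The "Set systemml.stats.finegrained to true" step in the list above can be done in any editor. As a rough sketch, it could also be scripted with sed. This assumes the config file contains the property as an XML element with the value false on a single line; the helper name enable_finegrained_stats is made up for illustration.

```shell
# Hypothetical helper: flip systemml.stats.finegrained from false to true.
# Assumption: the file has a line of the form
#   <systemml.stats.finegrained>false</systemml.stats.finegrained>
enable_finegrained_stats() {
  config="$1"
  # -i.bak edits in place and keeps a backup copy as <file>.bak
  sed -i.bak \
    's|<systemml\.stats\.finegrained>false</systemml\.stats\.finegrained>|<systemml.stats.finegrained>true</systemml.stats.finegrained>|' \
    "$config"
  # Succeed only if the property is now set to true
  grep -q '<systemml.stats.finegrained>true</systemml.stats.finegrained>' "$config"
}

# Usage, only if the config file from the steps above exists:
if [ -f SystemML-config.xml ]; then
  enable_finegrained_stats SystemML-config.xml
fi
```

If the element is missing or formatted differently, the helper returns a non-zero status and the file should be edited by hand instead.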
Thanks,
Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
To: dev@systemml.apache.org
Date: 07/21/2017 08:57 AM
Subject: about performance statistics of PCA.dml
Hi Mike,
I'd like to know how expensive this critical code is:
C = (t(A) %*% A)/(N-1) - (N/(N-1))*t(mu) %*% mu;
(at
https://github.com/apache/systemml/blob/master/scripts/algorithms/PCA.dml#L81
)
in the Spark setting, given
1. a 60Kx700 input for A
2. a data size of 28 MB with 100 continuous variables and 1 column
with a numeric label variable
with reference to this comment:
https://issues.apache.org/jira/browse/SYSTEMML-831?focusedCommentId=15525147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15525147
Thank you,
Janardhan