systemml-dev mailing list archives

From Mingyang Wang
Subject Questions about the Compositions of Execution Time
Date Thu, 20 Apr 2017 00:48:01 GMT
Hi all,

I have run some simple matrix multiplications in SystemML and found that JVM
GC time and Spark collect time are dominant.

For example, given 4 executors with 20 cores and 100GB memory each, and a
driver with 10GB memory, one setting is

R = read($R) # 1,000,000 x 80 -> 612M
S = read($S) # 20,000,000 x 20 -> 3G
FK = read($FK) # 20,000,000 x 1,000,000 (sparse) -> 358M
wS = rand(rows=ncol(S), cols=1, min=0, max=1, pdf="uniform")
wR = rand(rows=ncol(R), cols=1, min=0, max=1, pdf="uniform")

temp = S %*% wS + FK %*% (R %*% wR)
# some code to enforce the execution

It took 77.597s to execute, of which JVM GC accounted for 70.282s.

Another setting is

T = read($T) # 20,000,000 x 100 -> 15G
w = rand(rows=ncol(T), cols=1, min=0, max=1, pdf="uniform")

temp = T %*% w
# some code to enforce the execution

It took 92.582s to execute, of which Spark collect accounted for 91.991s.
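For reference, the result of T %*% w is a 20,000,000 x 1 dense vector, so a
back-of-envelope calculation (assuming 8-byte doubles and ignoring block and
serialization overhead) puts the data collected to the 10GB driver at roughly
150 MiB:

```python
# Rough size of the dense result vector of T %*% w when collected to the
# driver. Assumes 8-byte double cells; ignores MatrixBlock/serialization
# overhead, so the actual transfer is somewhat larger.
rows, cols = 20_000_000, 1
size_bytes = rows * cols * 8
print(size_bytes / 2**20)  # ~152.6 MiB
```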

My questions are
1. Are these behaviors expected, given that only a tiny fraction of the time
seems to be spent on actual computation?
2. How can I tweak the configuration to tune the performance?
3. Is there any way to measure the time spent on data loading, computation,
disk accesses, and communication separately?
4. Is there any rule of thumb to estimate the memory needed for a program in
SystemML?
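For what it's worth, the dense sizes annotated in the scripts above follow the
usual rows x cols x 8 bytes estimate for double-precision cells; a minimal
sketch of that arithmetic (plain Python; a sparse matrix like FK would
additionally scale by its density):

```python
def dense_mib(rows, cols, bytes_per_cell=8):
    """Back-of-envelope size of a dense double matrix in MiB."""
    return rows * cols * bytes_per_cell / 2**20

print(dense_mib(1_000_000, 80))           # ~610 MiB, matches the ~612M for R
print(dense_mib(20_000_000, 20) / 1024)   # ~2.98 GiB, matches the ~3G for S
print(dense_mib(20_000_000, 100) / 1024)  # ~14.9 GiB, matches the ~15G for T
```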

I really appreciate your inputs!

Mingyang Wang
