spark-user mailing list archives

From yangliuyu <>
Subject Re: Native library can not be loaded when using Mllib PCA
Date Fri, 13 Jun 2014 04:40:00 GMT
Finally, we solved this problem by building our own netlib-java native .so
files on CentOS. It now works without any warnings, but the performance is far
worse than on a MacBook Pro.

The matrix size is rows: 6778, columns: 2487

The MBP took 10 seconds to compute the PCA result, but CentOS took 110 seconds.
Even the MBP with the pure-Java BLAS implementation takes only 40 seconds.

The source code caches the input matrix in memory, and only 200+ kB of data is
read by the shuffle.
The only difference in http://localhost:4040/stages/ is this stage:

Stage Id  Description                       Submitted            Duration  Tasks: Succeeded/Total  Shuffle Read  Shuffle Write
14        aggregate at RowMatrix.scala:211  2014/06/13 12:18:12  *36 s*    3/3

The duration of the same stage on the Mac is only 10s.

So why does RowMatrix.scala perform so differently on the Mac and on CentOS?
Is it related to the native BLAS implementation?

Then I reduced the matrix size by half, and the duration dropped to 2s on the
Mac and 12s on CentOS.

Is there any benchmark available for isolating the problem in either mllib
or netlib-java?
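
As a starting point, here is a minimal single-JVM sketch. It assumes the slow aggregate stage is dominated by accumulating the Gramian A^T A with one symmetric rank-1 update per row (the operation a BLAS dsyr call performs); that is an assumption, not something confirmed from the logs. The class GramianBench and its method names are hypothetical, and the code is pure Java: it does not call MLlib or netlib-java, so it only gives a per-machine baseline to compare a native BLAS call against.

```java
import java.util.Random;

// Hypothetical micro-benchmark: times G += v * v^T over all rows of a random matrix.
class GramianBench {

    // Symmetric rank-1 update of the upper triangle of G (n x n, column-major).
    static void rank1Update(double[] g, double[] v) {
        int n = v.length;
        for (int j = 0; j < n; j++) {
            double vj = v[j];
            if (vj == 0.0) {
                continue; // nothing to add for this column
            }
            int col = j * n;
            for (int i = 0; i <= j; i++) {
                g[col + i] += v[i] * vj;
            }
        }
    }

    // Accumulates the upper triangle of A^T A, where each element of rows is one row of A.
    static double[] gramian(double[][] rows, int n) {
        double[] g = new double[n * n];
        for (double[] row : rows) {
            rank1Update(g, row);
        }
        return g;
    }

    // Dense random test data with a fixed seed, so both machines see the same input.
    static double[][] randomRows(int m, int n, long seed) {
        Random rnd = new Random(seed);
        double[][] rows = new double[m][n];
        for (double[] row : rows) {
            for (int j = 0; j < n; j++) {
                row[j] = rnd.nextDouble();
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Defaults match the matrix size reported in this message.
        int m = args.length > 0 ? Integer.parseInt(args[0]) : 6778;
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 2487;
        double[][] rows = randomRows(m, n, 42L);

        long start = System.nanoTime();
        double[] g = gramian(rows, n);
        double seconds = (System.nanoTime() - start) / 1e9;

        // Printing one entry keeps the JIT from discarding the computation.
        System.out.printf("rows=%d cols=%d seconds=%.3f g[0]=%.3f%n", m, n, seconds, g[0]);
    }
}
```

To use it, compile and run the same file on both machines, for example `javac GramianBench.java && java -Xmx1g GramianBench 6778 2487`. The input plus the result need roughly 200 MB of heap at that size. If the pure-Java timings are close on both machines while the Spark stage is not, the difference more likely lies in the native library than in the JVM or CPU.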

The CPUs are a 3740QM on the Mac and an E5620 on CentOS.

The log files are attached.
