systemml-dev mailing list archives

From "Niketan Pansare" <>
Subject Re: Documents of SystemML Algorithms Reference
Date Wed, 19 Apr 2017 21:21:25 GMT

Hi Ethan,

Good points; the documentation is incomplete. The Arguments section only
describes the arguments for command-line invocation, not for invocation via
Python and Scala. This should be clearly marked to avoid confusion.

The Python wrappers are implemented to be compatible with MLlib and
scikit-learn.

For training, you can pass features and labels in either of two ways:
1. Scikit-learn way: two Python objects (X_train, y_train) of type numpy,
pandas or scipy, passed as `fit(X_train, y_train)`.

2. MLlib way: one LabeledPoint DataFrame (df_train) with at least two
columns: features (of type Vector) and label, passed as `fit(df_train)`.
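To illustrate the two training conventions, here is a minimal pure-Python sketch of how a wrapper could accept either one. It is not SystemML's code: `to_training_arrays` is a hypothetical name, nested lists stand in for numpy/pandas/scipy inputs, and a list of row dicts with "features" and "label" keys stands in for a Spark DataFrame.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]  # hypothetical stand-in for one Spark DataFrame row


def to_training_arrays(
    X: Sequence[Any], y: Optional[Sequence[Any]] = None
) -> Tuple[List[List[float]], List[Any]]:
    """Hypothetical helper: normalize either training convention to (features, labels).

    - scikit-learn way: X is 2-D (one feature vector per row), y is 1-D.
    - MLlib way: y is None and every row of X carries "features" and "label".
    """
    if y is not None:
        # scikit-learn way: features and labels arrive as two separate objects
        features = [list(map(float, row)) for row in X]
        labels = list(y)
    else:
        # MLlib way: labels travel in the same DataFrame as the features;
        # extra columns are allowed ("at least two columns") and ignored here
        rows: Sequence[Row] = X
        missing = [c for c in ("features", "label") if any(c not in r for r in rows)]
        if missing:
            raise ValueError(f"DataFrame is missing column(s): {missing}")
        features = [list(map(float, r["features"])) for r in rows]
        labels = [r["label"] for r in rows]
    if len(features) != len(labels):
        raise ValueError("number of feature vectors and labels differ")
    return features, labels


# Both conventions describe the same training data, so they normalize identically.
sk_way = to_training_arrays([[1, 2], [3, 4]], [0, 1])
ml_way = to_training_arrays([{"features": [1, 2], "label": 0},
                             {"features": [3, 4], "label": 1, "id": 7}])
assert sk_way == ml_way
```

The design choice being illustrated is that `y` is optional: its absence is what signals that the labels must be read from a column of the single DataFrame argument.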

For prediction, you can pass features as
1. Scikit-learn way: one Python object (X_test) of type numpy, pandas or
scipy, passed as `predict(X_test)`.

2. MLlib way: one LabeledPoint DataFrame (df_test) with at least one column:
features (of type Vector), passed as `transform(df_test)`.
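The two prediction conventions also differ in what comes back. The pure-Python sketch below uses hypothetical function names and a toy scoring function in place of a trained model. It assumes the scikit-learn way returns one prediction per input row, while the MLlib way returns the input rows with an added "prediction" column; that is Spark MLlib's default output column name, and whether SystemML's wrapper uses the same name is an assumption.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]  # hypothetical stand-in for one Spark DataFrame row
Model = Callable[[List[float]], Any]  # toy stand-in for a trained model


def predict_sklearn_way(model: Model, X_test: Sequence[Sequence[float]]) -> List[Any]:
    """scikit-learn way: one prediction per row of X_test, returned as a plain list."""
    return [model(list(map(float, row))) for row in X_test]


def transform_mllib_way(model: Model, df_test: Sequence[Row]) -> List[Row]:
    """MLlib way: read the "features" column and return each row with a
    "prediction" column appended; any other columns are passed through."""
    out: List[Row] = []
    for r in df_test:
        if "features" not in r:
            raise ValueError('DataFrame is missing the "features" column')
        out.append({**r, "prediction": model(list(map(float, r["features"])))})
    return out


def toy_model(v: List[float]) -> int:
    """Hypothetical classifier: label 1 when the features sum to more than 5."""
    return int(sum(v) > 5)


assert predict_sklearn_way(toy_model, [[1, 2], [3, 4]]) == [0, 1]
```

Returning whole rows rather than a bare array is what lets the MLlib way keep identifiers and other columns aligned with their predictions.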

The usage is briefly described in


Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At

From:	Ethan Xu <>
Date:	04/19/2017 02:07 PM
Subject:	Documents of SystemML Algorithms Reference


I'm reading the documentation on Multinomial Logistic Regression with the
Scala API. It says:

val model =
val prediction = model.transform(X_test_df)

The "Arguments" section below it says:

X: Location (on HDFS) to read the input matrix of feature vectors; each row
constitutes one feature vector.

Y: Location to read the input one-column matrix of category labels that
correspond to feature vectors in X. Note the following:...
The explanation of the arguments seems to correspond to the Hadoop and Spark
command-line invocation.
Could someone please advise what the specifications of `X_train_df` and
`X_test_df` are? Are they the same as specified in the Python API, i.e.:

# X_train, y_train and X_test can be NumPy matrices or Pandas DataFrame or SciPy Sparse Matrix
y_test = ..., y_train).predict(X_test)
# df_train is DataFrame that contains two columns: "features" (of type Vector) and "label".
# df_test is a DataFrame that contains the column "features"

The explanation of the arguments for Python/Scala seems to be missing for
other algorithms, too.

Thanks a lot,

