systemml-dev mailing list archives

From "Niketan Pansare" <>
Subject Re: Proof of Concept: Embedded Scala DSL
Date Sat, 24 Sep 2016 17:39:08 GMT

Hi Felix,

Thanks for the summary. The document is extremely useful. I particularly
like the idea of parallelizing the code with the 'breeze' library. I would
like to pitch in a few ideas which would enable your code to be reused by
other DSLs:

1. The Scala DSL/parallelize macro remains the same as described in your
documentation, but instead of generating DML directly, parallelize calls an
intermediate representation (IR), and the IR then generates the DML. This
IR can then be reused by the Python DSL and the R DSL.
2. As an example, the IR could be a lazy Matrix class (which would be part
of SystemML). It could have an awkward syntax/mechanism for pushing down
control structures, for example beginWhile and endWhile. Since the IR will
not be exposed to the end user, that should be fine.

 will call IR's add() method. At the end of parallelize or when the user
wants the result (i.e. eval()), the IR could generate DML code and execute it.
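
A minimal sketch of what such an IR might look like, written in Python only for brevity (the proposal places the IR inside SystemML itself). Everything here is hypothetical: the names `Program`, `LazyMatrix`, `emit`, `assign`, `begin_while`, and `end_while` are illustrative and are not an existing SystemML API. It assumes the IR simply records DML statements as the DSL calls it, and that `eval()` returns the generated script rather than executing it.

```python
import itertools


class Program:
    """Hypothetical IR container: records DML statements in call order."""

    def __init__(self):
        self.lines = []
        self._ids = itertools.count()
        self._depth = 0  # current nesting level, used only for indentation

    def fresh(self):
        """Return a new temporary variable name (t0, t1, ...)."""
        return f"t{next(self._ids)}"

    def emit(self, stmt):
        """Record one DML statement at the current nesting level."""
        self.lines.append("  " * self._depth + stmt)

    def begin_while(self, cond):
        """Open a while block; the awkward-but-internal control-flow hook."""
        self.emit(f"while ({cond}) {{")
        self._depth += 1

    def end_while(self):
        """Close the innermost while block."""
        self._depth -= 1
        self.emit("}")

    def to_dml(self):
        return "\n".join(self.lines)


class LazyMatrix:
    """Hypothetical lazy matrix: operations record DML instead of computing."""

    def __init__(self, prog, name):
        self.prog = prog
        self.name = name

    def add(self, other):
        """What a DSL-level matrix addition would call; nothing is computed."""
        out = LazyMatrix(self.prog, self.prog.fresh())
        self.prog.emit(f"{out.name} = {self.name} + {other.name}")
        return out

    def assign(self, other):
        """Overwrite this variable, e.g. for loop-carried updates."""
        self.prog.emit(f"{self.name} = {other.name}")

    def eval(self):
        """Assumption: a real IR would hand the script to SystemML for
        execution; this sketch only returns the generated text."""
        return self.prog.to_dml()


def demo():
    prog = Program()
    # A and B are assumed to be bound as inputs by the host before execution.
    a, b = LazyMatrix(prog, "A"), LazyMatrix(prog, "B")
    prog.emit("i = 0")
    prog.begin_while("i < 3")
    a.assign(a.add(b))
    prog.emit("i = i + 1")
    prog.end_while()
    return a.eval()


if __name__ == "__main__":
    print(demo())
```

A front end in any host language would only need to map its own operators and loops onto calls like these, which is what would make the IR shareable between DSLs.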

Again, this is just a proposal, and I am fine with dropping the idea of
integrating different DSLs if it makes the implementation of the Scala DSL
complicated.
Also, please feel free to correct me if I am missing anything.


Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At

From:	Matthias Boehm/Almaden/IBM@IBMUS
Date:	09/24/2016 01:11 AM
Subject:	Re: Proof of Concept: Embedded Scala DSL

Thanks for sharing the summary - this is very nice. While looking over the
example, I had the following questions:

1) Output handling: It would be great to see an example of how the results of
Algorithm.execute() are consumed. Do you intend to hand out our binary
matrix representation or MLContext's Matrix from which the user then
requests specific output formats? Also if there are multiple Algorithm
instances, how is the MLContext (with its internal state of lazily
evaluated intermediates) reused?

2) Scala-breeze prototyping: How do you intend to support operations that
are not supported in breeze? Examples are removeEmpty, table, aggregate,
rowIndexMax, quantile/centralmoment, cummin/cummax, and DNN operations.

3) Frame data type and operations: Do you also intend to add a frame type
and its operations? I think for this initial prototype it is not
necessarily required but please make the scope explicit.


From: fschueler

Date: 09/23/2016 04:36 PM
Subject: Proof of Concept: Embedded Scala DSL

As discussed in the related Jira (SYSTEMML-451) I have started to
implement a prototype/proof of concept for an embedded DSL in Scala.

I have summarized the current approach in a short document that you can
find on GitHub together with the code:
Please note that current development happens in the Emma project but
will move to an independent module in the SystemML project once the
necessary additions to Emma are merged. By having the DSL in a separate
module, we can include Scala and Emma dependencies only for the users
that actually want to use the Scala DSL.

The current code serves as a proof of concept to discuss further
development with the SystemML community. I especially welcome input from
SystemML Scala users on the usability of the API design.
Next steps will include the translation from Scala code to DML with
support of all features currently supported in DML, including control
flow structures.
Also, a coherent way of executing the generated scripts from Scala and
the interaction with outside data formats (such as Spark DataFrames)
will be integrated.

I am happy to answer your questions and discuss the described approach.

