systemml-dev mailing list archives

From "Berthold Reinwald" <>
Subject Re: Java compiler for code generation
Date Sat, 01 Apr 2017 04:14:10 GMT
Sounds like a good idea.

Wrt codegen, in a pure Java scoring environment w/o Spark and Hadoop, will 
the dependency on Janino still be there (that question applies to JDK as 
well), and what is the footprint?

Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208

From:   Matthias Boehm <>
Date:   03/31/2017 08:17 PM
Subject:        Java compiler for code generation

Hi all,

currently, our new code generator for operator fusion uses Java's standard
API for programmatic compilation. Despite a plan cache that mitigates
unnecessary compilation and recompilation overheads, we still see
significant end-to-end overhead, especially for small input data.
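
To illustrate the plan cache idea mentioned above, here is a minimal sketch
that compiles each distinct generated source only once. The class name
PlanCacheSketch and the use of a plain Function as the "compiler" are
assumptions made for illustration; this is not SystemML's actual cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical plan cache: maps generated source text to its compiled
// artifact, so an identical fused operator is compiled only once.
final class PlanCacheSketch<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;
    private final AtomicInteger compilations = new AtomicInteger();

    // The compiler is passed in as a function (assumption), so any backend
    // can be plugged in without changing the cache.
    PlanCacheSketch(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    // Returns the cached artifact, invoking the compiler only on a miss.
    T getOrCompile(String source) {
        return cache.computeIfAbsent(source, src -> {
            compilations.incrementAndGet();
            return compiler.apply(src);
        });
    }

    // Number of actual compiler invocations so far.
    int compilationCount() {
        return compilations.get();
    }
}
```

Even with such a cache, every distinct operator still pays the compiler's
cost once, which is why the speed of the compiler itself matters for small
inputs.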

Moving forward, I'd like to switch to Janino
(org.codehaus.janino.SimpleCompiler), which is a fast in-memory Java
compiler with restricted language support. The advantages are

(1) Reduced compilation overhead: On end-to-end scenarios for L2SVM, GLM,
and MLogreg, Janino improved total javac compilation time from 2.039 to
0.195 (14 operators), from 8.134 to 0.411 (82 operators), and from 4.854
to 0.283 (46 operators), respectively. At the same time, there was no
measurable impact on runtime efficiency, and JIT compilation overhead was
even slightly reduced.

(2) Removed JDK requirement: Using the standard compiler API requires a
JDK, while Janino only requires a JRE, which makes it easier to apply code
generation by default.
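
The JDK requirement in (2) can be checked at runtime. The sketch below
assumes the standard API in question is javax.tools, whose
ToolProvider.getSystemJavaCompiler() returns null when no system compiler
is available (typically on a JRE-only installation). The class name
CompilerAvailability is hypothetical.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper that reports whether the platform's Java compiler
// is reachable from the running JVM.
final class CompilerAvailability {
    // Returns true if a system Java compiler is present, false otherwise
    // (getSystemJavaCompiler() returns null when none is provided).
    static boolean hasSystemCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }
}
```

A code generator that depends on this API would have to disable itself, or
fail, whenever the check returns false.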

However, I'm raising this here as Janino would add another explicit
dependency (with BSD license). Fortunately, Spark also uses Janino for
whole-stage-codegen, so we should be able to mark Janino as a provided
library. The only issue is a pure Hadoop environment, where we still want
to use code generation for CP operations. To simplify the build, I could
imagine using the standard Java compiler for Hadoop execution types, but
Janino by default.
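
The proposed selection rule could look roughly like the sketch below. The
enums ExecutionTypeSketch and CompilerKind and the class CompilerChoice are
hypothetical names for illustration only, not existing SystemML types, and
the three execution types are a simplification.

```java
// Hypothetical execution environments (simplified for illustration).
enum ExecutionTypeSketch { SINGLE_NODE, SPARK, HADOOP }

// Hypothetical compiler backends: Janino or the standard Java compiler.
enum CompilerKind { JANINO, JAVAC }

final class CompilerChoice {
    // Janino by default; fall back to the standard compiler on Hadoop,
    // where Janino is not assumed to be provided on the classpath.
    static CompilerKind choose(ExecutionTypeSketch type) {
        return (type == ExecutionTypeSketch.HADOOP)
            ? CompilerKind.JAVAC
            : CompilerKind.JANINO;
    }
}
```

With this rule, only Hadoop deployments would keep the JDK requirement.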

If you have any concerns, please let me know by Monday; otherwise I'd like
to push this change into our upcoming 0.14 release.

