spark-dev mailing list archives

From Reynold Xin <>
Subject Re: Code generation for GPU
Date Thu, 03 Sep 2015 20:37:17 GMT
See responses inline.

On Thu, Sep 3, 2015 at 1:58 AM, kiran lonikar <> wrote:

> Hi,
>    1. I found where the code generation
>    <>
>    happens in the Spark code from the blogs
>    and
>    However, I could not find where the generated code is executed. A major
>    part of my changes will be there, since this executor will now have to send
>    vectors of columns to GPU RAM, invoke execution, and get the results back
>    to CPU RAM. Thus, the existing executor will change significantly.

The code generation generates Java classes that have an apply method, and
the apply method is called in the operators.

E.g. GenerateUnsafeProjection returns a Projection class (which is just a
class with an apply method), and TungstenProject calls that class.
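To illustrate that pattern, here is a simplified sketch (not Spark's actual interfaces: the `Projection` interface and plain `int[]` rows are stand-ins for the generated class and Spark's internal rows). The operator only ever sees a class with an apply method:

```java
// Hypothetical sketch of Spark's codegen calling convention: generated
// source compiles to a class with an apply method, and the operator simply
// invokes apply on each row. All names here are simplified stand-ins.
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class ProjectionSketch {
    // Stand-in for Spark's Projection: just a class with an apply method.
    interface Projection extends Function<int[], int[]> {}

    // A "generated" projection: computes (a + b, a * b) from two input columns.
    static final Projection GENERATED =
        row -> new int[] { row[0] + row[1], row[0] * row[1] };

    // The operator side: call the generated class's apply on every row.
    static List<int[]> project(List<int[]> rows, Projection p) {
        return rows.stream().map(p).collect(java.util.stream.Collectors.toList());
    }

    public static void main(String[] args) {
        List<int[]> out =
            project(Arrays.asList(new int[]{2, 3}, new int[]{4, 5}), GENERATED);
        for (int[] r : out) System.out.println(Arrays.toString(r));
        // prints [5, 6] then [9, 20]
    }
}
```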

>    2. On the Project Tungsten blog
>    <>,
>    in the third section, Code Generation, it is mentioned that you plan
>    to increase the level of code generation from record-at-a-time expression
>    evaluation to vectorized expression evaluation. Has this been implemented?
>    If not, how do I implement it? I will need access to the columnar ByteBuffer
>    objects in a DataFrame to do this. Having row-by-row access to the data would
>    defeat the purpose of this exercise. In particular, I need access to
>    in the executor of the generated code.
This is future work. You'd need to create batches of rows or columns. This
is a pretty major refactoring, though.
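To make the distinction concrete, here is a minimal sketch (hypothetical names, with plain `int[]` columns standing in for Spark's column batches) contrasting record-at-a-time evaluation with a vectorized loop over column vectors:

```java
// Hypothetical sketch of the difference discussed above: evaluating an
// expression one record at a time vs. over a whole batch of column vectors.
public class VectorizedSketch {
    // Record-at-a-time: one call per row, as in current codegen.
    static int evalRow(int a, int b) { return a + b; }

    // Vectorized: one call per batch; the tight loop over column vectors is
    // the shape that is amenable to SIMD or GPU execution.
    static void evalBatch(int[] a, int[] b, int[] out) {
        for (int i = 0; i < a.length; i++) out[i] = a[i] + b[i];
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3}, b = {10, 20, 30}, out = new int[3];
        evalBatch(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
        // prints [11, 22, 33]
    }
}
```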

>    3. One thing that confuses me is the changes from 1.4 to 1.5, possibly
>    due to JIRA and pull
>    request. This
>    changed the code generation from quasiquotes (q) to the string
>    interpolation (s) operator. This makes it simpler for me to generate
>    OpenCL code, which is string based. The question is: is this branch
>    stable now? Should I make my changes on Spark 1.4, Spark 1.5, or the
>    master branch?
In general, Spark development velocity is pretty high, as we make a lot of
changes to internals every release. If I were you, I'd use either master or
branch-1.5 for your prototyping.
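For illustration, string-based code generation is just template assembly, which is why the same mechanism retargets easily from Java to OpenCL source. A hypothetical sketch (`generateJava` and `generateOpenCL` are invented names, not Spark APIs):

```java
// Hypothetical sketch of string-based codegen, in the spirit of Spark 1.5's
// move from quasiquotes to plain string templates. Because the output is
// just text, the same approach can emit OpenCL C instead of Java.
public class StringCodegenSketch {
    // Generate a record-at-a-time Java evaluator as a source string.
    static String generateJava(String expr) {
        return "public int apply(int a, int b) { return " + expr + "; }";
    }

    // The same mechanism targeting an OpenCL kernel (illustrative only):
    // rewrite scalar references into per-work-item array accesses.
    static String generateOpenCL(String expr) {
        String body = expr.replace("a", "a[i]").replace("b", "b[i]");
        return "__kernel void eval(__global const int* a, __global const int* b,\n"
             + "                   __global int* out) {\n"
             + "  int i = get_global_id(0);\n"
             + "  out[i] = " + body + ";\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(generateJava("a + b"));
        System.out.println(generateOpenCL("a + b"));
    }
}
```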

>    4. How do I tune the batch size (the number of rows in the ByteBuffer)? Is
>    it through the property spark.sql.inMemoryColumnarStorage.batchSize?
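For reference, that property does set the number of rows per batch in Spark SQL's in-memory columnar cache (default 10000). A minimal sketch of setting it at submit time (`your-app.jar` is a placeholder):

```shell
# Sketch: overriding the in-memory columnar batch size (rows per batch)
# when submitting a Spark application.
spark-submit \
  --conf spark.sql.inMemoryColumnarStorage.batchSize=10000 \
  your-app.jar
```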
> Thanks in anticipation,
> Kiran
> PS:
> Other things I found useful were:
> *Spark DataFrames*:
> *Apache Spark 1.5*:
> The links to JavaCL/ScalaCL:
> *Library to execute OpenCL code through Java*:
> *Library to convert Scala code to OpenCL and execute on GPUs*:
