systemml-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Berthold Reinwald" <reinw...@us.ibm.com>
Subject Re: Discussion on GPU backend
Date Wed, 25 May 2016 13:03:31 GMT
the discussion is less about (1), (2), or (3). As practiced so far, (3) is 
the way to go.

The question is about (A) or (B). Curious was the Apache suggested 
practice is.

Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinwald@us.ibm.com



From:   Matthias Boehm/Almaden/IBM@IBMUS
To:     dev@systemml.incubator.apache.org
Date:   05/24/2016 09:10 PM
Subject:        Re: Discussion on GPU backend



Generally, I think we should really stick to (3) as done in the past, 
i.e., bring up major features in the roadmap discussions, create jira 
epics and try to break them into rather isolated tasks. This works for 
almost any major/minor feature. The only exception are features, where it 
is initially unknown if the potential benefits outweigh the increased 
complexity (or other disadvantages). Here, prototypes are required but 
everybody should be free to choose a way of maintaining them. I also don't 
expect too much collaboration here because of the unknown status. Once the 
initial unknowns are resolved, we should come back to (3) tough.

Regarding the GPU backend, the unknowns to resolve are (1) the handling of 
native libraries/kernels for deployment/test/dev, and (2) performance 
comparisons on selected algorithms (prototypes, not fully integrated), 
data sizes, and platforms. Once we have answers to these questions, we can 
create all the tasks for optimizer/runtime integration. 

Regards,
Matthias 


Niketan Pansare---05/24/2016 11:55:19 AM---Hi all, Since there is interest 
in collaborating on GPU backend, I wanted to know

From: Niketan Pansare/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 05/24/2016 11:55 AM
Subject: Re: Discussion on GPU backend



Hi all,

Since there is interest in collaborating on GPU backend, I wanted to know 
what is the preferred way to go ahead with a new feature (i.e. GPU 
backend) ? This discussion is also generally applicable to other major 
features (for example: Flink backend, Deep Learning support, OpenCL 
backend, new data types, new built-in functions, new algorithms, etc).

The first point of discussion is what would qualify as a "major feature" 
and how we integrate it into SystemML ? Here are three options that could 
serve as a potential requirement:
1. The feature has to be fully functional and fully optimized. For 
example: in the case of additional backends, the PR can only be merged in 
if and only if, all the instructions (CP or distributed) has been 
implemented and is at least as optimized as our existing alternate 
backends. In the case of algorithms or the built-in functions, the PR can 
only be merged in if and only if, it runs on all the backends for all 
datasets and is comparable in performance and accuracy with an external ML 
libraries.
2. The feature has to be fully functional. In this case, the PR can only 
be merged in if and only if all the instructions (CP or distributed) has 
been implemented. However, the first version of the new backend need not 
perform faster than our existing alternate backends.
3. Increment addition but with unit testcases that addresses quality and 
stability concerns. In this case, a PR can be merged if a subset of 
instructions has been implemented along with set of unit test cases 
suggested by our committers. The main benefit here is quick-feedback 
iterations from our committers, whereas the main drawback is an 
intermediate state where we don't fully support the given backend for 
certain scenario. 

If we decide to go with option 1 or 2, then potentially there will be a 
lot of code to review at the end and ideally we should give opportunity 
for our committers to provide early review comments on the feature. This 
will mitigate the risk of having to re-implement the entire feature. The 
options here are:
A. Create a branch on https://github.com/apache/incubator-systemml. This 
allows people to collaborate as well as allows committers to look at the 
code.
B. Create a branch on a fork and have a PR up to allow committers to raise 
concerns and provide suggestions. This is done for 
https://github.com/apache/incubator-systemml/pull/165 and 
https://github.com/apache/incubator-systemml/pull/119. To collaborate, the 
person creating PR will act as committer for the feature and will accept 
PR on its branch and will be responsible for resolving conflicts and 
keeping the PR in sync with the master.

If we decide to go with the option 3 (i.e. incremental addition), the 
option B seems to be logical choice as we already do this for other 
features.

My goal here is not to create a formal process but instead to avoid any 
potential misunderstanding/confusion and also to follow recommended Apache 
practices.

Please email back with your thoughts :)

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

Deron Eriksson ---05/18/2016 11:22:26 AM---Hi Niketan, Good idea, I think 
that would be the cleanest solution for now. Since JCuda

From: Deron Eriksson <deroneriksson@gmail.com>
To: dev@systemml.incubator.apache.org
Date: 05/18/2016 11:22 AM
Subject: Re: Discussion on GPU backend



Hi Niketan,

Good idea, I think that would be the cleanest solution for now. Since 
JCuda
doesn't appear to be in a public maven repo, it adds a layer of difficulty
to clean integration via maven builds.

Deron


On Wed, May 18, 2016 at 10:55 AM, Niketan Pansare <npansar@us.ibm.com>
wrote:

> Hi Deron,
>
> Good points. I vote that we keep JCUDA and other accelerators we add as 
an
> external dependency. This means the user will have to ensure JCuda.jar 
in
> the class path and JCuda.DLL/JCuda.so in the LD_LIBRARY_PATH.
>
> I don't think JCuda.jar is platform-specific.
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> [image: Inactive hide details for Deron Eriksson ---05/18/2016 10:51:17
> AM---Hi, I'm wondering what would be a good way to handle JCuda]Deron
> Eriksson ---05/18/2016 10:51:17 AM---Hi, I'm wondering what would be a 
good
> way to handle JCuda in terms of the
>
> From: Deron Eriksson <deroneriksson@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 05/18/2016 10:51 AM
> Subject: Re: Discussion on GPU backend
> ------------------------------
>
>
>
> Hi,
>
> I'm wondering what would be a good way to handle JCuda in terms of the
> build release packages. Currently we have 11 artifacts that we are
> building:
>   systemml-0.10.0-incubating-SNAPSHOT-inmemory.jar
>   systemml-0.10.0-incubating-SNAPSHOT-javadoc.jar
>   systemml-0.10.0-incubating-SNAPSHOT-sources.jar
>   systemml-0.10.0-incubating-SNAPSHOT-src.tar.gz
>   systemml-0.10.0-incubating-SNAPSHOT-src.zip
>   systemml-0.10.0-incubating-SNAPSHOT-standalone.jar
>   systemml-0.10.0-incubating-SNAPSHOT-standalone.tar.gz
>   systemml-0.10.0-incubating-SNAPSHOT-standalone.zip
>   systemml-0.10.0-incubating-SNAPSHOT.jar
>   systemml-0.10.0-incubating-SNAPSHOT.tar.gz
>   systemml-0.10.0-incubating-SNAPSHOT.zip
>
> It looks like JCuda is platform-specific, so you typically need 
different
> jars/dlls/sos/etc for each platform. If I'm understanding things 
correctly,
> if we generated Windows/Linux/LinuxPowerPC/MacOS-specific SystemML
> artifacts for JCuda, we'd potentially have an enormous number of 
artifacts.
>
> Is this something that could be potentially handled by specific profiles 
in
> the pom so that a user might be able to do something like "mvn clean
> package -P jcuda-windows" so that a user could be responsible for 
building
> the platform-specific SystemML jar for jcuda? Or is this something that
> could be handled differently, by putting the platform-specific jcuda jar 
on
> the classpath and any dlls or other needed libraries on the path?
>
> Deron
>
>
>
> On Tue, May 17, 2016 at 10:50 PM, Niketan Pansare <npansar@us.ibm.com>
> wrote:
>
> > Hi Luciano,
> >
> > Like all our backends, there is no change in the programming model. 
The
> > user submits a DML script and specifies whether she wants to use an
> > accelerator. Assuming that we compile jcuda jars into SystemML.jar, 
the
> > user can use GPU backend using following command:
> > spark-submit --master yarn-client ... -f MyAlgo.dml -accelerator -exec
> > hybrid_spark
> >
> > The user also needs to set LD_LIBRARY_PATH that points to JCuda DLL or 
so
> > files. Please see *https://issues.apache.org/jira/browse/SPARK-1720*
> > <https://issues.apache.org/jira/browse/SPARK-1720> ... For example: 
the
>
> > user can add following to spark-env.sh
> > export LD_LIBRARY_PATH=<path to jcuda so>:$LD_LIBRARY_PATH
> >
> > The first version of GPU backend will only accelerate CP. In this 
case,
> we
> > have four types of instructions:
> > 1. CP
> > 2. GPU (requires GPU on the driver)
> > 3. SPARK
> > 4. MR
> >
> > Note, the first version will require the CUDA/JCuda dependency to be
> > installed on the driver only.
> >
> > The next version will accelerate our distributed instructions as well. 
In
> > this case, we will have six types of instructions:
> > 1. CP
> > 2. GPU
> > 3. SPARK
> > 4. MR
> > 5. SPARK-GPU (requires GPU cluster)
> > 6. MR-GPU (requires GPU cluster)
> >
> > Thanks,
> >
> > Niketan Pansare
> > IBM Almaden Research Center
> > E-mail: npansar At us.ibm.com
> >
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> >
> > [image: Inactive hide details for Luciano Resende ---05/17/2016 
09:13:24
> > PM---Great to see detailed information on this topic Niketan,]Luciano
> > Resende ---05/17/2016 09:13:24 PM---Great to see detailed information 
on
> > this topic Niketan, I guess I have missed when you posted it in
> >
> > From: Luciano Resende <luckbr1975@gmail.com>
> > To: dev@systemml.incubator.apache.org
> > Date: 05/17/2016 09:13 PM
> > Subject: Re: Discussion on GPU backend
> > ------------------------------
>
> >
> >
> >
> > Great to see detailed information on this topic Niketan, I guess I 
have
> > missed when you posted it initially.
> >
> > Could you elaborate a little more on what is the programming model for
> when
> > the user wants to leverage GPU ? Also, today I can submit a job to 
spark
> > using --jars and it will handle copying the dependencies to the worker
> > nodes. If my application wants to leverage GPU, what extras 
dependencies
> > will be required on the worker nodes, and how they are going to be
> > installed/updated on the Spark cluster ?
> >
> >
> >
> > On Tue, May 3, 2016 at 1:26 PM, Niketan Pansare <npansar@us.ibm.com>
> > wrote:
> >
> > >
> > >
> > > Hi all,
> > >
> > > I have updated the design document for our GPU backend in the JIRA
> > >
> https://issues.apache.org/jira/browse/SYSTEMML-445. The implementation
>
> > > details are based on the prototype I created and is available in PR
> > >
> https://github.com/apache/incubator-systemml/pull/131. Once we are done
>
> > > with the discussion, I can clean up and separate out the GPU backend
> in a
> > > separate PR for easier review :)
> > >
> > > Here are key design points:
> > > A GPU backend would implement two abstract classes:
> > >    1.   GPUContext
> > >    2.   GPUObject
> > >
> > >
> > >
> > > The GPUContext is responsible for GPU memory management and gets
> > call-backs
> > > from SystemML's bufferpool on following methods:
> > >    1.   void acquireRead(MatrixObject mo)
> > >    2.   void acquireModify(MatrixObject mo)
> > >    3.   void release(MatrixObject mo, boolean isGPUCopyModified)
> > >    4.   void exportData(MatrixObject mo)
> > >    5.   void evict(MatrixObject mo)
> > >
> > >
> > >
> > > A GPUObject (like RDDObject and BroadcastObject) is stored in
> > CacheableData
> > > object. It contains following methods that are called back from the
> > > corresponding GPUContext:
> > >    1.   void allocateMemoryOnDevice()
> > >    2.   void deallocateMemoryOnDevice()
> > >    3.   long getSizeOnDevice()
> > >    4.   void copyFromHostToDevice()
> > >    5.   void copyFromDeviceToHost()
> > >
> > >
> > >
> > > In the initial implementation, we will add JCudaContext and
> JCudaPointer
> > > that will extend the above abstract classes respectively. The
> > JCudaContext
> > > will be created by ExecutionContextFactory depending on the
> > user-specified
> > > accelarator. Analgous to MR/SPARK/CP, we will add a new ExecType: 
GPU
> and
> > > implement GPU instructions.
> > >
> > > The above design is general enough so that other people can 
implement
> > > custom accelerators (for example: OpenCL) and also follows the 
design
> > > principles of our CP bufferpool.
> > >
> > > Thanks,
> > >
> > > Niketan Pansare
> > > IBM Almaden Research Center
> > > E-mail: npansar At us.ibm.com
> > >
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> > >
> >
> >
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
> >
> >
> >
>
>
>
>








Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message