spark-user mailing list archives

From Masood Krohy <masood.kr...@intact.net>
Subject Re: Deep learning libraries for scala
Date Fri, 04 Nov 2016 15:29:31 GMT
If you need ConvNets and RNNs and want to stay in Scala/Java, then
Deeplearning4j (DL4J) might be the most mature option.

If you want ConvNets and RNNs as implemented in TensorFlow, along with
all the bells and whistles, then you might want to switch to PySpark +
TensorFlow and write the entire pipeline in Python. You'd do the data
preparation/ingestion in PySpark and pass the data to TensorFlow for the
ML part. There are two supported modes here:
1) Simultaneous multi-model training (a.k.a. embarrassingly parallel: each 
node has the entire data and model):
https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html

2) Data parallelism (data is distributed, each node has the entire model): 

There are some prototypes out there, and TensorSpark seems to be the most
mature: https://github.com/adatao/tensorspark
It implements Downpour/asynchronous SGD for distributed training; it
remains to be stress-tested with large datasets, however.
More info: 
https://arimo.com/machine-learning/deep-learning/2016/arimo-distributed-tensorflow-on-spark/
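
The two modes above can be sketched in plain Python. This is only an
illustration under loud assumptions, not TensorSpark's or Databricks'
code: threads stand in for Spark executors, a two-parameter linear model
stands in for a TensorFlow graph, and every name below (make_data,
train_full, ParameterServer, async_sgd, ...) is hypothetical.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


def make_data(n=200, seed=0):
    """Toy noise-free regression data for y = 3x + 1 (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0) for x in xs]


def gradient(w, b, batch):
    """Mean-squared-error gradient for the linear model y = w*x + b."""
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def loss(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Mode 1: every worker sees the full dataset and trains its own model;
# only the hyperparameter (here, the learning rate) differs per worker.
def train_full(lr, data, epochs=200):
    w = b = 0.0
    for _ in range(epochs):
        gw, gb = gradient(w, b, data)
        w, b = w - lr * gw, b - lr * gb
    return lr, loss(w, b, data)


def multi_model_search(data, learning_rates):
    with ThreadPoolExecutor(max_workers=len(learning_rates)) as pool:
        results = list(pool.map(lambda lr: train_full(lr, data), learning_rates))
    return min(results, key=lambda r: r[1])  # (best_lr, best_loss)


# Mode 2: the data is sharded; each worker keeps a model replica and
# exchanges parameters and gradients with a shared parameter server,
# with no barrier between workers (the asynchronous part).
class ParameterServer:
    def __init__(self, lr):
        self.w, self.b, self.lr = 0.0, 0.0, lr
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.w, self.b

    def push(self, gw, gb):
        with self._lock:
            self.w -= self.lr * gw
            self.b -= self.lr * gb


def worker(server, shard, steps, batch_size, seed):
    rng = random.Random(seed)
    for _ in range(steps):
        w, b = server.pull()  # may already be stale when the gradient is pushed
        batch = rng.sample(shard, batch_size)
        server.push(*gradient(w, b, batch))


def async_sgd(data, n_workers=4, steps=500, lr=0.05, batch_size=10):
    shards = [data[i::n_workers] for i in range(n_workers)]
    server = ParameterServer(lr)
    threads = [
        threading.Thread(target=worker, args=(server, s, steps, batch_size, i))
        for i, s in enumerate(shards)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.pull()
```

Calling multi_model_search(make_data(), [0.01, 0.1, 0.5]) returns the
learning rate with the lowest final loss, and async_sgd(make_data())
returns the (w, b) held by the server after all workers finish. On a real
cluster, mode 1 would roughly take the shape of
sc.parallelize(learning_rates).map(...).collect() with the dataset
broadcast to the executors, whereas mode 2 needs some parameter-server
process that outlives individual tasks.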

TensorFrames does not allow distributed training and I did not see any 
performance benchmarks last time I checked.

Alexander Ulanov of HP gave a presentation on the options a few months ago:
https://www.oreilly.com/learning/distributed-deep-learning-on-spark

Masood


------------------------------
Masood Krohy, Ph.D.
Data Scientist, Intact Lab-R&D
Intact Financial Corporation




From:    Benjamin Kim <bbuild11@gmail.com>
To:      janardhan shetty <janardhanp22@gmail.com>
Cc:      Gourav Sengupta <gourav.sengupta@gmail.com>, user
<user@spark.apache.org>
Date:    2016-11-01 13:14
Subject: Re: Deep learning libraries for scala



To add, I see that Databricks has been busy integrating deep learning more 
into their product and put out a new article about this.

https://databricks.com/blog/2016/10/27/gpu-acceleration-in-databricks.html

An interesting tidbit is at the bottom of the article mentioning 
TensorFrames.

https://github.com/databricks/tensorframes

Seems like an interesting direction…

Cheers,
Ben


On Oct 19, 2016, at 9:05 AM, janardhan shetty <janardhanp22@gmail.com> 
wrote:

Agreed. But as it states, deeper integration with Scala is yet to be
developed.
Any thoughts on how to use TensorFlow with Scala? We would need to write
wrappers, I think.

On Oct 19, 2016 7:56 AM, "Benjamin Kim" <bbuild11@gmail.com> wrote:
On that note, here is an article that Databricks published about using
TensorFlow in conjunction with Spark.

https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html

Cheers,
Ben


On Oct 19, 2016, at 3:09 AM, Gourav Sengupta <gourav.sengupta@gmail.com> 
wrote:

When using deep learning, you might want to stay as close to TensorFlow as
possible. There is very little translation loss, and you get access to
stable, scalable, and tested libraries from the best brains in the
industry. As far as Scala goes, it helps a lot to think of the language as
a tool for accessing algorithms in this instance, unless you want to start
developing algorithms from the ground up (in which case you might not
require any libraries at all).

On Sat, Oct 1, 2016 at 3:30 AM, janardhan shetty <janardhanp22@gmail.com> 
wrote:
Hi,

Are there any good libraries that can be used for deep learning models in
Scala?
How can we integrate TensorFlow with Scala ML?




