spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Deep Learning with Spark, what is your experience?
Date Sun, 05 May 2019 11:19:23 GMT
There are deep learning libraries for R, Matlab and even for RDBMSs.
My apologies, I thought that you were interested in deep learning; it looks
like you are more interested in making Spark work for deep learning because
of your comfort with it.

Regards,
Gourav

On Sat, May 4, 2019 at 11:29 PM Riccardo Ferrari <ferrarir@gmail.com> wrote:

> Thank you for your answers!
>
> While it is clear that each DL framework can solve distributed model
> training on its own (some better than others), I still see a lot of
> value in having Spark on the ETL/pre-processing part, hence the origin of my
> question.
> I am trying to avoid managing multiple stacks/workflows and hoping to
> unify my system. Projects like TensorFlowOnSpark or Analytics-Zoo (to name
> a couple) feel like they can help. Still, I would really appreciate your
> comments and those of anyone who could add some value to this discussion.
> Does anyone have experience with them?
>
> Thanks
>
> On Sat, May 4, 2019 at 8:01 PM Pat Ferrel <pat@occamsmachete.com> wrote:
>
>> @Riccardo
>>
>> Spark does not do the deep learning part of the pipeline (afaik), so it is
>> limited to data ingestion and transforms (ETL). It is therefore optional,
>> and other ETL options might be better for you.
>>
>> Most of the technologies @Gourav mentions have their own scaling based on
>> their own compute engines specialized for their DL implementations, so be
>> aware that Spark scaling has nothing to do with scaling most of the DL
>> engines; they have their own solutions.
>>
>> From: Gourav Sengupta <gourav.sengupta@gmail.com>
>> Reply: Gourav Sengupta <gourav.sengupta@gmail.com>
>> Date: May 4, 2019 at 10:24:29 AM
>> To: Riccardo Ferrari <ferrarir@gmail.com>
>> Cc: User <user@spark.apache.org>
>> Subject: Re: Deep Learning with Spark, what is your experience?
>>
>> Try using MXNet and Horovod directly as well (I think that MXNet is worth
>> a try):
>> 1. https://medium.com/apache-mxnet/distributed-training-using-apache-mxnet-with-horovod-44f98bf0e7b7
>> 2. https://docs.nvidia.com/deeplearning/dgx/mxnet-release-notes/rel_19-01.html
>> 3. https://aws.amazon.com/mxnet/
>> 4. https://aws.amazon.com/blogs/machine-learning/aws-deep-learning-amis-now-include-horovod-for-faster-multi-gpu-tensorflow-training-on-amazon-ec2-p3-instances/
>>
>>
>> Of course, TensorFlow is backed by Google's advertisement team as well:
>> https://aws.amazon.com/blogs/machine-learning/scalable-multi-node-training-with-tensorflow/
>>
>>
>> Regards,
>>
>> On Sat, May 4, 2019 at 10:59 AM Riccardo Ferrari <ferrarir@gmail.com>
>> wrote:
>>
>>> Hi list,
>>>
>>> I am trying to understand if it makes sense to leverage Spark as an
>>> enabling platform for Deep Learning.
>>>
>>> My open questions to you are:
>>>
>>>    - Do you use Apache Spark in your DL pipelines?
>>>    - How do you use Spark for DL? Is it just a stand-alone stage in the
>>>    workflow (i.e. a data preparation script) or is it more integrated?
>>>
>>> I see a major advantage in leveraging Spark as a unified entry point:
>>> for example, you can easily abstract data sources and leverage existing
>>> team skills for data pre-processing and training. On the flip side, you may
>>> hit some limitations, including supported versions and so on.
>>> What is your experience?
>>>
>>> Thanks!
>>>
>>
