spark-user mailing list archives

From Gourav Sengupta <>
Subject Re: Petastorm vs horovod vs tensorflowonspark vs spark_tensorflow_distributor
Date Mon, 07 Jun 2021 20:02:00 GMT
Hi Sean,

thank you so much for your kind response :)

Gourav Sengupta

On Sat, Jun 5, 2021 at 8:00 PM Sean Owen <> wrote:

> All of these tools are reasonable choices. I don't think the Spark project
> itself has a view on what works best. These things do different things. For
> example petastorm is not a training framework, but a way to feed data to a
> distributed DL training process on Spark. For what it's worth, Databricks
> ships Horovod and Petastorm, but that doesn't mean the other projects are
> second-class.
> On Tue, Jun 1, 2021 at 4:59 PM Gourav Sengupta <> wrote:
>> Dear TD, Matei, Michael, Reynold,
>> I hope all of you and your loved ones are staying safe and doing well.
>> As a member of the community, I find the direction from the Spark mentors
>> a bit confusing, and I was wondering if I could seek your help.
>> We have to make long-term decisions that are aligned with open source
>> Spark compatibility and direction, and it would be wonderful to know the
>> most dependable route to get data from Spark to TensorFlow. Is it:
>> 1. petastorm
>> 2. horovod
>> 3. tensorflowonspark
>> 4. spark_tensorflow_distributor
>> or something else.
>> Any comments from you will be super useful.
>> If I am not wrong, seamless integration between Spark and TensorFlow/
>> PyTorch was one of the most exciting visions of Spark 3.x.
>> While Spark ML has its own niche, I think that TensorFlow and PyTorch
>> will see a lot of focused development as well.
>> Regards,
>> Gourav Sengupta
