spark-user mailing list archives

From Sean Owen <>
Subject Re: Petastorm vs horovod vs tensorflowonspark vs spark_tensorflow_distributor
Date Sat, 05 Jun 2021 19:00:04 GMT
All of these tools are reasonable choices. I don't think the Spark project
itself has a view on what works best. These tools do different things. For
example, Petastorm is not a training framework, but a way to feed data to a
distributed DL training process on Spark. For what it's worth, Databricks
ships Horovod and Petastorm, but that doesn't mean the other projects are

On Tue, Jun 1, 2021 at 4:59 PM Gourav Sengupta <> wrote:

> Dear TD, Matei, Michael, Reynold,
> I hope all of you and your loved ones are staying safe and doing well.
> As a member of the community, I am finding the direction from the SPARK
> mentors a bit confusing, and I was wondering if I can seek your
> help.
> We have to make long-term decisions that are aligned with open source
> SPARK compatibility and direction, and it would be wonderful to know what is
> the most dependable route to get data from SPARK to tensorflow. Is it:
> 1. petastorm
> 2. horovod
> 3. tensorflowonspark
> 4. spark_tensorflow_distributor
> or something else.
> Any comments from you will be super useful.
> If I am not wrong, seamless integration between SPARK and tensorflow/
> pytorch was one of the most exciting visions of SPARK 3.x.
> While using SPARK ML has its own favourite space, I think that tensorflow
> and pytorch will see a lot of focused development as well.
> Regards,
> Gourav Sengupta
