spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Scala vs Python for ETL with Spark
Date Sat, 10 Oct 2020 10:31:40 GMT
It really depends on which language your data scientists speak. I don't think it makes sense
to impose a language on them for ad hoc data science work; let them choose.
For more complex AI engineering work, though, you can apply different standards and criteria.
And then it really depends on architectural aspects etc.

> On 09.10.2020 at 22:57, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> 
> I have come across occasions when teams use Python with Spark for ETL, for example
> processing data from S3 buckets into Snowflake with Spark.
> 
> The only reason I think they are choosing Python as opposed to Scala is that they
> are more familiar with Python. The fact that Spark itself is written in Scala is one
> reason I think Scala has an edge.
> 
> I have not done a one-to-one comparison of Spark with Scala vs Spark with Python. I understand
> that for data science purposes most libraries like TensorFlow etc. are written in Python, but I
> am at a loss to understand the validity of using Python with Spark for ETL purposes.
> 
> This is my understanding, not established fact, so I would like to get some informed
> views on this if I can.
> 
> Many thanks,
> 
> Mich
> 
> 
> 
> 
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage
> or destruction of data or any other property which may arise from relying on this email's
> technical content is explicitly disclaimed. The author will in no case be liable for any monetary
> damages arising from such loss, damage or destruction.
>  
