spark-user mailing list archives

From "van den Heever, Christian CC" <>
Subject RE: Dose pyspark supports python3.6?
Date Thu, 02 Nov 2017 03:58:22 GMT
Dear Spark users

I have been asked to provide a presentation / business case on why to use Spark and Java
as an ingestion tool for HDFS and Hive, and why to move away from an ETL tool.

Could you be so kind as to provide some pros and cons for this?

I have the following:

Pros:
In-house build: code can be changed on the fly to suit business needs.
Software is free.
Can run on all nodes out of the box.
Works with other Apache-based software.
Fast due to in-memory processing.
Spark UI can visualise execution.
Supports checkpointed data loads.
Supports a schema registry for custom schemas and schema inference.
Supports YARN execution.
MLlib can be used if needed.
Data lineage support due to Spark usage.

Cons:
Skills are needed to build and maintain it.
In-memory capability can become a bottleneck if not managed.
No ETL GUI.

Maybe point me to an article if you have one.

Thanks a mill.
