Okay, I would disagree with all of this.

Dr. Matei Zaharia created Spark.
He and Bill Chambers then wrote a book on Spark recently.
He is still the main thinking power behind Spark (look at his research at Stanford).
The book is called "Spark: The Definitive Guide", and it's the best book on, and introduction to, Spark.

I have been through a lot of documentation and at least 40 books on Spark, and nothing even comes close to this book. It also puts to rest many of the arguments about which language to choose.

Thanks and Regards,
Gourav Sengupta

On Fri, Jul 5, 2019 at 11:55 AM Vikas Garg <sperry.it@gmail.com> wrote:

On Fri, 5 Jul 2019 at 15:38, Chris Teoh <chris.teoh@gmail.com> wrote:
Scala is better suited to data engineering work. It also has better integration with other components like HBase, Kafka, etc.

Python is great for data scientists as there are more data science libraries available in Python.

On Fri., 5 Jul. 2019, 7:40 pm Vikas Garg, <sperry.it@gmail.com> wrote:
Is there any disadvantage to using Python? I have gone through multiple articles which say that Python has advantages over Scala.

Scala is super fast in comparison but Python has more pre-built libraries and options for analytics.

Should I still go with Scala?

On Fri, 5 Jul 2019 at 13:07, Kurt Fehlhauer <kfehlhau@gmail.com> wrote:
Since you are a data engineer, I would start by learning Scala. The parts of Scala you would need to learn are pretty basic. Start with the examples on the Spark website, which are given in multiple languages. Think of Scala as a typed version of Python. You will find that the error messages tend to be much more meaningful in Scala because that is the native language of Spark. If you don’t want to install the JVM and Scala, I highly recommend Databricks Community Edition as a place to start.

On Thu, Jul 4, 2019 at 11:22 PM Vikas Garg <sperry.it@gmail.com> wrote:
I am currently working as a data engineer, and I work with Power BI and SSIS (an ETL tool). For learning purposes, I have set up PySpark and am able to run queries through Spark on a multi-node cluster DB (I am using Vertica DB and will later move to HDFS or SQL Server).

I also have good knowledge of Python.

On Fri, 5 Jul 2019 at 10:32, Kurt Fehlhauer <kfehlhau@gmail.com> wrote:
Are you a data scientist or data engineer?

On Thu, Jul 4, 2019 at 10:34 PM Vikas Garg <sperry.it@gmail.com> wrote:

I am a new Spark learner. Can someone guide me on a strategy for gaining expertise in PySpark?