spark-user mailing list archives

From Chris Teoh <>
Subject Re: Learning Spark
Date Fri, 05 Jul 2019 10:08:27 GMT
Scala is better suited to data engineering work. It also has better
integration with other components like HBase, Kafka, etc.

Python is great for data scientists as there are more data science
libraries available in Python.

On Fri., 5 Jul. 2019, 7:40 pm Vikas Garg, <> wrote:

> Is there any disadvantage to using Python? I have gone through multiple
> articles which say that Python has advantages over Scala.
> Scala is super fast in comparison, but Python has more pre-built libraries
> and options for analytics.
> Should I still go with Scala?
> On Fri, 5 Jul 2019 at 13:07, Kurt Fehlhauer <> wrote:
>> Since you are a data engineer I would start by learning Scala. The parts
>> of Scala you would need to learn are pretty basic. Start with the examples
>> on the Spark website, which gives examples in multiple languages. Think of
>> Scala as a typed version of Python. You will find that the error messages
>> tend to be much more meaningful in Scala because that is the native
>> language of Spark. If you don’t want to install the JVM and Scala, I
>> highly recommend Databricks Community Edition as a place to start.
>> On Thu, Jul 4, 2019 at 11:22 PM Vikas Garg <> wrote:
>>> I am currently working as a data engineer, working with Power BI and
>>> SSIS (ETL tool). For learning purposes, I have set up PySpark and am
>>> also able to run queries through Spark on a multi-node cluster DB (I am
>>> using Vertica DB and will later move to HDFS or SQL Server).
>>> I also have good knowledge of Python.
>>> On Fri, 5 Jul 2019 at 10:32, Kurt Fehlhauer <> wrote:
>>>> Are you a data scientist or data engineer?
>>>> On Thu, Jul 4, 2019 at 10:34 PM Vikas Garg <> wrote:
>>>>> Hi,
>>>>> I am a new Spark learner. Can someone guide me on a strategy for
>>>>> gaining expertise in PySpark?
>>>>> Thanks!!!
