spark-user mailing list archives

From "Alex A. Reda" <alex.re...@gmail.com>
Subject Re: Learning Spark
Date Fri, 05 Jul 2019 13:49:20 GMT
Hello,

I also second Gourav's point regarding the book "Spark: The Definitive Guide".
It is great for learning both Scala- and Python-based Spark. But as others
mentioned, you will need to keep reading the documentation, as Spark is
still undergoing a lot of improvements. I list additional resources below,
no plug :)

-       Excellent Spark 2 course on Udemy by Jose Portilla. This one
is on PySpark; he also has a course on Scala. Not super advanced, but it
covers the basics very well.
https://www.udemy.com/apache-spark-with-python-big-data-with-pyspark-and-spark/



-        Great book on Spark 2, "Learning PySpark" by Tomasz Drabas and
Denny Lee - so far the best in the resource lineup for Python-based
Spark -
https://www.packtpub.com/big-data-and-business-intelligence/learning-pyspark
(Read Chapters 1, 2, 4, and 6 to get immediate benefits)



-        Great book on Spark, "Spark: The Definitive Guide" by Bill
Chambers and Matei Zaharia.
https://www.amazon.com/Spark-Definitive-Guide-Processing-Simple/dp/1491912219/ref=sr_1_1?ie=UTF8&qid=1540567390&sr=8-1&keywords=spark+the+definitive+guide
(Parts I, II, and VI are the most important to get started.) Apparently, they
have a new edition; I am referring to the 2017 edition.


- A bit dated now because Spark has evolved so much, but I also like
Jeffrey Aven's book and style of writing: "Sams Teach Yourself Apache Spark
in 24 Hours"
<https://www.amazon.com/Apache-Spark-Hours-Teach-Yourself/dp/0672338513/ref=sr_1_1?crid=75O5XD7JSREF&keywords=apache+spark+in+24+hours%2C+sams+teach+yourself&qid=1562333740&s=gateway&sprefix=sams+teach++apache+spark%2Caps%2C156&sr=8-1>

In terms of actually learning, I would suggest practicing the code, and
based on my experience you are better off installing Spark on your local
PC. I found this a much better way of learning than using an enterprise
cluster. Depending on which route you take, if you decide to focus on
PySpark, learning Scikit-learn will give you a lot of transferable
skills.

One final note: I am offering these suggestions from the perspective of a
data scientist.

Kind regards,

Alex Reda







On Fri, Jul 5, 2019 at 9:24 AM Gourav Sengupta <gourav.sengupta@gmail.com>
wrote:

> okay this is all something which I would disagree with.
>
> Dr. Matei Zaharia created Spark.
> Then he and Bill Chambers recently wrote a book on Spark.
> He is still the main thinking power behind Spark (look at his research
> at Stanford).
> The name of the book is "Spark: The Definitive Guide"; it's the best ever
> book and introduction to Spark.
>
> I have been through a lot of documentation and at least 40 books on Spark,
> and nothing even comes close to this book. It also puts to rest much of the
> argument around which language to choose.
>
> Thanks and Regards,
> Gourav Sengupta
>
> On Fri, Jul 5, 2019 at 11:55 AM Vikas Garg <sperry.it@gmail.com> wrote:
>
>> Thanks!!!
>>
>> On Fri, 5 Jul 2019 at 15:38, Chris Teoh <chris.teoh@gmail.com> wrote:
>>
>>> Scala is better suited to data engineering work. It also has better
>>> integration with other components like HBase, Kafka, etc.
>>>
>>> Python is great for data scientists as there are more data science
>>> libraries available in Python.
>>>
>>> On Fri., 5 Jul. 2019, 7:40 pm Vikas Garg, <sperry.it@gmail.com> wrote:
>>>
>>>> Is there any disadvantage to using Python? I have gone through multiple
>>>> articles which say that Python has advantages over Scala.
>>>>
>>>> Scala is super fast in comparison but Python has more pre-built
>>>> libraries and options for analytics.
>>>>
>>>> Still should I go with Scala?
>>>>
>>>> On Fri, 5 Jul 2019 at 13:07, Kurt Fehlhauer <kfehlhau@gmail.com> wrote:
>>>>
>>>>> Since you are a data engineer I would start by learning Scala. The
>>>>> parts of Scala you would need to learn are pretty basic. Start with the
>>>>> examples on the Spark website, which gives examples in multiple languages.
>>>>> Think of Scala as a typed version of Python. You will find that the error
>>>>> messages tend to be much more meaningful in Scala because that is the
>>>>> native language of Spark. If you don't want to install the JVM and
>>>>> Scala, I highly recommend Databricks Community Edition as a place to
>>>>> start.
>>>>>
>>>>> On Thu, Jul 4, 2019 at 11:22 PM Vikas Garg <sperry.it@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am currently working as a data engineer and I am working on Power
>>>>>> BI and SSIS (an ETL tool). For learning purposes, I have set up PySpark
>>>>>> and am also able to run queries through Spark on a multi-node cluster
>>>>>> (I am using Vertica DB and will later move to HDFS or SQL Server).
>>>>>>
>>>>>> I have good knowledge of Python also.
>>>>>>
>>>>>> On Fri, 5 Jul 2019 at 10:32, Kurt Fehlhauer <kfehlhau@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Are you a data scientist or data engineer?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 4, 2019 at 10:34 PM Vikas Garg <sperry.it@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am a new Spark learner. Can someone guide me on a strategy
>>>>>>>> for getting expertise in PySpark?
>>>>>>>>
>>>>>>>> Thanks!!!
>>>>>>>>
>>>>>>>
