spark-user mailing list archives

From Jakob Odersky <ja...@odersky.com>
Subject Re: installing packages with pyspark
Date Fri, 18 Mar 2016 02:42:55 GMT
> But I guess I cannot add a package once I launch the pyspark context, right?

Correct. In principle, if you really wanted to, you could try to load
packages dynamically with some class-loader black magic, but it would be
painful and Spark does not provide that functionality.
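Since packages have to be resolved before the JVM starts, the supported route from a Jupyter notebook is to declare them before the first SparkContext is created, e.g. via PYSPARK_SUBMIT_ARGS. A minimal sketch (the graphframes coordinates below are illustrative and version-dependent):

```python
import os

# Packages must be declared before the SparkContext (and its JVM) starts;
# pyspark reads PYSPARK_SUBMIT_ARGS at initialization, so set it first.
packages = "graphframes:graphframes:0.1.0-spark1.6"  # illustrative coordinates
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages {} pyspark-shell".format(packages)
)

# Only now import and start pyspark; a context that is already running
# cannot pick up new packages.
# from pyspark import SparkContext
# sc = SparkContext()
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Restarting the notebook kernel (and thus the context) is the only way to change the package list afterwards.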

On Thu, Mar 17, 2016 at 7:20 PM, Ajinkya Kale <kaleajinkya@gmail.com> wrote:
> Thanks Jakob, Felix. I am aware you can do it with --packages, but I was
> wondering if there is a way to do something like "!pip install <package>",
> as I do for other Python packages from a Jupyter notebook. But I guess I
> cannot add a package once I launch the pyspark context, right?
>
> On Thu, Mar 17, 2016 at 6:59 PM Felix Cheung <felixcheung_m@hotmail.com>
> wrote:
>>
>> For some, like graphframes that are Spark packages, you could also use
>> --packages in the command line of spark-submit or pyspark. See
>> http://spark.apache.org/docs/latest/submitting-applications.html
>>
>> _____________________________
>> From: Jakob Odersky <jakob@odersky.com>
>> Sent: Thursday, March 17, 2016 6:40 PM
>> Subject: Re: installing packages with pyspark
>> To: Ajinkya Kale <kaleajinkya@gmail.com>
>> Cc: <user@spark.apache.org>
>>
>>
>> Hi,
>> regarding 1, packages are resolved locally. That means that when you
>> specify a package, spark-submit will resolve the dependencies and
>> download any jars on the local machine before shipping* them to the
>> cluster. So, although I have no first-hand knowledge of Dataproc
>> clusters, specifying packages there should be no different.
>>
>> Unfortunately I can't help with 2.
>>
>> --Jakob
>>
>> *shipping in this case means making them available via the network
>>
>> On Thu, Mar 17, 2016 at 5:36 PM, Ajinkya Kale <kaleajinkya@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > I had a couple of questions.
>> > 1. Is there documentation on how to add graphframes, or any other
>> > package for that matter, on the Google Dataproc managed Spark clusters?
>> >
>> > 2. Is there a way to add a package to an existing pyspark context
>> > through a Jupyter notebook?
>> >
>> > --aj
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
>>
>


