spark-user mailing list archives

From Shannon Quinn <squ...@gatech.edu>
Subject Re: numpy + pyspark
Date Fri, 27 Jun 2014 15:08:39 GMT
Would deploying a virtualenv in a directory on each node of the cluster 
be viable? The dependencies would get tricky, but I think this is the 
sort of situation it's built for.
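A minimal sketch of that virtualenv route, assuming pip can reach a package index from the nodes and that `~/spark-venv` (a hypothetical path, not from the thread) can be created at the same location on every node:

```shell
# Create a user-owned virtualenv -- no root privileges required.
# The path ~/spark-venv is hypothetical; it must exist on every node.
virtualenv ~/spark-venv
~/spark-venv/bin/pip install numpy

# Point PySpark at the virtualenv's interpreter so the workers
# import numpy from it.
export PYSPARK_PYTHON=~/spark-venv/bin/python
spark-submit my_job.py   # my_job.py is a placeholder for your script
```

The catch, as noted below, is keeping the virtualenv identical (same path, same packages) across all nodes.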

On 6/27/14, 11:06 AM, Avishek Saha wrote:
> I too felt the same Nick but I don't have root privileges on the 
> cluster, unfortunately. Are there any alternatives?
>
>
> On 27 June 2014 08:04, Nick Pentreath <nick.pentreath@gmail.com 
> <mailto:nick.pentreath@gmail.com>> wrote:
>
>     I've not tried this -- but numpy is a tricky and complex package
>     with many dependencies on Fortran/C libraries etc. I'd say by the
>     time you figure out how to deploy numpy correctly in this manner,
>     you may as well have just built it into your cluster bootstrap
>     process, or PSSH-installed it on each node...
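The PSSH approach mentioned above might look like the following sketch, which also avoids root by using a `--user` install (the hosts file name is hypothetical):

```shell
# hosts.txt lists one worker hostname per line (hypothetical file).
# pip install --user puts numpy under ~/.local, so no root is needed.
pssh -h hosts.txt -t 0 'pip install --user numpy'
```

A `--user` install only helps if the workers run as the same user, so `~/.local/lib` is on their Python path.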
>
>
>     On Fri, Jun 27, 2014 at 4:58 PM, Avishek Saha
>     <avishek.saha@gmail.com <mailto:avishek.saha@gmail.com>> wrote:
>
>         To clarify, I tried it and it almost worked -- but I am
>         getting some errors from numpy's random module
>         (numpy.random). If anyone has successfully passed numpy (via
>         the --py-files option) to spark-submit, please let me know.
>
>         Thanks !!
>         Avishek
>
>
>         On 26 June 2014 17:45, Avishek Saha <avishek.saha@gmail.com
>         <mailto:avishek.saha@gmail.com>> wrote:
>
>             Hi all,
>
>             Instead of installing numpy on each worker node, is it
>             possible to ship numpy (maybe via the --py-files option)
>             while invoking spark-submit?
>
>             Thanks,
>             Avishek
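For reference, `--py-files` is normally used as sketched below for pure-Python dependencies (the package and script names are hypothetical); numpy's compiled C/Fortran extensions are exactly what makes it a poor fit for this route, which matches the partial failure reported above:

```shell
# Zip a pure-Python package and ship it to the workers alongside the job.
# mypkg/ and my_job.py are placeholders, not names from this thread.
zip -r deps.zip mypkg/
spark-submit --py-files deps.zip my_job.py
```

The zip is added to each worker's Python path, which works for `.py` files but not reliably for platform-specific `.so` extension modules like numpy's.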
>

