spark-user mailing list archives

From Sebastián Ramírez
Subject PySpark: Python 2.7 cluster installation script (with Numpy, IPython, etc)
Date Wed, 11 Mar 2015 21:42:06 GMT
Often, when I'm setting up a cluster, I have to use an operating
system (such as Red Hat or CentOS 6.5) that ships an old version of Python
(Python 2.6), for example when the Hadoop distribution in use (such as
Hortonworks' HDP or Cloudera) only supports those operating systems.

That also makes it difficult to install additional advanced Python
packages (such as NumPy, IPython, etc.).

So I tend to use Anaconda Python, an open source distribution of Python
with many of those packages pre-built and pre-installed.

But installing Anaconda on each node of the cluster can be tedious.

So I made a *simple script that helps install Anaconda Python on the
machines of a cluster* more easily.
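
To give an idea of what such a script has to automate, here is a minimal sketch. It is not the script mentioned above, only an illustration written under assumptions: the host names, the installer path and the install prefix are hypothetical, the installer is assumed to be already copied to each node, and the helper `build_install_commands` is a name made up for this example. It only builds the per-node commands, using the Anaconda installer's batch mode (`-b`) and prefix (`-p`) options, and prints them instead of running anything.

```python
import shlex


def build_install_commands(hosts, installer_path, prefix):
    """Return one ssh command (as an argument list) per host.

    Each command runs the Anaconda installer on the remote node in
    batch mode (-b, no prompts) and installs it under `prefix` (-p).
    """
    remote = "bash {} -b -p {}".format(
        shlex.quote(installer_path), shlex.quote(prefix)
    )
    return [["ssh", host, remote] for host in hosts]


if __name__ == "__main__":
    # Hypothetical node names and paths, for illustration only.
    nodes = ["node1", "node2"]
    for cmd in build_install_commands(
        nodes, "/tmp/Anaconda-installer.sh", "/opt/anaconda"
    ):
        # Dry run: print the command rather than executing it.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each printed line could then be reviewed and run by hand, or passed to `subprocess.run`, once the host list and paths match the real cluster.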

I wanted to share it here, in case it can help someone wanting using

*Sebastián Ramírez*
Head of Software Development

 Tel: (+571) 795 7950 ext: 1012
 Cel: (+57) 300 370 77 10
 Calle 73 No 7 - 06  Piso 4
 Twitter: @tiangolo
