Often, when I'm setting up a cluster, I have to use an operating system (such as Red Hat or CentOS 6.5) that ships with an old version of Python (Python 2.6), for example when using a Hadoop distribution that only supports those operating systems (such as Hortonworks' HDP or Cloudera).

That also makes it difficult to install additional advanced Python packages (such as NumPy, IPython, etc.).

So I tend to use Anaconda Python, a Python distribution with many of those packages pre-built and pre-installed.

But installing Anaconda on each node of the cluster can be tedious.

So I made a simple script that makes it easier to install Anaconda Python on the machines of a cluster.
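To give a rough idea of what automating this can look like, here is a minimal sketch. It is not the script from the repository, just an illustration of the general approach: build one ssh command per node that downloads the Anaconda installer and runs it in batch mode. The installer URL, the install prefix, and the host names are placeholders I made up; the `-b` (batch) and `-p` (prefix) options are the installer's usual non-interactive flags. The sketch assumes Python 3 on the machine you run it from, and it only builds and prints the commands rather than executing them.

```python
import shlex

# Hypothetical values for illustration only; adjust to your environment.
INSTALLER_URL = "https://example.com/Anaconda-installer.sh"  # placeholder URL
INSTALL_PREFIX = "/opt/anaconda"  # assumed install location on each node


def build_install_command(host, installer_url=INSTALLER_URL, prefix=INSTALL_PREFIX):
    """Return the ssh argument list that would install Anaconda on one host."""
    remote = (
        "curl -fsSL -o /tmp/anaconda.sh {url} && "
        "bash /tmp/anaconda.sh -b -p {prefix}"
    ).format(url=shlex.quote(installer_url), prefix=shlex.quote(prefix))
    return ["ssh", host, remote]


def build_all(hosts):
    """Return one install command per host, in the order given."""
    return [build_install_command(host) for host in hosts]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in build_all(["node1", "node2"]):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Once the printed commands look right, each list could be passed to `subprocess.check_call` to actually run the install on that node.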

I wanted to share it here in case it helps someone who wants to use PySpark.

https://github.com/tiangolo/anaconda_cluster_install
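For the PySpark side, once Anaconda is on every node, Spark still has to be told which interpreter to use. A minimal sketch, assuming Anaconda ended up in `/opt/anaconda` (a path I picked for illustration, not necessarily where the script puts it): set the `PYSPARK_PYTHON` environment variable to the Anaconda interpreter before launching PySpark.

```python
import os

# Assumed location of the Anaconda interpreter; adjust to your install prefix.
ANACONDA_PYTHON = "/opt/anaconda/bin/python"


def pyspark_env(python_path=ANACONDA_PYTHON, base_env=None):
    """Return a copy of the environment with PYSPARK_PYTHON set to python_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = python_path
    return env


if __name__ == "__main__":
    print(pyspark_env(base_env={})["PYSPARK_PYTHON"])
```

The resulting dictionary can be passed as `env=` to `subprocess.call(["pyspark"], env=pyspark_env())`, or the same variable can simply be exported in the shell before starting PySpark.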

Sebastián Ramírez
Head of Software Development

Linkedin: co.linkedin.com/in/tiangolo/
 Twitter: @tiangolo
