spark-user mailing list archives

From Daniel Rodriguez <df.rodriguez...@gmail.com>
Subject Re: ImportError: No module named numpy
Date Sat, 04 Jun 2016 18:34:59 GMT
As others have said, numpy must be installed on every node of the cluster,
not just on the driver. The easiest way, in my opinion, is to use Anaconda:
https://www.continuum.io/downloads but that can get tricky to manage across
multiple nodes if you don't have some configuration management skills.

How are you deploying the Spark cluster? If you are using Cloudera, I
recommend using the Anaconda parcel:
http://blog.cloudera.com/blog/2016/02/making-python-on-apache-hadoop-easier-with-anaconda-and-cdh/
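
To see which nodes are actually missing numpy, a small probe along the lines of Eike's PYTHONPATH check quoted below may help. This is only a sketch: probe_module is a hypothetical helper name, and the cluster usage in the comments assumes `sc` is an existing SparkContext, as in the pyspark shell.

```python
import importlib
import socket
import sys


def probe_module(module_name="numpy"):
    """Report the host, the interpreter, and whether module_name can be imported."""
    try:
        module = importlib.import_module(module_name)
        version = getattr(module, "__version__", "unknown")
    except ImportError:
        version = None  # the module is missing for this interpreter
    return (socket.gethostname(), sys.executable, version)


# Driver-side check:
#   print(probe_module())
#
# Executor-side check (assumes `sc` is an existing SparkContext):
#   results = set(sc.range(100).map(lambda _: probe_module()).collect())
#   for host, executable, version in sorted(results, key=str):
#       print(host, executable, version)
```

Any tuple whose last element is None points at a host and interpreter that cannot import numpy. With few tasks, not every executor is guaranteed to be sampled, so a larger range may be needed on a big cluster.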

On 4 Jun 2016, at 11:13, Gourav Sengupta <gourav.sengupta@gmail.com> wrote:

Hi,

I think that solution is too simple. Just download anaconda (if you pay for
the licensed version you will eventually feel like being in heaven when you
move to CI and CD and live in a world where you have a data product
actually running in real life).

Then start the pyspark program by including the following:

PYSPARK_PYTHON=<<path to your anaconda installation>>/anaconda2/bin/python2.7 \
PATH=$PATH:<<path to your anaconda installation>>/anaconda/bin \
<<path to your pyspark>>/pyspark

:)
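
The same environment can also be assembled from Python, for example when pyspark is launched from a script. This is a minimal sketch, not part of the original suggestion: pyspark_env is a hypothetical helper, and the paths in the usage comment are placeholders for your own installation.

```python
import os


def pyspark_env(anaconda_home, base_env=None):
    """Return a copy of the environment with PYSPARK_PYTHON and PATH set for Anaconda."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(anaconda_home, "bin")
    # Interpreter name follows the command above (python2.7); adjust if yours differs.
    env["PYSPARK_PYTHON"] = os.path.join(bin_dir, "python2.7")
    # Append the Anaconda bin directory to PATH, as in PATH=$PATH:<anaconda>/bin.
    env["PATH"] = os.pathsep.join(p for p in (env.get("PATH", ""), bin_dir) if p)
    return env


# Usage (placeholder paths, assumed for illustration only):
#   import subprocess
#   subprocess.call(["/path/to/spark/bin/pyspark"], env=pyspark_env("/path/to/anaconda2"))
```

The interpreter path still has to exist on every worker node, so this only helps once Anaconda is installed in the same location across the cluster.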

In case you are using it in EMR the solution is a bit tricky. Just let me
know in case you want any further help.


Regards,
Gourav Sengupta





On Thu, Jun 2, 2016 at 7:59 PM, Eike von Seggern <eike.seggern@sevenval.com>
wrote:

> Hi,
>
> are you using Spark on one machine or many?
>
> If on many, are you sure numpy is correctly installed on all machines?
>
> To check that the environment is set-up correctly, you can try something
> like
>
> import os
> pythonpaths = sc.range(10).map(lambda i: os.environ.get("PYTHONPATH")).collect()
> print(pythonpaths)
>
> HTH
>
> Eike
>
> 2016-06-02 15:32 GMT+02:00 Bhupendra Mishra <bhupendra.mishra@gmail.com>:
>
>> That did not resolve it. :(
>>
>> On Thu, Jun 2, 2016 at 3:01 PM, Sergio Fernández <wikier@apache.org>
>> wrote:
>>
>>>
>>> On Thu, Jun 2, 2016 at 9:59 AM, Bhupendra Mishra <
>>> bhupendra.mishra@gmail.com> wrote:
>>>>
>>>> I have already exported the environment variable in spark-env.sh as
>>>> follows, but the error is still there: ImportError: No module named numpy
>>>>
>>>> export PYSPARK_PYTHON=/usr/bin/python
>>>>
>>>
>>> According to the documentation at
>>> http://spark.apache.org/docs/latest/configuration.html#environment-variables
>>> the PYSPARK_PYTHON environment variable is for pointing to the Python
>>> interpreter binary.
>>>
>>> If you check the programming guide
>>> https://spark.apache.org/docs/0.9.0/python-programming-guide.html#installing-and-configuring-pyspark
>>> it says you need to add your custom path to PYTHONPATH (the script
>>> automatically adds the bin/pyspark there).
>>>
>>> So typically in Linux you would need to add the following (assuming you
>>> installed numpy there):
>>>
>>> export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/dist-packages
>>>
>>> Hope that helps.
>>>
>>>
>>>
>>>
>>>> On Thu, Jun 2, 2016 at 12:04 AM, Julio Antonio Soto de Vicente <
>>>> julio@esbet.es> wrote:
>>>>
>>>>> Try adding to spark-env.sh (renaming if you still have it with
>>>>> .template at the end):
>>>>>
>>>>> PYSPARK_PYTHON=/path/to/your/bin/python
>>>>>
>>>>> Where your bin/python is your actual Python environment with Numpy
>>>>> installed.
>>>>>
>>>>>
>>>>> El 1 jun 2016, a las 20:16, Bhupendra Mishra <
>>>>> bhupendra.mishra@gmail.com> escribió:
>>>>>
>>>>> I have numpy installed, but where should I set up PYTHONPATH?
>>>>>
>>>>>
>>>>> On Wed, Jun 1, 2016 at 11:39 PM, Sergio Fernández <wikier@apache.org>
>>>>> wrote:
>>>>>
>>>>>> sudo pip install numpy
>>>>>>
>>>>>> On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra <
>>>>>> bhupendra.mishra@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks .
>>>>>>> How can this be resolved?
>>>>>>>
>>>>>>> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau <holden@pigscanfly.ca>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Generally this means numpy isn't installed on the system or your
>>>>>>>> PYTHONPATH has somehow gotten pointed somewhere odd,
>>>>>>>>
>>>>>>>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
>>>>>>>> bhupendra.mishra@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Could anyone please help me with the following error?
>>>>>>>>>
>>>>>>>>>  File
>>>>>>>>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>>>>>>>>> line 25, in <module>
>>>>>>>>>
>>>>>>>>> ImportError: No module named numpy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks in advance!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Cell : 425-233-8271
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sergio Fernández
>>>>>> Partner Technology Manager
>>>>>> Redlink GmbH
>>>>>> m: +43 6602747925
>>>>>> e: sergio.fernandez@redlink.co
>>>>>> w: http://redlink.co
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Sergio Fernández
>>> Partner Technology Manager
>>> Redlink GmbH
>>> m: +43 6602747925
>>> e: sergio.fernandez@redlink.co
>>> w: http://redlink.co
>>>
>>
>>
>
>
> --
> ------------------------------------------------
> *Jan Eike von Seggern*
> Data Scientist
> ------------------------------------------------
> *Sevenval Technologies GmbH *
>
> FRONT-END-EXPERTS SINCE 1999
>
> Köpenicker Straße 154 | 10997 Berlin
>
> office   +49 30 707 190 - 229
> mail     eike.seggern@sevenval.com
>
> www.sevenval.com
>
> Sitz: Köln, HRB 79823
> Geschäftsführung: Jan Webering (CEO), Thorsten May, Sascha Langfus,
> Joern-Carlos Kuntze
>
> *Wir erhöhen den Return On Investment bei Ihren Mobile und Web-Projekten.
> Sprechen Sie uns an:*http://roi.sevenval.com/
>
>
