spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From rajat kumar <kumar.rajat20...@gmail.com>
Subject Re: Running pyspark job from virtual environment
Date Sun, 17 Jan 2021 17:22:23 GMT
Hi Mich,

Thanks for response. I am running it through CLI (on the cluster).

Since this will be scheduled job. I do not want to activate the environment
manually. It should automatically take the path of virtual environment to
run the job.

For that I saw 3 properties which I mentioned. I think setting  some of
them to point to environment binary will help to run the job from venv.

PYTHONPATH
PYSPARK_DRIVER_PYTHON
PYSPARK_PYTHON

Also, It has to be set in env.sh or bashrc file? What is the difference
between spark-env.sh and bashrc

Thanks
Rajat



On Sun, Jan 17, 2021 at 10:32 PM Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Hi Rajat,
>
> Are you running this through an IDE like PyCharm or on CLI?
>
> If you already have a Python Virtual environment, then just activate it
>
> The only env variable you need to set is export PYTHONPATH that you can do
> it in your startup shell script .bashrc etc.
>
> Once you are in virtual environment, then you run:
>
> $SPARK_HOME/bin/spark-submit <Python.py)
>
> Alternatively you can chmod +x <python file), and add the following line
> to the file
>
> #! /usr/bin/env python3
>
> and then you can run it as.
>
> ./<python.py>
>
> HTH
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sun, 17 Jan 2021 at 13:41, rajat kumar <kumar.rajat20del@gmail.com>
> wrote:
>
>> Hello,
>>
>> Can anyone confirm here please?
>>
>> Regards
>> Rajat
>>
>> On Sat, Jan 16, 2021 at 11:46 PM rajat kumar <kumar.rajat20del@gmail.com>
>> wrote:
>>
>>> Hey Users,
>>>
>>> I want to run spark job from virtual environment using Python.
>>>
>>> Please note I am creating virtual env (using python3 -m venv env)
>>>
>>> I see that there are 3 variables for PYTHON which we have to set:
>>> PYTHONPATH
>>> PYSPARK_DRIVER_PYTHON
>>> PYSPARK_PYTHON
>>>
>>> I have 2 doubts:
>>> 1. If i want to use Virtual env, do I need to point python path of
>>> virtual environment to all these variables?
>>> 2. Should I set these variables in spark-env.sh or should I set them
>>> using export statements.
>>>
>>> Regards
>>> Rajat
>>>
>>>
>>>

Mime
View raw message