spark-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: PyCharm, Running spark-submit calling jars and a package at run time
Date Fri, 08 Jan 2021 17:18:13 GMT
This isn't going to help when submitting to a remote cluster, though. You need to
explicitly include dependencies in your submit command.
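
One way to include a Python dependency explicitly is spark-submit's `--py-files` option, which accepts a comma-separated list of .py, .zip, or .egg files. The following is only a sketch under assumptions: the helper names `zip_package` and `build_submit_command` are hypothetical, and the paths are copied from the quoted messages rather than verified. Shipping `sparkstuff.py` directly should let `import sparkstuff` resolve; if the whole `sparkutils` directory is zipped instead, the import would presumably need to be `from sparkutils import sparkstuff`.

```python
import zipfile
from pathlib import Path


def zip_package(package_dir, out_zip):
    """Zip a package directory so the package name is the top-level folder in the archive."""
    package_dir = Path(package_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package_dir.rglob("*.py")):
            # Store paths relative to the parent, e.g. "sparkutils/sparkstuff.py".
            zf.write(path, path.relative_to(package_dir.parent).as_posix())
    return out_zip


def build_submit_command(app, py_files=(), jars=()):
    """Build a spark-submit argument list that names its dependencies explicitly."""
    cmd = ["spark-submit"]
    if jars:
        cmd += ["--jars", ",".join(jars)]
    if py_files:
        # --py-files takes a comma-separated list of .py, .zip, or .egg files.
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(app)
    return cmd


# Paths below are taken from the quoted thread and are assumptions here.
cmd = build_submit_command(
    "analyze_house_prices_GCP.py",
    py_files=[r"C:\Users\admin\PycharmProjects\packages\sparkutils\sparkstuff.py"],
    jars=[r"..\spark-bigquery-with-dependencies_2.12-0.18.0.jar"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` or pasted into the PyCharm terminal as a single line.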

On Fri, Jan 8, 2021 at 11:15 AM Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Hi Riccardo
>
> These are the env variables at runtime
>
> PYTHONUNBUFFERED=1;*PYTHONPATH=*
> C:\Users\admin\PycharmProjects\packages\;C:\Users\admin\PycharmProjects\pythonProject2\DS\;C:\Users\admin\PycharmProjects\pythonProject2\DS\conf\;C:\Users\admin\PycharmProjects\pythonProject2\DS\lib\;C:\Users\admin\PycharmProjects\pythonProject2\DS\src
>
> This is the configuration set up for analyze_house_prices_GCP
>
> [image: image.png]
>
>
>
>
> So, as on Linux, I created a Windows env variable, and I can see it in the
> PyCharm terminal
>
>
>
> (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>echo %PYTHONPATH%
>
>
> PYTHONPATH=C:\Users\admin\PycharmProjects\packages\;C:\Users\admin\PycharmProjects\pythonProject2\DS\;C:\Users\admin\PycharmProjects\pythonProject2\DS\conf\;C:\Users\admin\PycharmProjects\pythonProject2\DS\lib\;C:\Users\admin\PycharmProjects\pythonProject2\DS\src
>
> It picks up sparkstuff.py
>
>
> (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>where sparkstuff.py
>
> C:\Users\admin\PycharmProjects\packages\sparkutils\sparkstuff.py
>
> But when the code runs under spark-submit, it is not picked up
>
> (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit --jars ..\spark-bigquery-with-dependencies_2.12-0.18.0.jar analyze_house_prices_GCP.py
> Traceback (most recent call last):
>   File "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py", line 8, in <module>
>     import sparkstuff as s
> ModuleNotFoundError: No module named 'sparkutils'
>
> thanks
>
> On Fri, 8 Jan 2021 at 16:38, Riccardo Ferrari <ferrarir@gmail.com> wrote:
>
>> I think Spark checks the PYTHONPATH env variable, so you need to provide that.
>> Of course, that works in local mode only.
>>
>> On Fri, Jan 8, 2021, 5:28 PM Sean Owen <srowen@gmail.com> wrote:
>>
>>> I don't see anywhere that you provide 'sparkstuff'. How would the Spark
>>> app have this code otherwise?
>>>
>>> On Fri, Jan 8, 2021 at 10:20 AM Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Thanks Riccardo.
>>>>
>>>> I am well aware of the submission form
>>>>
>>>> However, my question relates to doing submission within PyCharm itself.
>>>>
>>>> This is what I do at the PyCharm *terminal* to invoke the Python module
>>>>
>>>> spark-submit --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \
>>>>  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 analyze_house_prices_GCP.py
>>>>
>>>> However, when run at the terminal, it does not pick up the import
>>>> dependencies in the code!
>>>>
>>>> Traceback (most recent call last):
>>>>   File "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py", line 8, in <module>
>>>>     import sparkstuff as s
>>>> ModuleNotFoundError: No module named 'sparkstuff'
>>>>
>>>> The Python code is attached; it is pretty simple.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>

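In the quoted output, PYTHONPATH lists `...\packages\` while `sparkstuff.py` sits one level down in `...\packages\sparkutils\`. One possible reading, offered as an assumption rather than a diagnosis, is that the bare name `sparkstuff` is not importable from that path entry while `sparkutils.sparkstuff` is. A small sketch for checking which module names resolve from a given set of path entries; the helper name `locate_module` is hypothetical:

```python
import importlib
import importlib.util
import sys


def locate_module(name, extra_paths=()):
    """Return the file a module name would be imported from, or None if it is not found."""
    original = list(sys.path)
    sys.path[:0] = list(extra_paths)  # search the extra entries first
    try:
        importlib.invalidate_caches()
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name cannot be imported.
            spec = None
        return None if spec is None else spec.origin
    finally:
        sys.path[:] = original  # leave sys.path as it was


# Example call; the directory is the one shown in the thread and is an assumption here.
packages_dir = r"C:\Users\admin\PycharmProjects\packages"
for candidate in ("sparkstuff", "sparkutils.sparkstuff"):
    print(candidate, "->", locate_module(candidate, [packages_dir]))
```

Running this inside the script launched by spark-submit, without `extra_paths`, would show what that process itself can resolve.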