spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: PyCharm, Running spark-submit calling jars and a package at run time
Date Fri, 08 Jan 2021 17:15:38 GMT
Hi Riccardo

These are the environment variables at runtime:

PYTHONUNBUFFERED=1;PYTHONPATH=C:\Users\admin\PycharmProjects\packages\;C:\Users\admin\PycharmProjects\pythonProject2\DS\;C:\Users\admin\PycharmProjects\pythonProject2\DS\conf\;C:\Users\admin\PycharmProjects\pythonProject2\DS\lib\;C:\Users\admin\PycharmProjects\pythonProject2\DS\src

This is the configuration set up for analyze_house_prices_GCP

[image: image.png]




So, as on Linux, I created a Windows environment variable, and I can see it
in the PyCharm terminal:



(venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>echo %PYTHONPATH%

PYTHONPATH=C:\Users\admin\PycharmProjects\packages\;C:\Users\admin\PycharmProjects\pythonProject2\DS\;C:\Users\admin\PycharmProjects\pythonProject2\DS\conf\;C:\Users\admin\PycharmProjects\pythonProject2\DS\lib\;C:\Users\admin\PycharmProjects\pythonProject2\DS\src
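The shell value and what the Python interpreter actually searches can differ, so it may help to compare the two from inside Python. The following is a minimal sketch, not part of the original code; the helper names `pythonpath_entries` and `entries_missing_from_sys_path` are hypothetical.

```python
import os
import sys


def pythonpath_entries(environ=os.environ):
    """Split PYTHONPATH on the platform separator (';' on Windows), dropping empty items."""
    raw = environ.get("PYTHONPATH", "")
    return [entry.strip() for entry in raw.split(os.pathsep) if entry.strip()]


def entries_missing_from_sys_path(environ=os.environ, search_path=None):
    """Return PYTHONPATH entries that do not appear in the interpreter's search path."""
    search_path = sys.path if search_path is None else search_path
    # Normalise case and separators so C:\x\ and c:/x compare equal on Windows.
    normalized = {os.path.normcase(os.path.normpath(p)) for p in search_path if p}
    return [
        entry
        for entry in pythonpath_entries(environ)
        if os.path.normcase(os.path.normpath(entry)) not in normalized
    ]


if __name__ == "__main__":
    for entry in pythonpath_entries():
        print("PYTHONPATH entry:", entry)
    for entry in entries_missing_from_sys_path():
        print("not on sys.path:", entry)
```

Running the same file once with plain `python` and once through spark-submit would show whether the two launch paths see the same variable.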

It picks up sparkstuff.py


(venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>where sparkstuff.py

C:\Users\admin\PycharmProjects\packages\sparkutils\sparkstuff.py
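Note that the file sits in the `sparkutils` subdirectory of `packages`, while PYTHONPATH lists only `packages` itself. Python does not search subdirectories of a path entry for top-level modules, which may be relevant here. The sketch below illustrates that behaviour with a throwaway directory; the names `demo_sparkutils` and `demo_sparkstuff` are stand-ins chosen to avoid clashing with real modules, and this is only a guess at the cause, not a confirmed diagnosis.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_import_resolution():
    """Show that a module in <path entry>/<subdir>/ needs the qualified import."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        packages_dir = Path(tmp) / "packages"
        module_dir = packages_dir / "demo_sparkutils"
        module_dir.mkdir(parents=True)
        (module_dir / "demo_sparkstuff.py").write_text("ANSWER = 42\n")

        # Equivalent to listing only ...\packages\ on PYTHONPATH.
        sys.path.insert(0, str(packages_dir))
        importlib.invalidate_caches()
        try:
            try:
                importlib.import_module("demo_sparkstuff")
                results["top_level"] = True
            except ModuleNotFoundError:
                # The subdirectory is not searched for top-level modules.
                results["top_level"] = False

            # The package-qualified name resolves (namespace package, Python 3.3+).
            module = importlib.import_module("demo_sparkutils.demo_sparkstuff")
            results["qualified"] = module.ANSWER == 42
        finally:
            sys.path.remove(str(packages_dir))
            sys.modules.pop("demo_sparkutils.demo_sparkstuff", None)
            sys.modules.pop("demo_sparkutils", None)
    return results


if __name__ == "__main__":
    print(demo_import_resolution())
```

Under this assumption, either `from sparkutils import sparkstuff as s` or adding the `sparkutils` directory itself to PYTHONPATH would be the things to try.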

But when the code is run through spark-submit, the module is not found:

(venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit --jars ..\spark-bigquery-with-dependencies_2.12-0.18.0.jar analyze_house_prices_GCP.py
Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py", line 8, in <module>
    import sparkstuff as s
ModuleNotFoundError: No module named 'sparkutils'
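One thing that could be tried is spark-submit's `--py-files` option, which takes a comma-separated list of .py, .zip or .egg files to place on the Python path of the application. The sketch below only assembles such a command line as a list; `build_spark_submit` is a hypothetical helper, and whether shipping the file this way resolves the import in this setup is untested.

```python
def build_spark_submit(script, jars=(), py_files=()):
    """Assemble a spark-submit argument list; options must come before the script."""
    cmd = ["spark-submit"]
    if jars:
        cmd += ["--jars", ",".join(jars)]
    if py_files:
        # --py-files ships extra Python modules alongside the main script.
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    command = build_spark_submit(
        "analyze_house_prices_GCP.py",
        jars=[r"..\spark-bigquery-with-dependencies_2.12-0.18.0.jar"],
        py_files=[r"C:\Users\admin\PycharmProjects\packages\sparkutils\sparkstuff.py"],
    )
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` or pasted into the PyCharm terminal as a single line.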

thanks


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 8 Jan 2021 at 16:38, Riccardo Ferrari <ferrarir@gmail.com> wrote:

> I think Spark checks the PYTHONPATH environment variable, so you need to
> provide that. Of course, that works in local mode only.
>
> On Fri, Jan 8, 2021, 5:28 PM Sean Owen <srowen@gmail.com> wrote:
>
>> I don't see anywhere that you provide 'sparkstuff'. How would the Spark
>> app have this code otherwise?
>>
>> On Fri, Jan 8, 2021 at 10:20 AM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Thanks Riccardo.
>>>
>>> I am well aware of the submission form
>>>
>>> However, my question relates to doing submission within PyCharm itself.
>>>
>>> This is what I run in the PyCharm *terminal* to invoke the Python module:
>>>
>>> spark-submit --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \
>>>  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 analyze_house_prices_GCP.py
>>>
>>> However, when run from the terminal, it does not pick up the modules
>>> imported in the code!
>>>
>>> Traceback (most recent call last):
>>>   File "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py", line 8, in <module>
>>>     import sparkstuff as s
>>> ModuleNotFoundError: No module named 'sparkstuff'
>>>
>>> The Python code is attached; it is pretty simple.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
