spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: PyCharm, Running spark-submit calling jars and a package at run time
Date Fri, 08 Jan 2021 17:32:50 GMT
Just to clarify, are you referring to module dependencies in PySpark?


With Scala I can build an uber JAR with Maven or sbt, inclusive of all the
bits and pieces, that works on a cluster and can be passed to spark-submit
as a single file.


What alternatives would you suggest for PySpark, a zip file?
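A zip file is indeed the usual PySpark analogue of the uber jar: zip up your Python packages and ship the archive with spark-submit --py-files (or SparkContext.addPyFile). Below is a minimal sketch of building such a zip with the standard library; the function name and paths are hypothetical, chosen to mirror the layout discussed later in this thread:

```python
import os
import zipfile


def build_py_deps(pkg_dir: str, out_zip: str) -> str:
    """Zip a Python package directory so that spark-submit --py-files
    (or SparkContext.addPyFile) can ship it to the executors."""
    pkg_dir = os.path.abspath(pkg_dir)
    parent = os.path.dirname(pkg_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(pkg_dir):
            for name in files:
                if name.endswith(".py"):
                    path = os.path.join(root, name)
                    # Keep the package prefix in the archive so that
                    # "import sparkutils.sparkstuff" resolves on the executors.
                    zf.write(path, os.path.relpath(path, parent))
    return out_zip


# Hypothetical usage:
#   build_py_deps(r"C:\Users\admin\PycharmProjects\packages\sparkutils", "deps.zip")
#   spark-submit --py-files deps.zip --jars ... analyze_house_prices_GCP.py
```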


cheers,


LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 8 Jan 2021 at 17:18, Sean Owen <srowen@gmail.com> wrote:

> This isn't going to help submitting to a remote cluster, though. You need
> to explicitly include dependencies in your submit.
>
> On Fri, Jan 8, 2021 at 11:15 AM Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> Hi Riccardo
>>
>> These are the env variables at runtime:
>>
>> PYTHONUNBUFFERED=1
>> PYTHONPATH=C:\Users\admin\PycharmProjects\packages\;C:\Users\admin\PycharmProjects\pythonProject2\DS\;C:\Users\admin\PycharmProjects\pythonProject2\DS\conf\;C:\Users\admin\PycharmProjects\pythonProject2\DS\lib\;C:\Users\admin\PycharmProjects\pythonProject2\DS\src
>>
>> This is the configuration set up for analyze_house_prices_GCP
>>
>> [image: image.png]
>>
>>
>>
>>
>> So, as in Linux, I created a Windows env variable, and in the PyCharm
>> terminal I can see it:
>>
>>
>>
>> (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>echo %PYTHONPATH%
>>
>>
>> PYTHONPATH=C:\Users\admin\PycharmProjects\packages\;C:\Users\admin\PycharmProjects\pythonProject2\DS\;C:\Users\admin\PycharmProjects\pythonProject2\DS\conf\;C:\Users\admin\PycharmProjects\pythonProject2\DS\lib\;C:\Users\admin\PycharmProjects\pythonProject2\DS\src
>>
>> It picks up sparkstuff.py:
>>
>>
>> (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>where sparkstuff.py
>>
>> C:\Users\admin\PycharmProjects\packages\sparkutils\sparkstuff.py
>>
>> But when the code is run via spark-submit, it does not:
>>
>> (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit
>> --jars ..\spark-bigquery-with-dependencies_2.12-0.18.0.jar
>> analyze_house_prices_GCP.py
>> Traceback (most recent call last):
>>   File
>> "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py",
>> line 8, in <module>
>>     import sparkstuff as s
>> ModuleNotFoundError: No module named 'sparkutils'
>>
>> thanks
>>
>>
>>
>>
>>
>>
>> On Fri, 8 Jan 2021 at 16:38, Riccardo Ferrari <ferrarir@gmail.com> wrote:
>>
>>> I think Spark checks the PYTHONPATH environment variable; you need to
>>> provide that. Of course, that works in local mode only.
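To add a bit of context to the point about local mode: what --py-files roughly does on each executor is put the shipped zip on sys.path, where Python's zipimport machinery picks it up, which is why a PYTHONPATH set only on the driver machine is not enough for a remote cluster. A small local illustration (the module names are made up to match this thread):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny package inside a zip, mimicking what --py-files ships.
tmp = tempfile.mkdtemp()
dep_zip = os.path.join(tmp, "deps.zip")
with zipfile.ZipFile(dep_zip, "w") as zf:
    zf.writestr("sparkutils/__init__.py", "")
    zf.writestr("sparkutils/sparkstuff.py", "GREETING = 'hello from zip'\n")

# Executors prepend such zips to sys.path; zipimport makes them importable.
sys.path.insert(0, dep_zip)
from sparkutils import sparkstuff

print(sparkstuff.GREETING)  # hello from zip
```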
>>>
>>> On Fri, Jan 8, 2021, 5:28 PM Sean Owen <srowen@gmail.com> wrote:
>>>
>>>> I don't see anywhere that you provide 'sparkstuff'. How would the Spark
>>>> app have this code otherwise?
>>>>
>>>> On Fri, Jan 8, 2021 at 10:20 AM Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Thanks Riccardo.
>>>>>
>>>>> I am well aware of the submission form
>>>>>
>>>>> However, my question relates to doing the submission within PyCharm itself.
>>>>>
>>>>> This is what I do in the PyCharm terminal to invoke the Python module:
>>>>>
>>>>> spark-submit --jars
>>>>> ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \
>>>>>  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>>>> analyze_house_prices_GCP.py
>>>>>
>>>>> However, when run from the terminal it does not pick up the import
>>>>> dependencies in the code!
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File
>>>>> "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py",
>>>>> line 8, in <module>
>>>>>     import sparkstuff as s
>>>>> ModuleNotFoundError: No module named 'sparkstuff'
>>>>>
>>>>> The Python code is attached; it is pretty simple.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
