spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: PyCharm, Running spark-submit calling jars and a package at run time
Date Fri, 08 Jan 2021 16:13:09 GMT
Thanks Riccardo.

I am well aware of the submission form.

However, my question relates to doing the submission within PyCharm itself.

This is what I do at the PyCharm *terminal* to invoke the Python module:

spark-submit --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \
 --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 analyze_house_prices_GCP.py

However, when run from the terminal it does not pick up the imports in the code!

Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py", line 8, in <module>
    import sparkstuff as s
ModuleNotFoundError: No module named 'sparkstuff'
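
The traceback above means that no directory on the interpreter's sys.path contains sparkstuff. PyCharm run configurations can add the project's content and source roots to PYTHONPATH, which is probably why the import resolves inside the IDE but not at the terminal. A minimal sketch of a check that could sit at the top of the script follows. The helper name ensure_importable and the candidate directories are hypothetical, since the location of sparkstuff is not stated in this thread.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def ensure_importable(module_name, candidate_dirs):
    """Return True if module_name can be imported after this call.

    If the module is not already resolvable, each candidate directory is
    checked for "<module_name>.py" or a "<module_name>/__init__.py" package,
    and the first match is prepended to sys.path.
    """
    if importlib.util.find_spec(module_name) is not None:
        return True  # already on sys.path, nothing to do
    for directory in candidate_dirs:
        directory = Path(directory).resolve()
        has_module = (directory / f"{module_name}.py").is_file()
        has_package = (directory / module_name / "__init__.py").is_file()
        if has_module or has_package:
            sys.path.insert(0, str(directory))
            importlib.invalidate_caches()  # make the new entry visible
            return True
    return False


if __name__ == "__main__":
    # "sparkstuff" is the module from the traceback; the candidate
    # directories are hypothetical and must be adjusted to the project.
    found = ensure_importable("sparkstuff", [".", ".."])
    print("sparkstuff importable:", found)
```

This only changes the import path of the process that runs the script (the driver), so it is a local workaround and not a substitute for shipping the module to executors.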

The Python code is attached; it is pretty simple.

Thanks



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 8 Jan 2021 at 15:51, Riccardo Ferrari <ferrarir@gmail.com> wrote:

> You need to provide your python dependencies as well. See
> http://spark.apache.org/docs/latest/submitting-applications.html, look
> for --py-files
>
> HTH
>
> On Fri, Jan 8, 2021 at 3:13 PM Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a module in PyCharm which reads data stored in a BigQuery table
>> and does plotting.
>>
>> At the command line on the terminal I need to add the jar file and the
>> package to make it work.
>>
>> (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit
>> --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar
>> analyze_house_prices_GCP.py
>>
>> This works, but the problem is that the imports in the module are not
>> picked up. For example:
>>
>> import sparkstuff as s
>>
>> This is picked up when run within PyCharm itself but not at the command
>> line!
>>
>>
>> (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit
>> --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar
>> analyze_house_prices_GCP.py
>>
>> Traceback (most recent call last):
>>   File "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py", line 8, in <module>
>>     import sparkstuff as s
>> ModuleNotFoundError: No module named 'sparkstuff'
>>
>> The easiest option would be to run all of this within PyCharm itself,
>> supplying the jar file and package at run time.
>>
>> Otherwise I can run it at the command line, provided the imports can be
>> resolved. I would appreciate any workaround for this.
>>
>> Thanks
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
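
For reference, a rough sketch of the --py-files route that Riccardo suggests above: the Python dependencies are zipped and the archive is passed to spark-submit next to --jars and --packages, each of which takes a comma-separated list. The helper names zip_python_sources and build_submit_command, the archive name deps.zip, and the assumption that sparkstuff ends up inside that archive are all hypothetical; nothing in this thread says where sparkstuff lives.

```python
import zipfile
from pathlib import Path


def zip_python_sources(src_dir, zip_path):
    """Zip every .py file under src_dir, keeping paths relative to src_dir."""
    src_dir = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for py_file in sorted(src_dir.rglob("*.py")):
            archive.write(py_file, py_file.relative_to(src_dir).as_posix())
    return str(zip_path)


def build_submit_command(script, jars=(), packages=(), py_files=()):
    """Assemble a spark-submit argument list; each option is comma-joined."""
    command = ["spark-submit"]
    if jars:
        command += ["--jars", ",".join(jars)]
    if packages:
        command += ["--packages", ",".join(packages)]
    if py_files:
        command += ["--py-files", ",".join(py_files)]
    command.append(script)
    return command


if __name__ == "__main__":
    # deps.zip is a hypothetical archive assumed to contain sparkstuff.
    cmd = build_submit_command(
        "analyze_house_prices_GCP.py",
        jars=[r"..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar"],
        packages=["com.github.samelamin:spark-bigquery_2.11:0.2.6"],
        py_files=["deps.zip"],
    )
    print(" ".join(cmd))
```

The printed line can be pasted into the PyCharm terminal, or the list can be handed to subprocess.run; how spark-submit is resolved on Windows when launched that way is not covered here.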
