spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject PyCharm, Running spark-submit calling jars and a package at run time
Date Fri, 08 Jan 2021 14:11:41 GMT
Hi,

I have a module in PyCharm that reads data stored in a BigQuery table and
does plotting.

At the command line in the terminal I need to add the jar file and the
package to make it work.

(venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit
--jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar
analyze_house_prices_GCP.py

This works, but the problem is that the imports in the module are not
picked up. For example:


import sparkstuff as s


This is picked up when run within PyCharm itself but not at the command
line!


(venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit
--jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar
analyze_house_prices_GCP.py

Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py", line 8, in <module>
    import sparkstuff as s
ModuleNotFoundError: No module named 'sparkstuff'
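
One way to narrow this down is to compare the import search path in the two environments. The sketch below is only a diagnostic under assumptions: it assumes the difference comes from PyCharm adding project directories to the Python path (which its run configurations can do), and the helper names describe_import and add_to_path are hypothetical, not part of Spark or PyCharm. Where sparkstuff actually lives is not stated in this message, so any directory passed to add_to_path is a placeholder.

```python
import importlib.util
import os
import sys


def describe_import(module_name):
    """Report whether a top-level module can be found, without importing it.

    Returns a (found, search_path) tuple, where search_path is a copy of
    sys.path at the time of the call.
    """
    spec = importlib.util.find_spec(module_name)
    return spec is not None, list(sys.path)


def add_to_path(directory):
    """Prepend a directory to sys.path if it is not already there.

    Returns the absolute path that was checked or added.
    """
    abs_dir = os.path.abspath(directory)
    if abs_dir not in sys.path:
        sys.path.insert(0, abs_dir)
    return abs_dir


if __name__ == "__main__":
    # Run this both from PyCharm and via spark-submit, then compare output.
    found, search_path = describe_import("sparkstuff")
    print("sparkstuff importable:", found)
    for entry in search_path:
        print("  ", entry)
```

If the directory holding sparkstuff shows up in the PyCharm output but not in the spark-submit output, that would support the assumption above, and calling add_to_path with that directory before the import (or setting PYTHONPATH to it) would be one possible workaround to try.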

The easiest option would be to run all of this within PyCharm itself,
invoking the jar file and package at runtime.

Otherwise I can run it at the command line, provided the imports can be
resolved. I would appreciate any workaround for this.

Thanks


LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
