spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: PyCharm, Running spark-submit calling jars and a package at run time
Date Fri, 08 Jan 2021 16:41:59 GMT
Hi Sean,


sparkstuff.py is under packages/sparutils/sparkstuff.py as shown below


[image: image.png]


So within PyCharm, it is picked up OK. However, at terminal level, it is
not picked up.
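
One possible explanation, which is an assumption on my part and not confirmed in this thread, is that PyCharm run configurations add the project's content roots and source roots to PYTHONPATH, whereas a plain terminal session only sees the script's own directory plus whatever PYTHONPATH already holds. A minimal sketch of a check that could be run both ways to compare the two environments; `is_importable` is a hypothetical helper name, not something from the attached code:

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if Python can locate a top-level module on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Print every directory Python searches, then report whether
    # sparkstuff can be found from this environment.
    for entry in sys.path:
        print(entry)
    print("sparkstuff importable:", is_importable("sparkstuff"))
```

If the directory holding sparkstuff.py appears in the list when run from a PyCharm run configuration but not from the terminal, that would point to a search-path difference and not a Spark problem.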


The image above is a snapshot of PyCharm. The module I am trying to run is
called analyze_house_prices_GCP.py, under the src package. At the same level
as src I have a utility package called packages that holds all the
Spark-related code. The following is in sparkstuff.py:


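from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext
# import findspark
# findspark.init()


def spark_session(appName):
    # Build (or reuse) a Hive-enabled session using the default master.
    return SparkSession.builder \
        .appName(appName) \
        .enableHiveSupport() \
        .getOrCreate()


def sparkcontext():
    # Return the active SparkContext, creating one if none exists.
    return SparkContext.getOrCreate()


def hivecontext():
    # HiveContext is deprecated since Spark 2.0; a SparkSession built with
    # enableHiveSupport() is its replacement.
    return HiveContext(sparkcontext())


def spark_session_local(appName):
    # Same as spark_session, but pinned to a single local thread.
    return SparkSession.builder \
        .master('local[1]') \
        .appName(appName) \
        .enableHiveSupport() \
        .getOrCreate()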
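
For anyone hitting the same ModuleNotFoundError, here is a minimal sketch of one possible workaround: putting the directory that holds sparkstuff.py on sys.path before the import. The helper name `add_to_sys_path` and the relative layout in the comments are assumptions for illustration, based on the paths mentioned above.

```python
import os
import sys


def add_to_sys_path(directory):
    """Prepend directory to sys.path unless already present; return its absolute path."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Hypothetical usage at the top of analyze_house_prices_GCP.py, assuming
# src/ and packages/ sit side by side in the project:
#
#   here = os.path.dirname(os.path.abspath(__file__))
#   add_to_sys_path(os.path.join(here, "..", "packages", "sparutils"))
#   import sparkstuff as s
```

Other routes that may fit better are setting PYTHONPATH in the terminal before calling spark-submit, or passing the module through spark-submit's --py-files option, which accepts .py, .zip and .egg files. I have not verified either against this project.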


Thanks


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 8 Jan 2021 at 16:27, Sean Owen <srowen@gmail.com> wrote:

> I don't see anywhere that you provide 'sparkstuff'? how would the Spark
> app have this code otherwise?
>
> On Fri, Jan 8, 2021 at 10:20 AM Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> Thanks Riccardo.
>>
>> I am well aware of the submission form
>>
>> However, my question relates to doing submission within PyCharm itself.
>>
>> This is what I run at the PyCharm *terminal* to invoke the Python module:
>>
>> spark-submit --jars
>> ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \
>>  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>> analyze_house_prices_GCP.py
>>
>> However, when run at the terminal, it does not pick up the import
>> dependencies in the code!
>>
>> Traceback (most recent call last):
>>   File
>> "C:/Users/admin/PycharmProjects/pythonProject2/DS/src/analyze_house_prices_GCP.py",
>> line 8, in <module>
>>     import sparkstuff as s
>> ModuleNotFoundError: No module named 'sparkstuff'
>>
>> The python code is attached, pretty simple
>>
>> Thanks
>>
>>
>>
>>
