spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Issue while calling foreach in Pyspark
Date Fri, 07 May 2021 16:03:22 GMT
Hi,

I am not convinced that foreach works, even in Spark 3.1.1.
Try doing the same with foreachBatch:

                    foreachBatch(sendToSink). \
                    trigger(processingTime='2 seconds'). \

and see whether it works.
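A minimal sketch of where that fragment sits in a complete Structured Streaming query, for reference. foreachBatch is a method of the streaming writer (DataStreamWriter), and the function it receives is called on the driver as sendToSink(df, batch_id) once per micro-batch, with df being an ordinary DataFrame. Assumptions: the built-in "rate" test source stands in for the real stream, and SINK_PATH / CHECKPOINT_PATH are placeholder locations, not anything from this thread.

```python
# Sketch only: the "rate" source and both paths below are placeholders (assumptions).
SINK_PATH = "/tmp/foreach_batch_sink"
CHECKPOINT_PATH = "/tmp/foreach_batch_checkpoint"


def sendToSink(df, batch_id):
    # Called on the driver once per micro-batch. df is a regular (static)
    # DataFrame, so the normal batch writer API can be used on it.
    df.write.mode("append").parquet(SINK_PATH)


def start_query(spark):
    # Toy streaming DataFrame from the built-in "rate" test source.
    stream_df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    # Same chain as above: foreachBatch plus a 2-second processing-time trigger.
    return (stream_df.writeStream
            .foreachBatch(sendToSink)
            .trigger(processingTime="2 seconds")
            .option("checkpointLocation", CHECKPOINT_PATH)
            .start())
```

With an existing SparkSession named spark, this would be started with query = start_query(spark) followed by query.awaitTermination().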

HTH



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 7 May 2021 at 16:07, rajat kumar <kumar.rajat20del@gmail.com> wrote:

> Hi Team,
>
> I am using Spark 2.4.4 with Python
>
> While using the line below:
>
> dataframe.foreach(lambda record: process_logs(record))
>
>
> My use case is: process_logs downloads the file from cloud storage
> using Python code and then saves the processed data.
>
> I am getting the following error
>
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 46, in launch_gateway
>     return _launch_gateway(conf)
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 108, in _launch_gateway
>     raise Exception("Java gateway process exited before sending its port number")
> Exception: Java gateway process exited before sending its port number
>
> Can anyone please suggest what can be done?
>
> Thanks
> Rajat
>
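For the original DataFrame.foreach call: the function passed to foreach is shipped to the executors and called once per Row inside a Python worker there, so it should use only plain Python and client libraries; the driver's SparkSession and SparkContext cannot be used inside it. A minimal sketch of the described use case under that constraint, assuming a DataFrame with a "path" column; download_file and save_result are hypothetical placeholders for the cloud-storage calls, not a real API.

```python
# Sketch only: download_file, save_result and the "path" column are
# hypothetical placeholders (assumptions), not part of PySpark.


def download_file(path):
    # Placeholder: a real version would call the cloud-storage SDK here.
    return f"log lines from {path}"


def save_result(path, text):
    # Placeholder: a real version would write the processed data somewhere.
    print(f"saving {len(text)} characters for {path}")


def process_logs(record):
    # Runs in a Python worker on an executor, not on the driver: no
    # SparkSession / SparkContext here, only plain Python and client libraries.
    path = record["path"]  # Row supports lookup by field name
    processed = download_file(path).upper()
    save_result(path, processed)
    return processed  # foreach ignores the return value; kept for local testing


def run(dataframe):
    # Equivalent to dataframe.foreach(lambda record: process_logs(record)).
    dataframe.foreach(process_logs)
```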
