spark-user mailing list archives

From Daniel Stojanov <>
Subject Spark Mongodb connector hangs indefinitely, not working on Amazon EMR
Date Wed, 22 Apr 2020 02:10:08 GMT
When running a PySpark application on my local machine I am able to save
to and retrieve from the MongoDB server using the MongoDB Spark connector,
and everything works properly. When I submit the exact same application on
my Amazon EMR cluster, I can see that the connector package is being
properly fetched from Maven when the job is submitted. However, the job
does not work.

From my Amazon EMR instance I can communicate with the database using
PyMongo without problems. I can also load and save DataFrames when using
pyspark interactively from the driver, but when I submit jobs via
spark-submit over the YARN cluster, the application hangs.

There are no error messages; the driver and executors simply show zero
activity. The PySpark application just stops until it is manually terminated.
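
A silent hang like this could, among other causes, mean that the executor
nodes cannot open a connection to the MongoDB host even though the driver
node can. That is only an assumption; nothing above establishes it. A minimal
sketch for checking it is below. The helper names can_reach and
probe_from_executors are hypothetical, the host is a placeholder, and 27017
is simply MongoDB's default port.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_from_executors(sc, host, port=27017, num_tasks=16):
    """Run can_reach inside Spark tasks and collect (hostname, reachable) pairs.

    `sc` is an existing SparkContext. Tasks are not guaranteed to land on
    every executor, so num_tasks is only a rough way to spread the probe.
    """
    def probe(_):
        return (socket.gethostname(), can_reach(host, port))

    return sorted(set(sc.parallelize(range(num_tasks), num_tasks).map(probe).collect()))
```

Calling probe_from_executors(spark.sparkContext, "<mongodb-host>") from a
submitted job would list each worker hostname that ran a task, together with
whether it could reach the database. Any False entry would point at
networking between the workers and the database rather than at the connector
itself.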

Has anyone else used the MongoDB Spark connector from Amazon EMR?
