spark-user mailing list archives

From Mich Talebzadeh <>
Subject Is operation subtracting two dataframe valid.
Date Fri, 06 Aug 2021 22:08:38 GMT
I am using a Google Kubernetes cluster with the Docker image that I built
with PySpark 3.1.1 on-prem and pushed the Docker image to a  google

The Python module generates about 100 rows of random data and then writes
them to a BigQuery table.

Both the write to the BigQuery table and the subsequent read from it show
the correct number of rows:

 Populated BigQuery table test.randomData

 rows written is  100

 Reading from BigQuery table test.randomData

 rows read in is  100

However, the following operation fails

    # subtract() returns the distinct rows of df2 that are not in read_df
    if df2.subtract(read_df).count() == 0:
        print("Data has been loaded OK to Oracle table")
    else:
        print("Data could not be loaded to Oracle table, quitting")


21/08/06 21:58:45 WARN org.apache.spark.scheduler.TaskSetManager: Lost task
0.0 in stage 8.0 (TID 11) ( executor 1):
java.lang.UnsupportedOperationException: sun.misc.Unsafe or
java.nio.DirectByteBuffer.<init>(long, int) not available

Further down it shows:

py4j.protocol.Py4JJavaError: An error occurred while calling o116.count.

: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,

OK, this may be specific to BigQuery because, as I recall, this operation
could be done against an Oracle table.
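
For reference, this is what the check above is meant to compute. DataFrame.subtract has EXCEPT DISTINCT semantics: it returns the distinct rows of the left DataFrame that do not appear in the right one. The sketch below is a pure-Python model of that behaviour on lists of tuples, not Spark's implementation; the names subtract_distinct and loaded_ok and the sample rows are hypothetical and only serve to illustrate the logic.

```python
def subtract_distinct(left, right):
    """Model of EXCEPT DISTINCT: distinct rows of `left` that are absent from `right`."""
    right_rows = set(right)
    seen = set()
    result = []
    for row in left:
        # Keep a row only if it is missing on the right and not already emitted.
        if row not in right_rows and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def loaded_ok(written, read_back):
    """Mirror of the check in the message: no written row is missing after read-back."""
    return len(subtract_distinct(written, read_back)) == 0


# Hypothetical sample data standing in for df2 (written) and read_df (read back).
written = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
read_back = [(1, "a"), (2, "b")]

missing = subtract_distinct(written, read_back)
print("rows missing after read-back:", missing)
print("loaded OK:", loaded_ok(written, read_back))
```

Because the result is distinct, a row written twice but read back once is not reported as missing, so this check is best paired with the row-count comparison already shown earlier in the message.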



