spark-user mailing list archives

From Shahab Yunus <>
Subject Accessing Scala RDD from pyspark
Date Fri, 16 Mar 2018 05:17:16 GMT
Hi there.

I am calling custom Scala code from the pyspark interpreter. The custom
Scala code is simple: it just reads a text file using sparkContext.textFile
and returns an RDD[String].

In pyspark, I am using sc._jvm to make the call to the Scala code:

s_rdd = sc._jvm.package_name.class_name.method()

It returns a py4j.JavaObject. Now I want to use this in pyspark, so I wrap
it as follows:
py_rdd = RDD(s_rdd, sparkSession)

No error yet. But when I call any RDD method on py_rdd (e.g.
py_rdd.count()), I get the following error:
py4j.protocol.Py4JError: An error occurred while calling o50.rdd. Trace:
py4j.Py4JException: Method rdd([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(

Why is that? What am I doing wrong?

Scala version 2.11.8
(OpenJDK 64-Bit Server VM, Java 1.8.0_121)
Spark 2.0.2
Hadoop 2.7.3-amzn-0

Thanks & Regards,
