spark-dev mailing list archives

From Maciej Szymkiewicz <mszymkiew...@gmail.com>
Subject Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet
Date Wed, 18 Mar 2020 17:35:38 GMT
Hi Ben,

Please note that `_sc` is not a SQLContext. It is a SparkContext, which
is used primarily for internal calls.

The SQLContext is exposed publicly through the `sql_ctx` attribute:
https://github.com/apache/spark/blob/8bfaa62f2fcc942dd99a63b20366167277bce2a1/python/pyspark/sql/dataframe.py#L80
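
A minimal sketch of the distinction, assuming a PySpark version around the linked commit, where `DataFrame` carries both a `sql_ctx` and a `_sc` attribute. The helper name `contexts_of` and the `demo` function are hypothetical and only for illustration:

```python
def contexts_of(df):
    """Return (sql_context, spark_context) for a PySpark DataFrame.

    Hypothetical helper: `sql_ctx` is the public attribute holding the
    SQLContext, while `_sc` is the underscore-prefixed SparkContext that
    is meant primarily for internal calls.
    """
    return df.sql_ctx, df._sc


def demo():
    # Needs a local PySpark installation; nothing here runs on import.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.range(3)

    sql_context, spark_context = contexts_of(df)
    print(type(sql_context).__name__)
    print(type(spark_context).__name__)

    spark.stop()
```

Calling `demo()` on a machine with PySpark installed should show that the two attributes hold different kinds of objects, so user code that wants the context behind a DataFrame would normally reach for `df.sql_ctx` rather than `df._sc`.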

On 3/17/20 5:53 PM, Ben Roling wrote:
> I tried this on the users mailing list but didn't get traction.  It's
> probably more appropriate here anyway.
>
> I've noticed that DataSet.sqlContext is public in Scala but the
> equivalent (DataFrame._sc) in PySpark is named as if it should be
> treated as private.
>
> Is this intentional?  If so, what's the rationale?  If not, then it
> feels like a bug and DataFrame should have some form of public access
> back to the context/session.  I'm happy to log the bug but thought I
> would ask here first.  Thanks!

-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: C095AA7F33E6123A
