spark-issues mailing list archives

From "Felix Cheung (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-10903) Make sqlContext global
Date Tue, 20 Oct 2015 00:46:28 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964326#comment-14964326 ]

Felix Cheung edited comment on SPARK-10903 at 10/20/15 12:46 AM:
-----------------------------------------------------------------

Looked into this and tested a few approaches.

Tried to minimize changes by sticking to S3 method dispatch; however, it is not working:

{code}
createDataFrame <- function(sqlContext, data, schema = NULL, samplingRatio = 1.0) UseMethod("createDataFrame")
createDataFrame.jobj <- function(sqlContext, data, schema = NULL, samplingRatio = 1.0)
{
  createDataFrame(data, schema, samplingRatio)
}

createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0) {
  sqlContext <- getSqlContext()
...

# works
> a <- createDataFrame(iris)

# does not work
> b <- createDataFrame(sqlContext, iris)
Error in createDataFrame(sqlContext, iris) : unexpected type: jobj
{code}
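
One likely cause, judging only from the snippet above: the second `createDataFrame <- function(data, ...)` assignment replaces the generic, so `UseMethod` is never reached and `sqlContext` is treated as `data`. Below is a minimal sketch of S3 dispatch that accepts both call forms by moving the new behaviour into a default method. It is not SparkR code: `newJobj`, `setSqlContext`, `.ctxEnv` and the returned list are hypothetical stand-ins, and `getSqlContext` is assumed to return the stored context.

```r
# Minimal sketch, not the SparkR implementation. All helpers are stand-ins.

# Hypothetical stand-in for an object carrying the S3 class "jobj".
newJobj <- function(id) structure(list(id = id), class = "jobj")

# Hypothetical global holder for the context.
.ctxEnv <- new.env()
setSqlContext <- function(ctx) assign("sqlContext", ctx, envir = .ctxEnv)
getSqlContext <- function() get("sqlContext", envir = .ctxEnv)

# The generic is defined once and never overwritten by a plain function.
createDataFrame <- function(x, ...) UseMethod("createDataFrame")

# Old call form: createDataFrame(sqlContext, data, ...).
# Drop the context and re-dispatch on the data.
createDataFrame.jobj <- function(x, data, schema = NULL, samplingRatio = 1.0, ...) {
  createDataFrame(data, schema, samplingRatio)
}

# New call form: createDataFrame(data, ...).
createDataFrame.default <- function(x, schema = NULL, samplingRatio = 1.0, ...) {
  sqlContext <- getSqlContext()
  # Placeholder for the real conversion: report the context used and the row count.
  list(context = sqlContext$id, rows = nrow(x))
}

setSqlContext(newJobj("ctx-1"))
a <- createDataFrame(iris)                   # new form
b <- createDataFrame(getSqlContext(), iris)  # old form
stopifnot(identical(a, b))
```

The trade-off is that the first formal argument has to be shared by both forms (`x` here), so callers passing `data = ...` or `sqlContext = ...` by name would need separate handling.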

Any ideas? Otherwise, IMO we have two options:
1. Promote the methods to S4, though some functions support RDD, which we might not want to expose.
2. Make a breaking change to the method arguments, i.e. have only one signature: `createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0)`
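
For reference, option 1 could look roughly like the sketch below, which dispatches on the class of the first argument with S4. This is only an illustration under assumptions: `SqlContextRef`, `.ctxEnv` and the returned list are hypothetical, and how the existing `jobj` class would be registered for S4 dispatch is not addressed here.

```r
library(methods)

# Hypothetical S4 stand-in for the context reference (assumption, not SparkR's jobj).
setClass("SqlContextRef", representation(id = "character"))

# Hypothetical global holder for the context.
.ctxEnv <- new.env()
assign("sqlContext", new("SqlContextRef", id = "ctx-1"), envir = .ctxEnv)
getSqlContext <- function() get("sqlContext", envir = .ctxEnv)

# S4 generic that dispatches on the first argument.
setGeneric("createDataFrame", function(x, ...) standardGeneric("createDataFrame"))

# Old call form: createDataFrame(sqlContext, data, ...). Forward to the new form.
setMethod("createDataFrame", signature(x = "SqlContextRef"),
          function(x, data, schema = NULL, samplingRatio = 1.0, ...) {
            createDataFrame(data, schema = schema, samplingRatio = samplingRatio)
          })

# New call form: createDataFrame(data, ...).
setMethod("createDataFrame", signature(x = "data.frame"),
          function(x, schema = NULL, samplingRatio = 1.0, ...) {
            # Placeholder for the real conversion.
            list(context = getSqlContext()@id, rows = nrow(x))
          })

a <- createDataFrame(iris)                   # new form
b <- createDataFrame(getSqlContext(), iris)  # old form
stopifnot(identical(a, b))
```

Each additional input type (for example a local list) would need its own method, which is where the concern about exposing RDD signatures comes in.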



was (Author: felixcheung):
Looked into this and tested a few approaches.

Tried to minimize changes by sticking to S3 method dispatch; however, it is not working:

{code}
createDataFrame <- function(sqlContext, data, schema = NULL, samplingRatio = 1.0) UseMethod("createDataFrame")
createDataFrame.jobj <- function(sqlContext, data, schema = NULL, samplingRatio = 1.0)
{
  createDataFrame(data, schema, samplingRatio)
}

# TODO(davies): support sampling and infer type from NA
createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0) {
  sqlContext <- getSqlContext()
...

# works
> a <- createDataFrame(iris)

# does not work
> b <- createDataFrame(sqlContext, iris)
Error in createDataFrame(sqlContext, iris) : unexpected type: jobj
{code}

Any ideas? Otherwise, IMO we have two options:
1. Promote the methods to S4, though some functions support RDD, which we might not want to expose.
2. Make a breaking change to the method arguments, i.e. have only one signature: `createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0)`


> Make sqlContext global 
> -----------------------
>
>                 Key: SPARK-10903
>                 URL: https://issues.apache.org/jira/browse/SPARK-10903
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Narine Kokhlikyan
>            Priority: Minor
>
> Make sqlContext global so that we don't have to always specify it.
> e.g. createDataFrame(iris) instead of createDataFrame(sqlContext, iris)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
