[ https://issues.apache.org/jira/browse/SPARK-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296573#comment-15296573 ]
Mikołaj Hnatiuk commented on SPARK-15294:
-----------------------------------------
Hi, ok, so I have this function defined:
{code}
# IN "generics.R"
# @rdname pivot
# @export
setGeneric("pivot", function(x, colname, values = NULL) { standardGeneric("pivot") })

# IN "group.R":
setMethod("pivot",
          signature(x = "GroupedData"),
          function(x, colname, values = NULL) {
            if (is.null(values)) {
              result <- SparkR:::callJMethod(x@sgd, "pivot", colname)
            } else {
              stopifnot(length(values) == length(unique(values)))
              result <- SparkR:::callJMethod(x@sgd, "pivot", colname, values)
            }
            SparkR:::groupedData(result)
          })
{code}
And now I'm trying to do this:
{code}
library(magrittr)  # assumed source of %>%; the original snippet does not show it
# `values` is not defined in this snippet; it is assumed to exist in the session
df = createDataFrame(sqlContext, data.frame(
  earnings = c(10000, 10000, 11000, 15000, 12000, 20000, 21000, 22000),
  course = c("R", "Python", "R", "Python", "R", "Python", "R", "Python"),
  year = c(2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016)
))
sums <- groupBy(df, "year") %>%
  pivot("course", values) %>%
  SparkR::summarize(sumOfEarnings = sum(df$earnings)) %>%
  collect()
{code}
It apparently works, but look at the last transformation (summarize): I have to use sum(df$earnings)
instead of just giving a column name. *I shouldn't be summing the variable "earnings" from the DataFrame*.
Instead, _I should_ be able to sum "earnings" from the GroupedData object that the function "pivot"
returns, right?
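For reference, here is a minimal local sketch (base R only, no Spark involved) of the table the pipeline above is meant to produce: one row per year, one column per course, each cell holding the summed earnings. The names `local_df` and `expected` are made up for this illustration, and this is only a way to check the result by hand, not how SparkR would compute the pivot.

```r
# Local (non-Spark) copy of the data from the example above.
local_df <- data.frame(
  earnings = c(10000, 10000, 11000, 15000, 12000, 20000, 21000, 22000),
  course = c("R", "Python", "R", "Python", "R", "Python", "R", "Python"),
  year = c(2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016)
)

# Sum earnings for every (year, course) pair.
# The result is a matrix with years as row names and courses as column names.
expected <- tapply(
  local_df$earnings,
  list(year = local_df$year, course = local_df$course),
  sum
)
print(expected)
```

Comparing the collected `sums` against a table like this is one way to confirm that the pivoted aggregation sums the right column.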
Anyway, please give it a try :) I know this is the smallest commit ever, but I'd be delighted
if you would open a PR for this.
> Add pivot functionality to SparkR
> ---------------------------------
>
> Key: SPARK-15294
> URL: https://issues.apache.org/jira/browse/SPARK-15294
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Reporter: Mikołaj Hnatiuk
> Priority: Minor
> Labels: pivot
>
> R users are very used to transforming data using functions such as dcast (pkg:reshape2). https://github.com/apache/spark/pull/7841 introduces such functionality to the Scala and Python APIs. I'd like to suggest adding this functionality to the SparkR API to pivot DataFrames.
> I'd love to do this; however, my knowledge of Scala is still limited, but with proper guidance I can give it a try.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)