spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26199) Long expressions cause mutate to fail
Date Sat, 01 Dec 2018 16:04:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705879#comment-16705879 ]

Hyukjin Kwon commented on SPARK-26199:
--------------------------------------

I haven't taken a close look, but that sounds right. Want to go ahead and open a PR with a regression test?

> Long expressions cause mutate to fail
> -------------------------------------
>
>                 Key: SPARK-26199
>                 URL: https://issues.apache.org/jira/browse/SPARK-26199
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: João Rafael
>            Priority: Minor
>
> Calling {{mutate(df, field = expr)}} fails when expr is very long.
> Example:
> {code:R}
> df <- mutate(df, field = ifelse(
>     lit(TRUE),
>     lit("A"),
>     ifelse(
>         lit(T),
>         lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"),
>         lit("C")
>     )
> ))
> {code}
> Stack trace:
> {code:R}
> FATAL subscript out of bounds
>   at .handleSimpleError(function (obj) 
> {
>     level = sapply(class(obj), sw
>   at FUN(X[[i]], ...)
>   at lapply(seq_along(args), function(i) {
>     if (ns[[i]] != "") {
> at lapply(seq_along(args), function(i) {
>     if (ns[[i]] != "") {
> at mutate(df, field = ifelse(lit(TRUE), lit("A"), ifelse(lit(T), lit("BBB
>   at #78: mutate(df, field = ifelse(lit(TRUE), lit("A"), ifelse(lit(T
> {code}
> The root cause is in: [DataFrame.R#L2182|https://github.com/apache/spark/blob/master/R/pkg/R/DataFrame.R#L2182]
> When the expression is long, {{deparse}} returns multiple lines, causing {{args}} to have more elements than {{ns}}. A possible fix is to set {{nlines = 1}} or to collapse the lines together.
> A simple workaround exists: first place the expression in a variable, then use the variable instead:
> {code:R}
> tmp <- ifelse(
>     lit(TRUE),
>     lit("A"),
>     ifelse(
>         lit(T),
>         lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"),
>         lit("C")
>     )
> )
> df <- mutate(df, field = tmp)
> {code}
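
The length mismatch described in the issue can be reproduced in plain R, without SparkR. The snippet below is only a sketch under stated assumptions: {{exprs}}, {{args_broken}}, {{args_fixed}} and {{deparse_one}} are hypothetical names chosen for illustration, and the way {{args}} is built here approximates, rather than copies, the code in DataFrame.R. The expression is quoted and never evaluated, so {{lit}} does not need to exist.

```r
# The long expression from the report, kept unevaluated.
long_expr <- quote(ifelse(lit(TRUE), lit("A"),
  ifelse(lit(T), lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"), lit("C"))))

# One named argument, as in mutate(df, field = <expr>).
exprs <- list(field = long_expr)
ns <- names(exprs)

# deparse() splits long expressions across several strings
# (default width.cutoff is 60), so there are more pieces than names.
args_broken <- unlist(lapply(exprs, deparse))
length(args_broken) > length(ns)

# Option 1 from the issue: nlines = 1 keeps only the first line,
# so the text is truncated but has length one.
first_line <- deparse(long_expr, nlines = 1)

# Option 2 from the issue: collapse all lines into a single string.
deparse_one <- function(e) paste(trimws(deparse(e)), collapse = " ")
args_fixed <- vapply(exprs, deparse_one, character(1))
length(args_fixed) == length(ns)
```

Collapsing the lines keeps the full expression text, whereas {{nlines = 1}} drops everything after the first line. Which one is preferable depends on how the deparsed string is used downstream.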



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

