spark-dev mailing list archives

From Maciej <>
Subject [DISCUSS][R] Adding magrittr as a dependency for SparkR
Date Wed, 30 Sep 2020 19:11:35 GMT
Hi Everyone,

I'd like to start a discussion about the possibility of adding magrittr
as an explicit dependency for SparkR.
For those not familiar with the package, it provides a number of small
utilities, the most important of which is the %>% function, similar to
pipe-forward (|>) in F# or the thread-first macro (->) in Clojure. In other
words, it allows us to replace:

df <- createDataFrame(iris)
df_filtered <- filter(df, df$Sepal_Width > df$Petal_Length)
df_projected <- select(df_filtered, min(df$Sepal_Width - df$Petal_Length))

or

df_projected <- select(
  filter(createDataFrame(iris), column("Sepal_Width") > column("Petal_Length")),
  min(column("Sepal_Width") - column("Petal_Length"))
)

with

df_projected <- createDataFrame(iris) %>%
  filter(.$Sepal_Width > .$Petal_Length) %>%
  select(min(.$Sepal_Width - .$Petal_Length))

It is widely used (see reverse dependency section), stable, and
pretty much a core element of idiomatic R code these days.
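
For readers less familiar with the operator, here is a minimal sketch of how %>% rewrites a call, using only base R data so it runs without a Spark session. It assumes magrittr is installed; keep_rows is a hypothetical helper introduced purely for illustration and is not part of SparkR or magrittr.

```r
library(magrittr)  # assumption: magrittr is installed locally

# x %>% f(y) is evaluated as f(x, y): the left-hand side becomes
# the first argument of the call on the right-hand side.
squares <- c(1, 4, 9)
piped   <- squares %>% sqrt() %>% sum()
nested  <- sum(sqrt(squares))

# A dot used directly as an argument marks where the left-hand side goes,
# so this call is paste("x", "a").
pasted <- "a" %>% paste("x", .)

# Hypothetical helper standing in for a filter-like function.
keep_rows <- function(df, cond) df[cond, , drop = FALSE]

# A dot that only appears inside a nested expression (.$col) does not
# suppress first-argument insertion, so this is keep_rows(iris, <condition>).
wide_piped  <- iris %>% keep_rows(.$Sepal.Width > .$Petal.Length)
wide_nested <- keep_rows(iris, iris$Sepal.Width > iris$Petal.Length)
```

The last pair mirrors the filter(.$Sepal_Width > .$Petal_Length) call in the SparkR example above: the data frame flows in as the first argument, while the dot refers to it inside the condition.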

Why we might want to add it:

  * Improve readability of SparkR examples which, subjectively speaking,
    can look a bit archaic.
  * Reduce verbosity of the SparkR codebase.

Possible risks:

  * It is an additional dependency for the CI pipeline.

    A: magrittr is already a transitive dependency for SparkR tests (it
    is required by testthat), its API is extremely stable, and it has no
    dependencies of its own.
  * It is an additional dependency for SparkR installations.

    A: Given its widespread usage (over 1200 reverse imports, including some
    of the most popular packages), it is probably already present in all but
    the most minimal R installations.

    While it's just anecdotal evidence, most of the SparkR applications
    I've seen out there already use magrittr.


  * Supporting non-standard evaluation.

Thanks in advance for your input.

Best regards,
Maciej Szymkiewicz

