spark-issues mailing list archives

From "Oscar D. Lara Yejas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17774) Add support for head on DataFrame Column
Date Thu, 06 Oct 2016 00:20:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550390#comment-15550390 ]

Oscar D. Lara Yejas commented on SPARK-17774:
---------------------------------------------

To implement only the head() method, I'll be happy to:

1) Remove lines 63-69 (method collect) in PR 11336
2) Throw an error if a column can't be collected as opposed to returning an empty column (though
I'm okay with either option)

Once again, all my code IS STILL NEEDED for head() in order to (1) give the Column class a
reference to the parent DataFrame and (2) propagate the parent DataFrame through every possible
Column operation.
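
To illustrate points (1) and (2), here is a minimal sketch in Python. It is a toy model and not the code in PR 11336 or the SparkR API: ToyDataFrame, ToyColumn, lit and head are hypothetical names made up for this example. It only shows the idea that a column keeps a reference to its parent, that every operation passes the parent along, and that head() raises an error when there is no parent to collect from.

```python
from typing import Any, Callable, Dict, List, Optional


class ToyDataFrame:
    """Hypothetical stand-in for a SparkDataFrame: named columns of equal length."""

    def __init__(self, data: Dict[str, List[Any]]):
        self.data = data

    def __getitem__(self, name: str) -> "ToyColumn":
        # (1) A column extracted from a DataFrame remembers its parent.
        return ToyColumn(lambda row: row[name], parent=self)

    def head_rows(self, n: int) -> List[Dict[str, Any]]:
        # Return at most the first n rows as dictionaries.
        names = list(self.data)
        length = len(self.data[names[0]]) if names else 0
        return [{k: self.data[k][i] for k in names} for i in range(min(n, length))]


class ToyColumn:
    """Hypothetical column: an unevaluated expression plus an optional parent."""

    def __init__(self, expr: Callable[[Dict[str, Any]], Any],
                 parent: Optional[ToyDataFrame] = None):
        self.expr = expr
        self.parent = parent

    def _combine(self, other: Any, op: Callable[[Any, Any], Any]) -> "ToyColumn":
        # (2) Every operation builds a new column that inherits the parent.
        if isinstance(other, ToyColumn):
            parent = self.parent if self.parent is not None else other.parent
            return ToyColumn(lambda row: op(self.expr(row), other.expr(row)), parent)
        return ToyColumn(lambda row: op(self.expr(row), other), self.parent)

    def __add__(self, other: Any) -> "ToyColumn":
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other: Any) -> "ToyColumn":
        return self._combine(other, lambda a, b: a * b)


def lit(value: Any) -> ToyColumn:
    # A literal column has no parent DataFrame.
    return ToyColumn(lambda row: value)


def head(col: ToyColumn, n: int = 6) -> List[Any]:
    # Without a parent there is nothing to evaluate against, so raise an error
    # instead of returning an empty result.
    if col.parent is None:
        raise ValueError("column has no parent DataFrame, so it cannot be collected")
    return [col.expr(row) for row in col.parent.head_rows(n)]


df = ToyDataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
print(head(df["a"] + df["b"], n=2))
```

In this sketch, head(lit(1)) raises ValueError, which corresponds to option 2) above, while lit(1) + df["a"] can be collected because the parent is taken from whichever operand has one.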

Bottom line: we should mark this JIRA as a duplicate and merge PR 11336 with the minor changes
above. Let me know if I have your blessing so I can proceed with this. It should be very quick
for me. Thanks!
cc: [~falaki] [~shivaram]

> Add support for head on DataFrame Column
> ----------------------------------------
>
>                 Key: SPARK-17774
>                 URL: https://issues.apache.org/jira/browse/SPARK-17774
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> There was a lot of discussion on SPARK-9325. To summarize the conversation on that ticket
> regarding {{collect}}:
> * Pro: Ease of use and maximum compatibility with the existing R API
> * Con: We do not want to increase maintenance cost by opening an arbitrary API. With Spark's
> DataFrame API, {{collect}} does not work on {{Column}}, and there is no need for it to work
> in R.
> This ticket is strictly about {{head}}. I propose supporting {{head}} on {{Column}} because:
> 1. R users are already used to calling {{head(iris$Sepal.Length)}}. When they do that
> on a SparkDataFrame they get an error, which is not a good experience.
> 2. Adding support for it does not require any change to the backend. It can be trivially
> done in R code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org