spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <>
Subject [jira] [Commented] (SPARK-6292) Add RDD methods to DataFrame to preserve schema
Date Tue, 17 Mar 2015 15:56:38 GMT


Joseph K. Bradley commented on SPARK-6292:

If you compare the APIs of DataFrame and RDD, you can see that RDD implements methods
that should be usable on DataFrames as well.  Right now, users can call myDataFrame.rdd.theRDDFunction(),
but that returns an RDD, which must then be converted back into a DataFrame.  It would be nice to
have these methods on DataFrame directly, for convenience.
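
As a rough sketch of the round trip described above, assuming PySpark and using
randomSplit (the method named in the issue description) as the example. The helper
name random_split_keep_schema and its session parameter are hypothetical and not part
of Spark; the sketch only relies on df.schema, df.rdd, RDD.randomSplit and
createDataFrame(rdd, schema).

```python
def random_split_keep_schema(df, session, weights, seed=None):
    """Split a DataFrame with the RDD method, then reattach the schema.

    Hypothetical helper, not a Spark API. `df` is assumed to be a DataFrame
    and `session` an object exposing createDataFrame(rdd, schema), such as
    a SQLContext or SparkSession.
    """
    # Remember the schema before dropping down to the RDD level.
    schema = df.schema

    # The RDD method returns plain RDDs of rows; the schema is not carried along.
    parts = df.rdd.randomSplit(weights, seed)

    # Convert each resulting RDD back into a DataFrame with the original schema.
    return [session.createDataFrame(part, schema) for part in parts]
```

With a real context this might be called as
random_split_keep_schema(myDataFrame, sqlContext, [0.8, 0.2], seed=42). A method on
DataFrame itself would spare users this wrapping.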

> Add RDD methods to DataFrame to preserve schema
> -----------------------------------------------
>                 Key: SPARK-6292
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
> Users can use RDD methods on DataFrames, but they lose the schema and need to reapply
> it.  For RDD methods which preserve the schema (such as randomSplit), DataFrame should provide
> versions of those methods which automatically preserve the schema.
