spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24988) Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType
Date Wed, 01 Aug 2018 15:11:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565457#comment-16565457 ]

Apache Spark commented on SPARK-24988:
--------------------------------------

User 'mahmoudmahdi24' has created a pull request for this issue:
https://github.com/apache/spark/pull/21944

> Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24988
>                 URL: https://issues.apache.org/jira/browse/SPARK-24988
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: mahmoud mehdi
>            Priority: Minor
>
> The main goal of this user story is to extend the DataFrame API with a method that casts all the values of a DataFrame based on the DataTypes of a StructType.
> This feature is useful when we have a large DataFrame and need to make multiple casts. Instead of casting each column individually, we only have to pass castBySchema a StructType containing the types we need. (In real-world use cases, this schema is generally provided by the client, which was the case for me.)
> To illustrate the new feature, let's create a DataFrame of strings:
> {code:java}
> import spark.implicits._ // needed for toDF on a local Seq
> val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")
> {code}
> Suppose we want to cast the values of the second column to integers. All we have to do is the following:
> {code:java}
> import org.apache.spark.sql.types._
> val schema = StructType(Seq(StructField("name", StringType, true), StructField("id", IntegerType, true)))
> {code}
> {code:java}
> df.castBySchema(schema)
> {code}
> I added several tests to make sure that castBySchema also works with nested StructTypes.
>  
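
castBySchema is only proposed in this issue and its pull request; it is not an existing DataFrame method. As a minimal sketch, assuming Spark SQL is on the classpath, the same effect can be approximated with the existing select and Column.cast APIs. The object CastBySchemaSketch, the free function castBySchema(df, schema) and the demo helper are hypothetical names chosen for illustration, not part of Spark or of the pull request.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object CastBySchemaSketch {
  // Hypothetical stand-in for the proposed df.castBySchema(schema):
  // selects every field named in `schema` and casts it to that field's DataType,
  // keeping the original column name through an explicit alias.
  // Columns of `df` not listed in `schema` are dropped, and a field name that
  // does not exist in `df` makes the select fail during analysis.
  def castBySchema(df: DataFrame, schema: StructType): DataFrame =
    df.select(schema.fields.toSeq.map(f => col(f.name).cast(f.dataType).as(f.name)): _*)

  // Rebuilds the example from the issue description and applies the sketch.
  def demo(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("id", IntegerType, nullable = true)))
    castBySchema(df, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("castBySchema-sketch").getOrCreate()
    try {
      demo(spark).printSchema() // "id" should now be reported as integer
    } finally {
      spark.stop()
    }
  }
}
```

Spark's Cast also accepts struct-to-struct casts when both structs have the same number of fields (matched by position, with castable field types), so this one-line select may cover simple nested StructTypes too; how the proposed castBySchema itself treats nesting is defined by its pull request, not by this sketch.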



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


