spark-issues mailing list archives

From "mahmoud mehdi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-24988) Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType
Date Wed, 01 Aug 2018 13:56:00 GMT
mahmoud mehdi created SPARK-24988:
-------------------------------------

             Summary: Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType
                 Key: SPARK-24988
                 URL: https://issues.apache.org/jira/browse/SPARK-24988
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: mahmoud mehdi


The main goal of this user story is to extend the DataFrame API with a method that casts
all the values of a DataFrame based on the DataTypes of a StructType.

This feature is useful when we have a large DataFrame and need to perform multiple casts.
Instead of casting each column individually, we only have to pass a StructType with the
required types to castBySchema (in real-world use, this schema is generally provided by
the client, which was my case).

I'll explain the new feature with an example. Let's create a DataFrame of strings:
{code:java}
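import spark.implicits._ // needed for toDF; assumes a SparkSession named spark (automatic in spark-shell)
val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")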
{code}
Suppose we want to cast the values of the second column to integers. All we have to do
is the following:
{code:java}
val schema = StructType(Seq(
  StructField("name", StringType, true),
  StructField("id", IntegerType, true)))
{code}
{code:java}
df.castBySchema(schema)
{code}

I added several tests to make sure that castBySchema also works with nested StructTypes.
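
To make the idea concrete, here is a minimal sketch of how such a method could be written as
an implicit extension on DataFrame. This is only an illustration and not necessarily the
implementation proposed for this issue: the object name CastBySchemaSketch and the class name
CastBySchemaOps are hypothetical. The sketch assumes that every field of the schema exists in
the DataFrame under the same name, that column names contain no dots, and that nested structs
can be handled by Spark's built-in struct-to-struct cast.
{code:scala}
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Hypothetical helper object, used for illustration only.
object CastBySchemaSketch {

  implicit class CastBySchemaOps(df: DataFrame) {

    // Casts each column named in `schema` to the DataType declared for it.
    // Columns that are absent from `schema` are dropped by the select.
    def castBySchema(schema: StructType): DataFrame = {
      val castedColumns: Seq[Column] = schema.fields.toSeq.map { field =>
        // Keep the original column name after the cast.
        col(field.name).cast(field.dataType).as(field.name)
      }
      df.select(castedColumns: _*)
    }
  }
}
{code}
With import CastBySchemaSketch._ in scope, the call df.castBySchema(schema) from the example
above compiles against this sketch, and printSchema() on the result should report the id
column as an integer. Note that the nullable flags of the StructFields are not enforced by a
cast in this sketch.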



 


